What kind of data is required for open ai?

I have created a chatbot and I’m uploading data to it with the help of a CSV file. The main question that came to mind while searching for fine-tuning is the level of data, which is currently on a weekly basis. Is this data sufficient to train the model and obtain accurate answers, or should I consider changing the granularity of the data?

It really depends on what you mean by “uploading data to it”.

The models only support sending input and getting back output.

You would not want to fine-tune a model on continually-changing data. The fine-tuning process is more appropriate for changing output behavior for types of input.

I want to to know will be ok if i upload week, or month level of data does that have impact on model output

Fine-tuning an AI model, at considerable time and needing performance evaluation, does not make it a data retrieval system.

If you are looking for data augmentation to improve the answering capability on your own documents, you’d likely want to investigate an embeddings-based vector database.

An AI model cannot answer about and understand a large amount of data at once. You only can provide it what will fit in the context window length, such as you might be doing now to provide past chat.

1 Like

I mean isn’t it obvious, that such kind of a job would be enormous work? Why do you think GPT-3.5 and 4 don’t have that data.