Can using similar examples in the training dataset for fine-tuning increase its accuracy, or is that unnecessary and we should only have diverse examples?
Hi!
As a general rule of thumb, I would say that your training data set should reflect the data you will apply the fine-tuned model to. If that data is expected to be very diverse, then yes, you should aim for a higher diversity of examples as well, and vice versa. If there is too much of a mismatch between your training data and your actual data, you risk running into overfitting issues.
That said, if you have a larger training data set, it is perfectly fine if a few examples are similar, as long as the overall composition is still in check.
Well, the point of fine-tuning is to provide examples of inputs → outputs. Essentially, you have inputs, and you want a desired output, right? So the answer to your question depends on what you mean by “diverse,” as it is a broad word in this context.
Assuming you already know what your desired outputs are, then the best thing you can do is provide input examples which are highly relevant / substantially related to the inputs which you’ll be using the fine-tuned model for.
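To make those input → output pairs concrete, here is a minimal sketch of how training examples are assembled into OpenAI's chat fine-tuning JSONL format. The subject matter (summarization) and the system prompt are purely hypothetical placeholders; substitute your own inputs and desired outputs.

```python
import json

# Hypothetical input -> desired-output pairs; replace with your own data.
pairs = [
    ("Summarize: quarterly revenue rose 4%.", "Revenue grew 4% this quarter."),
    ("Summarize: customer churn fell to 2%.", "Churn dropped to 2%."),
]

lines = []
for user_input, desired_output in pairs:
    # One training example = one chat transcript ending in the desired reply.
    example = {
        "messages": [
            {"role": "system", "content": "You are a concise summarizer."},
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": desired_output},
        ]
    }
    lines.append(json.dumps(example))

# Each line of the .jsonl training file is one such example.
jsonl_data = "\n".join(lines)
```

The key point from the discussion above: the `user` contents should resemble the inputs you will actually send the fine-tuned model, and the `assistant` contents should exactly match the style of output you want back.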
For me this means: I have content which I want to create. My goal is to achieve a specific writing style. So, all of my input examples are very similar in subject matter, with the main differences being the “subtopic” subject matter, and with very minor differences in syntax, word choice, tone, etc. (I don’t want the writing to be robotic, so I provide “some” diversity there, but materially it is still very similar, and you would think it was all written by the same person.)
These minor differences in style are, to me, acceptable given my concept of a “desired output”. For example, if I wanted a HIGHLY specific style, then my approach would not include any input examples which do not achieve that exact style I want. No exceptions, because what would an exception offer besides irrelevant information to the fine-tuning?
You should try to be as exacting as you can be. Know exactly what you want, match the output examples to that, and then match the input examples to expected inputs you’ll be working with. Maybe that is all sort of rudimentary and obvious, but sometimes it helps to consider the basics.
Thanks @jr.2509. We currently have 10 to 15 examples in our training data. So do you think adding a few more examples that are similar to the ones already there would be helpful, or unnecessary?
You 100% need more examples. OpenAI defines a “low amount of examples” as under 100. Personally I started off at 50 (the recommended minimum) and decided it probably wasn’t enough.
Creating examples can be very time consuming. But it’s recommended to have as many as you’re comfortable paying for.
Example count recommendations
(https://platform.openai.com/docs/guides/fine-tuning/example-count-recommendations)
To fine-tune a model, you are required to provide at least 10 examples. We typically see clear improvements from fine-tuning on 50 to 100 training examples with gpt-3.5-turbo, but the right number varies greatly based on the exact use case.
We recommend starting with 50 well-crafted demonstrations and seeing if the model shows signs of improvement after fine-tuning. In some cases that may be sufficient, but even if the model is not yet production quality, clear improvements are a good sign that providing more data will continue to improve the model. No improvement suggests that you may need to rethink how to set up the task for the model or restructure the data before scaling beyond a limited example set.
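A quick way to sanity-check a training file against the thresholds quoted above is to count the lines in the JSONL before uploading. This is a minimal sketch; the threshold constants simply encode the numbers from the docs excerpt, and the function name is hypothetical.

```python
import json

MIN_EXAMPLES = 10        # hard minimum from the quoted guide
RECOMMENDED_START = 50   # suggested starting point from the quoted guide

def check_example_count(jsonl_text: str) -> str:
    """Count training examples in JSONL text and compare against
    the thresholds quoted from the fine-tuning guide."""
    # Each non-empty line must be a valid JSON object (one example per line).
    examples = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    n = len(examples)
    if n < MIN_EXAMPLES:
        return f"{n} examples: below the required minimum of {MIN_EXAMPLES}"
    if n < RECOMMENDED_START:
        return f"{n} examples: valid, but below the recommended {RECOMMENDED_START}"
    return f"{n} examples: at or above the recommended starting point"
```

Parsing each line as JSON also catches malformed examples early, which is cheaper than discovering them when the fine-tuning job is validated server-side.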
Just to add to what @ClipFarms has already pointed out regarding the number of examples.
It’s not really possible to give you specific guidance on the type of examples you should include without knowing the specifics of your use case, including your prompt, any input data, and the desired output.
If you can share a few more details, I or other members of this forum can share some additional considerations for the composition of the training data set. Otherwise, it’s just a guessing game, and there’s a high risk of giving you the wrong guidance.