When finetuning a model with a bunch of exemplar messages, do all the messages need to be absolutely perfect, or will the occasional slip in format etc be overlooked by the model as long as they don’t appear too often?
Currently working on this problem too. I think my strategy is going to be to find a hundred or so GOOD examples, then see how the fine-tuned model does on a small sample set. The idea is to just iterate until I get something satisfactory…
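One way to automate the "GOOD examples" filter is a quick format check before submitting the training file. A minimal sketch, assuming chat-format JSONL training data; the `check_example` helper and the role set are my own illustration, not any particular API's schema:

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumed chat schema

def check_example(line: str) -> bool:
    """Return True if a JSONL line looks like a well-formed chat example."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for msg in messages:
        if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
            return False
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            return False
    # Require at least one assistant turn to actually learn from.
    return any(m["role"] == "assistant" for m in messages)

def filter_dataset(lines):
    """Split raw lines into (good, rejected) so slips can be reviewed by hand."""
    good, rejected = [], []
    for line in lines:
        (good if check_example(line) else rejected).append(line)
    return good, rejected
```

Catching malformed examples up front at least separates "slips I chose to keep" from "slips I never noticed".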
I’m not sure what you’re doing with your data/model post-fine-tuning, but a general rule of thumb is that the average quality of your data sets the ceiling on your fine-tuned model’s performance.
What do “occasional” and “too often” mean in this context? And what if the model identifies the slip-ups as a pattern and comes to expect them in its future outputs?
In general: high-quality data leads to high-quality results. It has also been shown that you can reduce the sample size if the data quality is high enough.
Expanding the sample size may make up for some of the errors, but the result will never be optimal.
If you are ok with “ok” results, then go for it.
You can try the following:
1. Start a new fine-tuning job with perfect data and a relatively small sample.
2. Evaluate the results to establish a benchmark.
3. Continue training the fine-tuned model on a mixed set of good and contaminated data, evaluate again, and compare to your benchmark.
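The comparison step can be as simple as scoring both models on the same held-out set. A minimal sketch, treating each model as any callable from prompt to completion (exact-match scoring is just one possible metric, and the key names here are my own):

```python
def exact_match_accuracy(predict, eval_set):
    """Score a model (any callable prompt -> completion) on held-out pairs."""
    hits = sum(
        1 for prompt, reference in eval_set
        if predict(prompt).strip() == reference.strip()
    )
    return hits / len(eval_set)

def compare(clean_predict, mixed_predict, eval_set):
    """Benchmark the clean-data model against the mixed-data model."""
    clean = exact_match_accuracy(clean_predict, eval_set)
    mixed = exact_match_accuracy(mixed_predict, eval_set)
    return {"clean_data_model": clean, "mixed_data_model": mixed, "delta": mixed - clean}
```

A negative `delta` on the same eval set would support the "contamination hurts" conclusion; run it on enough held-out examples that the difference isn't noise.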
What do you think the conclusion will be?
Yeah, that’s kinda along my line of thought. Another thing I was wondering: would fine-tuned models be able to adapt to new functions being added to their tools further down the line?
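For what it’s worth, one common pattern is to fine-tune on the tool-calling *format* and pass the actual tool schemas in at request time, so new functions can be introduced later without retraining. A minimal sketch; the `build_request` shape and the `get_weather` schema are illustrative assumptions, not any specific API:

```python
def build_request(user_msg, tools):
    """Assemble a chat request where tool schemas ride along in-context.

    Because the schemas arrive with each request, a model trained on the
    tool-calling format can (in principle) be shown new functions later,
    as long as the format itself stays consistent.
    """
    return {
        "messages": [{"role": "user", "content": user_msg}],
        "tools": tools,  # list of JSON-schema style tool definitions
    }

# Hypothetical tool added after fine-tuning, for illustration only.
new_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

request = build_request("What's the weather in Oslo?", [new_tool])
```

Whether the model generalizes to the new tool then depends on how format-consistent your training examples were, which loops back to the original question.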