Quality vs. Quantity for Fine-Tuning

I am in the midst of training a model and am wondering how the balance of quality versus quantity affects fine-tuning performance.

It takes a long time to gather high-quality fine-tuning examples, which effectively caps quantity if you want every example to be excellent.

However, it is easy to gather a lot of mediocre data.

Which would you say is more important?

To quote from the latest OpenAI fine-tuning guidance:

> In general, if you have to make a trade-off, a smaller amount of high-quality data is generally more effective than a larger amount of low-quality data.

They’ve recently expanded the guidance with more detailed considerations for data quality and quantity that you may find helpful for your use case.
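In practice, one way to lean toward quality is to curate an existing dataset rather than collect more of it. Below is a minimal sketch of that idea: it drops exact duplicates and low-effort completions from a chat-style example list. The field names and thresholds are illustrative assumptions, not OpenAI's recommendations, and real quality filtering would use much stronger signals (human review, model grading, etc.).

```python
# Hypothetical sketch: trim a fine-tuning dataset to its better examples
# using two crude heuristics — exact-duplicate removal and a minimum
# completion length. Thresholds and field names are assumptions.

examples = [
    {"prompt": "Summarize: ...", "completion": "A clear, specific summary."},
    {"prompt": "Summarize: ...", "completion": "A clear, specific summary."},  # exact duplicate
    {"prompt": "Summarize: ...", "completion": "ok"},  # low-effort completion
]

def keep(example, min_completion_chars=10):
    """Keep only examples whose completion is non-trivial in length."""
    return len(example["completion"].strip()) >= min_completion_chars

seen = set()
curated = []
for ex in examples:
    key = (ex["prompt"], ex["completion"])
    if key in seen:      # drop exact duplicates
        continue
    seen.add(key)
    if keep(ex):         # drop very short completions
        curated.append(ex)

print(f"kept {len(curated)} of {len(examples)} examples")
```

A smaller, curated set like this is usually a better starting point than the raw dump, per the guidance quoted above.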
