Fine-tuning sometimes fails

I have been successfully using fine-tuning for many months to solve a complex multi-class classification task, where I tune the LLM to output a classification code for a text input. (I was initially on Curie and migrated to GPT-3.5-Turbo after Curie was deprecated.)
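For context, a single training example for this kind of classification fine-tune looks roughly like the sketch below, using the chat-format JSONL that gpt-3.5-turbo fine-tuning expects. The system prompt wording and the label "C17" are made up for illustration; my real codes and prompts differ.

```python
import json

# One hypothetical training record in the chat fine-tuning JSONL format.
# The assistant message holds the classification code the model should emit.
record = {
    "messages": [
        {"role": "system", "content": "Classify the text into one of the predefined codes."},
        {"role": "user", "content": "Customer reports the invoice total is wrong."},
        {"role": "assistant", "content": "C17"},  # illustrative label
    ]
}

# Each record is serialized as one line of the training JSONL file.
line = json.dumps(record)
print(line)
```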

I carefully measure the accuracy of the resulting model, both against the training data and a hold-out validation set.
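The accuracy check itself is simple; a minimal sketch is below. Here `classify` is a stand-in for a call to the fine-tuned model (e.g. via the Chat Completions API) and is stubbed out so the scoring logic runs on its own; the example texts and codes are invented.

```python
def accuracy(examples, classify):
    """Fraction of (text, expected_code) pairs the model labels correctly."""
    correct = sum(1 for text, expected in examples if classify(text) == expected)
    return correct / len(examples)

# Stub model that always predicts "C17" -- replace with a real API call
# to the fine-tuned model in practice.
stub = lambda text: "C17"

holdout = [
    ("Customer reports the invoice total is wrong.", "C17"),
    ("User cannot log in to the portal.", "C03"),
]
print(accuracy(holdout, stub))  # 0.5 with this stub
```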

Although I have successfully fine-tuned many models, there have been two periods during which repeated runs produced degraded models, despite my using the same hyperparameters and largely similar training data. When this happens, classification accuracy on the training data drops from around 98% to 80%, and on the validation data from 91% to 75%.

My question: although this could somehow be my fault (grateful for any tips!), I'm wondering whether OpenAI makes changes to the fine-tuning engine behind the API that might explain these problems. If so, are such changes announced and documented anywhere? Consistent fine-tuning results are critical for my application.
