When should I stop training of my fine-tuned model?

jackmd.work · December 21, 2023, 3:35pm

Hello, I’m fine-tuning a gpt-3.5 model to convert descriptive text into an objective scoring system.
My current workflow is:

1. Use the fine-tuned model to process 50 individual texts.
2. Manually correct and label those results, then use them to further train the previous model (40 for training and 10 for validation. All prompts the previous model labeled wrongly are assigned to the training group. The testing prompts from the previous model’s validation file are also added to the training file of this model, making 50 training and 10 validation prompts in total).
3. Go back to 1 and do it all over again.

It has served me well, with significantly improved accuracy and consistency after every round. However, some errors seem to persist every now and then. E.g., some parts of the text may contain sentences like ‘1/51 abnormality points was found’ or ‘1/16 abnormality points in part A and 0/6 abnormality points in part B was found’. I want the model to sum up the total number of positive abnormality points and classify it (0: 0, 1-2: 1, 3-7: 2, 8-15: 3, >15: 4, etc.). In most cases, the model gives a correct answer. However, it sometimes assigns a grade lower or higher than expected, even when the structure of the given text is not very different from another one that was labeled correctly.
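For reference, the summing and grading rule described above can also be written as ordinary code. This is only a sketch under two assumptions that are not stated in the post: that every `x/y` fraction in the text is an abnormality count, and that everything above 15 maps to grade 4 (the post ends its list with "etc.", so there may be more grades). The names `total_positive_points` and `grade` are made up for illustration.

```python
import re

# Upper bound of each grade, taken from the post:
# 0 -> 0, 1-2 -> 1, 3-7 -> 2, 8-15 -> 3. Anything above 15 is
# treated as grade 4 here, which is an assumption.
GRADE_UPPER_BOUNDS = [(0, 0), (2, 1), (7, 2), (15, 3)]

# Matches fractions such as "1/16" or "0 / 6".
FRACTION = re.compile(r"(\d+)\s*/\s*(\d+)")


def total_positive_points(text: str) -> int:
    """Sum the numerators of every 'x/y' fraction found in the text."""
    return sum(int(num) for num, _den in FRACTION.findall(text))


def grade(total: int) -> int:
    """Map a total number of positive points to its grade."""
    for upper, g in GRADE_UPPER_BOUNDS:
        if total <= upper:
            return g
    return 4


sample = "1/16 abnormality points in part A and 0/6 abnormality points in part B was found"
print(total_positive_points(sample), grade(total_positive_points(sample)))
```

A check like this could be run next to the model's output to catch the cases where the two disagree.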
Repeated training does help to reduce the incidence of such mistakes, but doesn’t seem to prevent them from happening, even after 5 rounds of training. (The example provided seems to bother the model the most, maybe because the other factors of the scoring system are mainly binary choices or copy-and-paste questions.) At this point, I wonder whether further training will improve my model much, especially given the latest training results:

step	train loss	train accuracy	valid loss	valid mean token accuracy
1	0.02551	0.98361	0	0.91379
…
141	0	1	0	0.91379
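As a rough illustration of how these numbers could be read, the sketch below compares the validation accuracy at the first and last reported steps. The `min_delta` threshold and the function name `has_improved` are hypothetical, and the rows elided by "…" are not reconstructed.

```python
# (step, valid mean token accuracy) pairs copied from the table above.
# The elided rows in between are deliberately left out.
history = [(1, 0.91379), (141, 0.91379)]


def has_improved(history, min_delta=0.001):
    """Return True if any later step beats the first by at least min_delta."""
    first = history[0][1]
    best_later = max((acc for _step, acc in history[1:]), default=first)
    return best_later - first >= min_delta


print(has_improved(history))  # False
```

With only 10 validation prompts, a flat value like this may also just mean the validation set is too small to show a difference.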

I am no expert in math or computer science, but it seems the valid mean token accuracy didn’t improve during training. Should I stop training the model? If so, is there anything I can do to further reduce such mistakes? If not, what’s the signal that a model is fully trained? Is there some way the API can return its confidence in the response, so I can manually check the suspicious ones?
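On the confidence question, one possible approach is to look at per-token log probabilities, which the Chat Completions API can return through its `logprobs` option. Whether that option is available for a particular fine-tuned model is an assumption to verify. The sketch below only shows the scoring step on a plain list of log probabilities; `answer_confidence`, `needs_review` and the 0.9 threshold are hypothetical.

```python
import math


def answer_confidence(token_logprobs):
    """Return the probability of the least certain token (between 0 and 1)."""
    return math.exp(min(token_logprobs))


def needs_review(token_logprobs, threshold=0.9):
    """Flag a response for manual checking if any token falls below threshold."""
    return answer_confidence(token_logprobs) < threshold


# Hypothetical log probabilities for the tokens of two responses.
print(needs_review([-0.01, -0.02]))  # every token is near-certain
print(needs_review([-0.01, -0.7]))   # one token is uncertain
```

A high token probability is not a guarantee of a correct grade, so this would narrow down what to check by hand rather than replace the check.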
Thank you all for your generous help and suggestions!

jackmd.work · December 22, 2023, 9:36am

From what I’ve learnt in the forum, the less data provided for training, the worse the fine-tuned model performs. So maybe if I combined all the data (200 examples) and trained a fresh model, instead of training with 50 examples per round for 4 rounds, I would get a better-performing model?
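If the data from all rounds were combined, the preparation step might look like the sketch below. It only merges and splits the examples; it says nothing about whether a fresh model would actually perform better. The function names are hypothetical, and the 80/20 split simply mirrors the 40/10 ratio used in each round.

```python
import json
import random


def merge_rounds(rounds):
    """Combine examples from several rounds, dropping exact duplicates."""
    seen, merged = set(), []
    for examples in rounds:
        for example in examples:
            key = json.dumps(example, sort_keys=True)
            if key not in seen:
                seen.add(key)
                merged.append(example)
    return merged


def split(examples, valid_fraction=0.2, seed=0):
    """Shuffle reproducibly and return (training, validation) lists."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]
```

Dropping duplicates matters here because validation prompts from earlier rounds were moved into later training files, so the same example can appear in more than one round.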
