Training loss=good, Validation loss=good

I recently fine-tuned a model using the default 3 epochs.

The training loss was: 0.29
The validation loss was: 0.4

As far as I’m aware, this is quite good.

However, the results I’m getting from the model are just not very good. At all. It is as if I never fine-tuned it.

Should I try again with perhaps more epochs?

Bear in mind, it’s not an enormous dataset I’m using: 243 examples in the training set and 64 in the validation set. However, I could increase the size if you think that’s the problem?

Any guidance would be thoroughly appreciated👍


The most important thing is the data you’re fine-tuning it on.

Are you willing to share some more details about your use-case and possibly a few examples from your dataset?

We’d be able to provide much better suggestions with that extra context.

Sure, I’m fine-tuning an AI tutor that can perform a range of different function calls (e.g. make lesson content, lesson plans, course structures, mark questions, etc.).

I have a bunch of function-call examples for each of them, with a system prompt (always the same), a user prompt, and the function_call name and arguments, along with the tools property indicating which functions the model can call at that point (a fuller sketch is below the example).

E.g.:

```json
{
  "messages": [
    { "role": "system", "content": "..." },
    {
      "role": "user",
      "content": "Please make the units for a module of my course on Cell Biology. The module topic is Applications Of Cell Biology. The module entry knowledge is: Knowledge from previous modules., and the module end knowledge is: Exploring real-world applications of cell biology in areas such as medicine, biotechnology, and research…"
    },
    {
      "function_call": {
        "name": "makeUnits",
        "arguments": "{[array of 5 units, each with two arrays containing 2-3 items each]}"
      },
      "role": "assistant"
    }
  ]
}
```
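For completeness, this is roughly how I build each line and attach the tools property; the function schema below is an illustrative sketch, not my real definition:

```python
import json

# Rough sketch of how one training line gets built (schema is illustrative).
example = {
    "messages": [
        {"role": "system", "content": "..."},  # same system prompt every time
        {"role": "user", "content": "Please make the units for a module of my course on ..."},
        {
            "role": "assistant",
            "function_call": {
                "name": "makeUnits",
                # arguments are stored as a JSON string
                "arguments": json.dumps({"units": ["..."]}),
            },
        },
    ],
    # The functions the model is allowed to call at this point in the flow
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "makeUnits",
                "description": "Create the units for one module of a course",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "units": {"type": "array", "items": {"type": "string"}}
                    },
                    "required": ["units"],
                },
            },
        }
    ],
}

# Each training example goes on its own line of the .jsonl file
with open("training_data.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```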

If you want any more detail, I could give you an email?


I just fine-tuned using 550,000 tokens and it’s like I broke the model. Literally, it can’t answer any questions correctly, and it doesn’t even know its persona, even though all chat completions and datasets start with:
{"messages": [{"role": "system", "content": "Your persona is 'Val'. Val serves as the AI representation of HomeRank, a cutting-edge Home Valuation system…

Fine-tuning can be weird. The system should be able to cope with 500k tokens, given that the pricing page expresses prices per 1 million tokens, so the problem likely lies elsewhere. Exactly where… I don’t know. I’ve had pretty mixed results with fine-tuning; sometimes the models just do weird things :rofl: I’ve just got access to fine-tuning for gpt-4 and gpt-3.5-0125, which may yield better results.

:thinking:

Is it possible that you guys are expecting the wrong thing from fine-tunes?

Did you read this? https://platform.openai.com/docs/guides/fine-tuning/when-to-use-fine-tuning

It’s technically a way to optimize multi-shot prompts. If you can’t achieve the same results with multi-shot prompting, it’s unlikely that you’ll get good results with fine-tuning :confused:
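As a quick sanity check, it can be worth putting a few of your training examples straight into the prompt and seeing whether the base model can already do the task. A minimal sketch, assuming the openai Python SDK (model name and example messages are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# A couple of examples lifted from the training set, shown to the base model
# as in-context demonstrations instead of fine-tuning on them.
few_shot_messages = [
    {"role": "system", "content": "You are an AI tutor that produces lesson content."},
    {"role": "user", "content": "Example input 1 ..."},
    {"role": "assistant", "content": "Example output 1 ..."},
    {"role": "user", "content": "Example input 2 ..."},
    {"role": "assistant", "content": "Example output 2 ..."},
    # The actual query comes last
    {"role": "user", "content": "Please make the units for a module on Cell Biology ..."},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",  # base model, no fine-tune
    messages=few_shot_messages,
)
print(response.choices[0].message.content)
```

If the base model can’t get close with examples in the prompt, fine-tuning on the same examples probably won’t rescue it.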


The one thing I’ve dealt with since the latest upgrades to 0125 is overfitting. Previously this wasn’t an issue on my datasets, but now I adjust the hyperparameters when training, and it helps a lot.
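If it helps anyone, the hyperparameters can be overridden when the job is created. A rough sketch, assuming the openai Python SDK (file IDs and values are placeholders, not recommendations):

```python
from openai import OpenAI

client = OpenAI()

# Override the auto-chosen hyperparameters when creating the fine-tuning job.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",          # uploaded training .jsonl
    validation_file="file-def456",        # uploaded validation .jsonl
    model="gpt-3.5-turbo-0125",
    hyperparameters={
        "n_epochs": 2,                    # fewer epochs if you see overfitting
        "learning_rate_multiplier": 0.5,  # smaller steps, less aggressive fitting
        "batch_size": 4,
    },
)
print(job.id, job.status)
```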

I’ve kind of experienced that as well.

Here is the plotted training and validation loss for a recent job I did using 0125:

I’m no expert by any means, but here’s my interpretation of what happened (please correct me if I’m talking nonsense):

The training loss steadily decreased, and so did the validation loss. This is good. But then towards the end, the validation loss started to increase. This means the model stopped learning the general problem, and started overfitting.

Maybe the 0125 model learns more quickly than previous models, meaning it needs fewer epochs? I really don’t know.
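In case anyone wants to reproduce a plot like that, this is roughly how the step metrics can be pulled out of the job’s result file, assuming the openai Python SDK (the job ID is a placeholder and the column names are my assumption):

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
from openai import OpenAI

client = OpenAI()

# Fetch the result file attached to the finished fine-tuning job (placeholder job ID).
job = client.fine_tuning.jobs.retrieve("ftjob-abc123")
result_file_id = job.result_files[0]
raw = client.files.content(result_file_id).read().decode("utf-8")
# (If the content comes back base64-encoded, decode it first.)

# Per-step metrics as CSV; the column names below are assumptions.
df = pd.read_csv(io.StringIO(raw))
df.plot(x="step", y=["train_loss", "valid_loss"])
plt.xlabel("step")
plt.ylabel("loss")
plt.show()
```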

Well, “good” train and validation loss is not absolute.
Whether a loss is good depends on how far it has dropped from the initial loss, as well as on what drove the decrease.
So just because it’s 0.something doesn’t mean it’s good. Your 0.29 could actually be 0.299999, which is a large number compared to a loss like 0.00009, right?
You need to train the model iteratively until you start to see promising results, so more training, please!
There are other factors that could also be considered, but this could be a good starting point.
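To make that concrete with purely hypothetical numbers, the same final loss can represent very different amounts of learning depending on where it started:

```python
# Hypothetical numbers, purely for illustration.
runs = {
    "barely moved": {"initial_loss": 0.35, "final_loss": 0.29},
    "learned a lot": {"initial_loss": 2.10, "final_loss": 0.29},
}

for name, run in runs.items():
    drop = run["initial_loss"] - run["final_loss"]
    relative = drop / run["initial_loss"]
    print(f"{name}: dropped {drop:.2f} ({relative:.0%} of the initial loss)")

# barely moved: dropped 0.06 (17% of the initial loss)
# learned a lot: dropped 1.81 (86% of the initial loss)
```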
