What do these three metrics specifically represent?
I understand that a lower training loss indicates a better fit, and that a validation loss greater than the training loss suggests overfitting.
But are there any standards or thresholds for these metrics? I couldn't find anything about this in the Guide, so I'd like to ask whether anyone knows how these three metrics should be interpreted.
In machine learning, the goal is to build a parsimonious model while minimizing overfitting. For any training run, a portion of the data is held out from training and used for validation instead, by evaluating the trained model on it. That held-out portion is the validation set; the training set is the data the model was actually trained on. The training loss is then the value of the loss function in sample, and the validation loss is the value of the loss function out of sample, each reported per step/epoch. The full validation loss is the same metric computed over the entire validation set (typically at the end of an epoch) rather than on a sampled batch. I hope this clarifies your question.
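To make the in-sample vs. out-of-sample distinction concrete, here is a minimal sketch of a per-token cross-entropy (negative log-likelihood) loss, which is the usual loss for language-model fine-tuning. The function name, the example log-probabilities, and the exact reduction (mean over tokens) are my own assumptions for illustration; OpenAI has not published the precise formula behind the numbers it reports.

```python
def mean_token_nll(token_log_probs):
    """Mean negative log-likelihood per target token (standard cross-entropy).

    `token_log_probs` holds the log-probabilities the model assigned to each
    target (completion) token in a batch. Treat this as an illustration, not
    as OpenAI's exact calculation.
    """
    return -sum(token_log_probs) / len(token_log_probs)

# Training loss: computed on a batch drawn from the training set.
train_batch_log_probs = [-0.9, -1.2, -0.4, -2.1]   # hypothetical values
training_loss = mean_token_nll(train_batch_log_probs)

# Validation loss: the same metric on held-out examples that were
# never used to update the model's weights.
valid_batch_log_probs = [-1.1, -1.5, -0.7, -2.4]   # hypothetical values
validation_loss = mean_token_nll(valid_batch_log_probs)

print(f"training loss:   {training_loss:.4f}")
print(f"validation loss: {validation_loss:.4f}")
```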
Thank you for the reply. However, my question mainly focuses on:
Are there specific data ranges for training loss and validation loss, such as [0, 10.00000]?
Are there standard ranges for these two metrics that indicate whether a model is usable? For example, is a model considered usable if the training loss is below 1.0000 and the validation loss is below the training loss? I'm looking for that kind of interpretive guidance.
I was looking for similar benchmarks too, but I think the value depends on the specific validation set. Hopefully someone can clarify whether it has been normalized; it looks like it could be, since it often starts around 1.
Anyway, until someone clarifies, I think the best approach is to check whether the loss is decreasing and converges during fine-tuning (roughly like the sketch below). If it stays close to its initial value, then fine-tuning did not "do anything useful" and we are better off using the base model. On the other hand, if it has not converged yet, we might benefit from fine-tuning longer.
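For example, a quick heuristic for reading the logged loss values might look like this. The function name, window size, and tolerance are arbitrary choices of mine, not anything from OpenAI's guide:

```python
def loss_trend(losses, tail=5, flat_tol=0.05):
    """Rough heuristic for reading a fine-tuning loss curve.

    `losses` is the sequence of (validation) losses logged per step or epoch.
    The tail size and tolerance are arbitrary values for this sketch.
    """
    if len(losses) < 2 * tail:
        return "not enough points to judge"

    start = sum(losses[:tail]) / tail            # average of the first few points
    recent = sum(losses[-tail:]) / tail          # average of the last few points
    prev = sum(losses[-2 * tail:-tail]) / tail   # average of the window before that

    if abs(start - recent) < flat_tol:
        return "loss barely moved: fine-tuning may not have done anything useful"
    if abs(prev - recent) < flat_tol:
        return "loss decreased and has flattened out: looks converged"
    return "loss is still decreasing: training longer might help"

# Hypothetical logged validation losses from a fine-tuning run.
print(loss_trend([1.80, 1.50, 1.30, 1.10, 1.00,
                  0.90, 0.86, 0.84, 0.83, 0.83,
                  0.82, 0.82, 0.81, 0.81, 0.81]))
```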
Is the model usable? If it is better than the base model, and the base model was already somehow usable, then it probably is. How much better does it need to be to be worth it? I think this depends on the use case, and we can only test in real life (or use other evaluation tools).
This is just my speculation, without knowing exactly how OpenAI calculates the loss. Maybe someone with more knowledge will help us understand better soon.