Hello everyone,
I have been fine-tuning GPT-3.5-Turbo for a while, but I cannot find any documentation on how the loss is computed on the training and validation sets.
With RMSE in regression problems I know that closer to 0 is better, and I assume the same holds here. But what is the formula? How is it computed between sentences?
Many thanks in advance.
The legacy fine-tuning documentation says this about validation:
If you provided a validation file, we periodically calculate metrics on batches of validation data during training time. You will see the following additional metrics in your results file:
- validation_loss: loss on the validation batch
- validation_sequence_accuracy: the percentage of completions in the validation batch for which the model's predicted tokens matched the true completion tokens exactly. For example, with a batch_size of 3, if your data contains the completion [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 2/3 = 0.67
- validation_token_accuracy: the percentage of tokens in the validation batch that were correctly predicted by the model. For example, with a batch_size of 3, if your data contains the completion [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 5/6 = 0.83
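If it helps to see those two metrics in code, here is a minimal sketch (my own illustration, not OpenAI's actual implementation) that reproduces the 2/3 and 5/6 numbers from the example above:

```python
# Illustrative re-implementation of the two validation accuracy metrics.
# Not OpenAI's code; just what the documented definitions amount to.

def sequence_accuracy(true_completions, predicted_completions):
    """Fraction of completions whose predicted tokens match exactly."""
    exact = sum(1 for t, p in zip(true_completions, predicted_completions) if t == p)
    return exact / len(true_completions)

def token_accuracy(true_completions, predicted_completions):
    """Fraction of individual tokens predicted correctly."""
    correct = total = 0
    for t, p in zip(true_completions, predicted_completions):
        for t_tok, p_tok in zip(t, p):
            correct += t_tok == p_tok
            total += 1
    return correct / total

true = [[1, 2], [0, 5], [4, 2]]
pred = [[1, 1], [0, 5], [4, 2]]
print(sequence_accuracy(true, pred))  # 2/3 ≈ 0.67
print(token_accuracy(true, pred))     # 5/6 ≈ 0.83
```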
For the loss function, the number decreases each batch. If you want to dig deeper, it appears to be the same token-level cross-entropy that Hugging Face Transformers uses for its PyTorch language models (GPT-2 and BERT alike); see for example: https://github.com/huggingface/transformers/blob/v4.33.3/src/transformers/models/bert/modeling_bert.py#L771
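As a concrete illustration of that loss, here is the standard next-token cross-entropy as those Hugging Face model heads compute it; whether OpenAI's fine-tuning pipeline does exactly this is an assumption on my part:

```python
import torch
import torch.nn.functional as F

# Causal-LM cross-entropy as in the Hugging Face GPT-2/BERT LM heads:
# shift logits and labels by one so each position predicts the NEXT token,
# then average the per-token negative log-likelihood.
def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size); labels: (batch, seq_len)
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,  # masked/prompt tokens are excluded from the loss
    )

# Toy usage: random logits over a 10-token vocabulary.
logits = torch.randn(2, 5, 10)
labels = torch.randint(0, 10, (2, 5))
print(causal_lm_loss(logits, labels))  # lower is better; 0 would be a perfect fit
```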
How much of this still applies to the new fine-tuning endpoint is speculative, given how many hyperparameters and how much of the Weights & Biases-compatible results output have been taken away.
We can only guess. There is simply no "here's how not to waste your money" guide.
Here, my working assumption is that the best performance comes at the lowest validation loss, around the point where training loss has first converged.
After that point, the fine-tune becomes over-specialized on the training inputs and stops generalizing to the alternate cases in your held-out validation set.
So I would guess the best general performance on the TYPE of questions you trained on comes at roughly half the epochs you ran (keeping in mind that other hyperparameters are also adjusted behind the scenes if the training file size changes).
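For example, if you pull the per-step metrics into a CSV, finding that "best point" could look like this rough sketch (the column names "step" and "validation_loss" are my assumption about the metrics file layout, not a documented schema):

```python
import csv

# Rough sketch: pick the step with the lowest validation loss from a
# per-step metrics CSV. Column names are assumed, not guaranteed.
def best_step(results_path: str):
    best = None
    with open(results_path, newline="") as f:
        for row in csv.DictReader(f):
            if not row.get("validation_loss"):
                continue  # validation metrics are only logged periodically
            step, loss = int(row["step"]), float(row["validation_loss"])
            if best is None or loss < best[1]:
                best = (step, loss)
    return best

print(best_step("results.csv"))  # e.g. (420, 1.23): the checkpoint to prefer
```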
After that analysis, a better use of the investment you made in preparing your 20% held-out set would be to fold it back into a final, unvalidated fine-tune, giving the most varied training data. Then you can test performance on unanticipated human inputs.