Help with fine-tuning, think I'm over-fitting, but not sure

edmund · September 26, 2023, 3:02am

I’m fine-tuning a gpt-3.5-turbo model and need some help understanding the metrics being spat out.

I’m getting an output curve that looks like this:

I THINK this means I’m… massively… overfitting my fine-tuned models, but am not sure, so that’s my first question… is that what I’m doing?

Second question, assuming #1 is “yes”, is what should I be changing here? I’m running 5 epochs because the outputs I want are very deterministic, just a true/false boolean. Is that too many epochs?

I have a pretty large training set, ~900 examples, with a validation set of ~100 - should I be reducing the number of samples in my training and validation sets?

The data says I’m running “1500 steps”, and from the looks of the chart, I probably want to stop at around 400, as that’s where the training and validation loss numbers converge close to zero. But I have no idea what actually controls the number of steps the model runs, because it doesn’t seem to be a multiple of either my example count OR the epoch count…

anon22939549 · September 26, 2023, 4:52am

It’s tough to say what the right course of action is without knowing a little bit about your data.

Part of the issue might be that the training data you are using is too homogeneous, but it’s tough to tell without some understanding of what that data looks like.

Edit: I think I glossed over the part where you described the desired output as simply true/false.

This is almost certainly contributing to your perceived over-fitting issue.

Basically, you have 900 inputs you are performing a binary classification on. If there is a very clean break in these example inputs, then this might not exactly be over-fitting.

Here’s a toy example:

Say I wanted to make a classifier which could determine if the first letter of an input is a vowel. There’s no “messiness” there. We would expect that with enough training examples the training loss would be 0 and the validation loss would be 0. That’s not over-fitting—it’s just “fitting”—because it is possible to make a perfect classifier.
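A minimal sketch of that toy case, using a hand-written rule in place of a trained model (the example words and function names are made up for illustration). Because the task has a perfectly clean decision boundary, the error is zero on both the training and validation examples:

```python
VOWELS = set("aeiou")


def starts_with_vowel(text: str) -> bool:
    """Return True when the first character is a vowel (case-insensitive)."""
    return bool(text) and text[0].lower() in VOWELS


def error_rate(examples) -> float:
    """Fraction of (text, label) pairs the rule classifies incorrectly."""
    wrong = sum(starts_with_vowel(text) != label for text, label in examples)
    return wrong / len(examples)


# Hypothetical data: the label is True when the word starts with a vowel.
train = [("apple", True), ("banana", False), ("Orange", True), ("kiwi", False)]
validation = [("umbrella", True), ("grape", False)]

print("train error:", error_rate(train))
print("validation error:", error_rate(validation))
```

Zero error on held-out data is the signal that matters here. Over-fitting would show up as training error near zero while validation error stays high or climbs.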

So, in your case it’s still tough to say without knowing what you’re classifying.

My guess is that it’s not really an issue of too many examples or too many epochs, but rather you don’t have enough really hard examples with nearly identical inputs and different classes.
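On the step count from the original question: here is a rough sketch of the arithmetic, assuming one step is one batch update and that the service chose the batch size automatically (neither is confirmed in this thread). Under that assumption, roughly 900 examples over 5 epochs producing 1500 steps implies a batch size of 3, and the 400-step mark falls a little past the first epoch. The function names are hypothetical.

```python
import math


def total_steps(n_examples: int, n_epochs: int, batch_size: int) -> int:
    """Steps = epochs * batches per epoch (assumes one step per batch)."""
    return n_epochs * math.ceil(n_examples / batch_size)


def implied_batch_size(n_examples: int, n_epochs: int, steps: int) -> float:
    """Back out the batch size from a reported step count."""
    return n_examples * n_epochs / steps


def epochs_at_step(step: int, n_examples: int, batch_size: int) -> float:
    """How many passes over the data have completed at a given step."""
    return step / math.ceil(n_examples / batch_size)


# Numbers from the question: ~900 examples, 5 epochs, 1500 reported steps.
batch = implied_batch_size(900, 5, 1500)
print("implied batch size:", batch)
print("steps with that batch size:", total_steps(900, 5, int(batch)))
print("epochs completed at step 400:", round(epochs_at_step(400, 900, int(batch)), 2))
```

If that reading is right, the step count is controlled through the epoch count, so stopping near step 400 would mean asking for 1 or 2 epochs instead of 5.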

Foxalabs · September 26, 2023, 3:19pm

Another regular forum user found the exact same thing. Tagging @curt.kennedy for visibility.

curt.kennedy · September 26, 2023, 5:37pm

Yeah @edmund, I saw the same weird training loss (TL) curve when fine-tuning a binary classifier on this thread over here:

My training file had 4000 examples, and the system decided to choose 3 epochs for this amount of data. So with only 3 epochs, I don’t feel I was overfitting, and all examples were totally different tokens going in (no repeats).

When I get some time, I plan to monitor this model alongside the old Babbage model and see if there are any discrepancies or degradation in model performance (the old model was trained for 4 epochs on the same training data).

But initial spot-checks show the new “overfit” model is performing correctly. Just need more data to be confident.

But the TL curve going to 0 is disturbing!

Gadcuit · September 26, 2023, 6:27pm

I tried pasting all the text and graphs you posted into ChatGPT (GPT-3.5-Turbo) for help. I generated an answer and then regenerated it twice. From what I read, the answers are very useful (I can’t post them here because they are very long). These analyses may help you decide. Have a nice day.

curt.kennedy · September 26, 2023, 6:30pm

What does ChatGPT think about training loss being 0.0000? @Gadcuit

I can’t imagine it would think it’s a good thing.

Gadcuit · September 26, 2023, 8:03pm

… I only got as far as training loss = 0.000 and didn’t dare think about it any further. I didn’t want to think about the validation loss being equal to 0.000 either.
