What Does “Step” Mean in Fine-Tuning, and How Do I Pick the Optimal Number of Epochs?

Hello everyone,
I wanted to add a validation/test set alongside my training set to see the model’s performance.
By the way, there are two points from the image that I am missing:

  • What does “Step” mean? Can I customize steps?
  • I trained the model with 15 epochs; how can I pick the best number from this image?
    So far, I have noticed that I can only customize epochs in the hyperparameters section.
    In addition, when I find the best epoch, let’s say 8, do I have to re-train the model with 8 epochs as a hyperparameter?
    Many thanks for your reply.

Steps are roughly equivalent to batches: the points at which a validation measurement was made.
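As a rough sketch of where a step count comes from (the numbers below are made up for illustration, not taken from your screenshot):

```python
import math

# Made-up numbers for illustration only.
num_examples = 100   # examples in the training file
batch_size = 10      # batch size chosen (or auto-chosen) for the job
n_epochs = 15        # epochs set in the hyperparameters

steps_per_epoch = math.ceil(num_examples / batch_size)   # 10
total_steps = steps_per_epoch * n_epochs                 # 150

print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```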

You are correct, there is no option exposed for anything except epochs.

Past useful information has been wiped from the OpenAI website. Here’s a GPT-3 guide that showed how one might have used a full report of steps with Weights & Biases, and the prior learning parameters that were available with those models.

Epochs are the number of learning passes repeated over the same data set. If you continued an 8-epoch fine-tuned model with the same training file for 2 more epochs, the result should be similar to a single 10-epoch job (although this has also been reported to diverge from expectations).

Thanks, but in my case, having trained my model with 15 epochs, where can I find the best number of epochs?
Moreover, if for example the best number of epochs is 8, do I have to repeat the process with 8 epochs?

I believe you do, yes. I can’t find a way to extract checkpoints from earlier epochs.

In general, I find that fewer epochs are better for fine-tuning, because otherwise I end up with overfitting.


In my case, for example, I only had 10 sentences, and I noticed that results improved significantly with 30 epochs.
By the way, do you know a way to find the optimal number of epochs?
Many thanks

15 epochs spread over 141 total steps? Then you don’t have precise per-epoch reports there.

Now that fine-tune continuation has been enabled, letting you make a new model based on an existing fine-tuned model, you can train for 8 epochs, then 8 epochs more, and get two models for the price of one.
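A minimal sketch of that workflow with the OpenAI Python SDK (the file ID and model names below are placeholders, and the exposed hyperparameters may change):

```python
from openai import OpenAI

client = OpenAI()

# First job: 8 epochs on the uploaded training file (IDs are placeholders).
first = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 8},
)

# ...wait for the first job to finish and note its fine_tuned_model name, then
# continue from that model with the same training file for 8 more epochs.
second = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="ft:gpt-3.5-turbo:my-org::abc123",  # placeholder fine-tuned model name
    hyperparameters={"n_epochs": 8},
)
```

Both the 8-epoch model and the continued model stay usable, so you get two snapshots for the same total trained tokens as a single 16-epoch job.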

Where training loss converges on zero is the point at which you are only making the model over-specialized, so that inferring other types of questions that fall between your examples gets harder. Number of examples of “best fine-tune practices using validations” from OpenAI? Zero.

“Best” is subjective, but it is part of what the held-out questions should do: be as broad as the training data and the types of user input actually seen. Train 50x on ten question types and you’ll get a model that is good at answering ten questions.

OK, thanks.
But is this really for the price of one? If I fine-tune an already fine-tuned model, don’t I create a new model, so that I have to pay for another one?
Basically, finding the best number of epochs is trial and error: I can fine-tune a model and then see whether it improves or not.

You pay for the tokens of fine-tune input, multiplied by the number of epochs you run.

cost of the fine-tune:

100k tokens × 8 epochs = 800k tokens (original)
+ 100k tokens × 8 epochs = 800k tokens (continuation model)
= 1,600k tokens in total

Same as a single 100k × 16-epoch job.
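A minimal sketch of that arithmetic (the token counts are the example figures above, not a price calculation):

```python
file_tokens = 100_000   # tokens in the training file (example figure from above)

original = file_tokens * 8        # 800k trained tokens
continuation = file_tokens * 8    # 800k more trained tokens
single_job = file_tokens * 16     # one 16-epoch job

assert original + continuation == single_job == 1_600_000
```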

There was a thread about the old models asking “why aren’t these the same?”, but fine-tune continuation on the new models was only enabled a few days ago.

You could be the first to make 16 models, each trained for one more epoch than the last, and rate their performance on a test set!

Okay, so it is better to train two models by increasing their epochs.
But is that the same as having a 16-epoch model directly? I would have a model trained on 8 epochs and then improved by another 8. Is this the same as a model of 16 epochs?
So I cannot see the optimal number of epochs for my model, as with early stopping rounds, but only by adding some more or removing them. For example, in my case adding more helped a lot.
In general, the more data, the better, right?

I see you are asking whether training a model for 8 epochs, and then continuing the tune for 8 epochs more, would instill the same strength of training as a single 16-epoch run using the fine-tune endpoint to create your own models.

It should be the same as 16 epochs in one go. However, parameters that are no longer exposed to you may be auto-tuned by OpenAI based on the input, changing the results of cumulative but separate runs.

On previous models, it was reported in one thread that someone got differing results. Not a lot of evidence, and no evidence on new models.

The unseen learning parameters may be tweaked based on the total size of a single job, so that 10 questions still have some effect compared to 100,000 questions that would give great training coverage. 8+8 could be stronger than 16 in such a scenario, or you’d get different results if you just repeated everything 16 times in your file.

Better is varied data that covers more variations of what a user might type, rather than just repeating the same tokens for the same cost.

Another possibility, for those who use a validation file: train another continuation model on the held-out validation examples, so the cost of expensive data preparation doesn’t go to waste.
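A rough sketch of that idea, again with placeholder file IDs and model names (assuming the job accepts a `validation_file` alongside the training file):

```python
from openai import OpenAI

client = OpenAI()

# First job: train on the main set, report validation loss on the held-out set.
first = client.fine_tuning.jobs.create(
    training_file="file-train123",       # placeholder ID
    validation_file="file-heldout456",   # placeholder ID
    model="gpt-3.5-turbo",
)

# Later, continue from the resulting model, now using the held-out examples as
# training data, so the curated validation set also ends up in the final model.
second = client.fine_tuning.jobs.create(
    training_file="file-heldout456",
    model="ft:gpt-3.5-turbo:my-org::first-run",  # placeholder fine-tuned model name
)
```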

And there is no way to know the best number of epochs?
By the way, many thanks for your effort.

How many epochs is ideal for your application depends on how much of an idiot-savant Rain Man you want to turn the AI into.


I would like to answer your question about what “Step” means. (Sorry for my lousy English. :) )

In machine learning, a “step” typically refers to one iteration, i.e., one update of the model’s weights during training. It’s tied to the optimization algorithm (e.g., stochastic gradient descent). You can influence the training steps by adjusting the learning rate, batch size, and other hyperparameters, but most ML frameworks don’t expose this directly as a “step” setting.
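To make “one step = one weight update” concrete, here is a tiny, generic training loop (a pure illustration with a toy model, not the fine-tuning endpoint):

```python
import random

# Toy data: learn y = 3x with a single weight, just to show what a "step" is.
data = [(i / 100, 3 * i / 100) for i in range(100)]

w = 0.0               # the model's single weight
learning_rate = 0.5
batch_size = 10
n_epochs = 2
step = 0

for epoch in range(n_epochs):
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Gradient of the mean squared error with respect to w over this batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad   # one "step": one update of the model's weights
        step += 1

print(f"{step} steps taken over {n_epochs} epochs; w is now about {w:.3f}")
```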

I hope I understood your problem correctly and gave the correct answer, thank you.


I had about 150+ examples that trained the model to generate a specific form of JSON. By default, the number of epochs was 3 and the results were not correct; however, increasing the number of epochs to 6 made the model almost accurate.
