Iterative Fine-Tuning (1 epoch) vs One-Time Tuning (n epochs)

Hi, everyone.
I am researching GPT-3’s ability to answer abstract, subjective questions from a specific perspective. The idea is that I want to build an answerbot that answers subjective questions in accordance with the data it was fine-tuned on, which may or may not agree with the data on which GPT-3 was pretrained. Imagine GPT-3 was pretrained to respond that the Philadelphia Eagles are the best NFL team, and I want GPT-3 to respond that the Raiders are the best NFL team. (That’s not the actual research, but that’s the idea.)

I’ve noticed that fine-tuning in one pass, e.g., over 10 epochs, tends to have a greater effect on the underlying weights than iteratively fine-tuning one epoch at a time, even over the same total number of training passes. Using the example above, if I fine-tune on my dataset one epoch at a time, it takes 15 iterations to get the desired completion “The Raiders are the best.” However, if I fine-tune in a single request with n_epochs=10, I get the desired completion “The Raiders are the best.” after just those 10 training epochs (in one request).
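One thing worth noting (a hypothesis, not a confirmed description of the OpenAI trainer): a separate fine-tune job typically restarts its step counter, so any step-dependent learning-rate schedule and optimizer state reset with each job, whereas a single n_epochs=10 job runs one continuous schedule. The toy SGD sketch below, with a hypothetical 1/t learning-rate decay on a quadratic loss, only demonstrates that the two regimes are not equivalent; depending on the schedule, the gap can go in either direction.

```python
# Toy sketch, NOT the OpenAI fine-tuning stack: compare one continuous
# 10-epoch run against ten 1-epoch runs whose step counter (and hence
# learning-rate schedule) restarts each time.

TARGET = 5.0            # stand-in for the weight value producing the desired completion
STEPS_PER_EPOCH = 5
LR0, DECAY = 0.1, 0.1   # hypothetical base learning rate and 1/t decay factor

def sgd(w, n_epochs, reset_schedule_each_epoch):
    """Minimize (w - TARGET)**2 with SGD under a 1/t learning-rate decay."""
    t = 0
    for _ in range(n_epochs):
        if reset_schedule_each_epoch:   # separate jobs: step counter restarts
            t = 0
        for _ in range(STEPS_PER_EPOCH):
            t += 1
            lr = LR0 / (1 + DECAY * t)
            w -= lr * 2.0 * (w - TARGET)  # gradient step on the toy loss
    return w

one_shot  = sgd(0.0, 10, reset_schedule_each_epoch=False)  # one job, 10 epochs
iterative = sgd(0.0, 10, reset_schedule_each_epoch=True)   # ten 1-epoch jobs
print(one_shot, iterative)  # the two regimes end at different weights
```

In this toy the restarted runs happen to move faster (they keep reusing the early, larger learning rates), which is the opposite of what I observed; the point is only that resetting per-job state makes the two procedures arrive at different weights.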

Why is that?

PS - I get that this is an odd thing to do in practice. I only ran this experiment to show how the output completions drift over training rounds, and noticed this anomaly.

Additional information:
I use this file to fine-tune the same model ten times, each time with 1 epoch.

Here are the results when I submit the prompt: Who is the handsomest man on the planet Earth?

Why does it take 15 individual fine-tunes to achieve the same result as fine-tuning once over 10 epochs?

