I am researching GPT-3’s ability to answer abstract, subjective questions from a specific perspective. The idea is to build an answerbot that answers subjective questions in accordance with the data on which it was fine-tuned, which may or may not comport with the data on which GPT-3 was pretrained. Imagine GPT-3 was pretrained to respond that the Philadelphia Eagles are the best NFL team, and I want it to respond that the Raiders are the best NFL team. (That’s not the research, but that’s the idea.)
I’ve noticed that fine-tuning in one pass, e.g., over 10 epochs, tends to have a greater effect on the underlying weights than iteratively fine-tuning one epoch at a time, even over the same number of training passes. Using my example above, if I fine-tune on my dataset one epoch at a time, it takes 15 iterations to achieve the desired output of “The Raiders are the best.” However, if I fine-tune in one request, setting n_epochs=10, then I get the desired output of “The Raiders are the best.” after just those 10 training epochs (in one request).
Why is that?
PS - I get that this is a dumb thing to do in practice. I only performed this experiment to show the migration of output completions over training rounds and noticed this anomaly.
I use this file to fine-tune the same model ten times, each time with 1 epoch.
Here are the results when I submit the prompt: Who is the handsomest man on the planet Earth?
Why does it take 15 individual fine-tunes to achieve the same result as fine-tuning once over 10 epochs?
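For concreteness, the iterative procedure described above can be sketched as follows. Here `fine_tune` is a hypothetical stand-in for the actual fine-tune API request (the real call needs an API key, an uploaded training file, and returns a generated model ID), and `raiders.jsonl` is a made-up file name; the point is only that each pass receives the model produced by the previous pass:

```python
# Sketch of the iterative procedure: ten separate fine-tune requests,
# one epoch each, feeding each resulting model into the next request.
# fine_tune() is a hypothetical stand-in for the real API call.

def fine_tune(model: str, training_file: str, n_epochs: int) -> str:
    """Stand-in for one fine-tune request; returns the new model's name."""
    return f"{model}+1ep"  # the real API returns a generated model ID

model = "davinci"  # base model for the first pass (assumed)
for _ in range(10):
    model = fine_tune(model, "raiders.jsonl", n_epochs=1)

print(model)  # after ten passes, the model is ten fine-tunes deep
```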
I am confused about what you were trying. Were you training GPT-3 on the same dataset through separate individual fine-tune calls, or were you growing and/or changing the dataset across calls? Because if you were doing the former, i.e., trying to fine-tune GPT-3 by making multiple fine-tune calls, then you are doing it all wrong.
According to the documentation, fine-tuning a fine-tuned model is not possible. So your 15 fine-tune calls all start from scratch rather than continuing from the previous call. You can see this from the model you pass as the fine-tune argument: that model holds the weights you are fine-tuning over, and since you pass the same model every time, every call starts from the same weights.
The prior fine-tune endpoint allows one to continue fine-tuning an existing model.
Each epoch further reinforces the weight updates made by the fine-tune pass.
The rest of the response is equally uninformed, not helped by OpenAI removing its older, superior documentation and even blocking the archive.org Wayback Machine.
It is the former, but only to ascertain the effects of iterative training (i.e., one epoch, 10 times) vs 10 epochs at one time.
By the way, you can fine-tune a fine-tuned model. You perform the same steps, but give the name of the fine-tuned model in place of the generic model. I’ve done it, and it works pretty well - as well as fine-tuning works for my purposes, anyway.
I’ll add that this is complete curiosity. I just want to understand what is fundamentally different between fine-tuning with 1 epoch 15 times vs 15 epochs 1 time.
(Ultimately it just seems to save money fine-tuning more epochs fewer times)
Or - the difference between repeating your examples in your training file fifteen times?
The “continuing training your model” idea is on hold, as the new model endpoint doesn’t offer it.