Iterative Fine-Tuning (1 epoch) vs One-Time Tuning (n epochs)

Hi, everyone.
I am researching GPT-3’s ability to answer abstract, subjective questions from a specific perspective. The idea is that I want to build an answerbot that answers subjective questions in accordance with the data it was fine-tuned on, which may or may not agree with the data on which GPT-3 was pretrained. Imagine GPT-3 was pretrained to respond that the Philadelphia Eagles are the best NFL team, and I want GPT-3 to respond that the Raiders are the best NFL team. (That’s not the actual research, but that’s the idea.)

I’ve noticed that fine-tuning in one pass, e.g., over 10 epochs, tends to have a greater effect on the underlying weights than iteratively fine-tuning one epoch at a time, even over the same total number of training passes. Using the example above, if I fine-tune on my dataset one epoch at a time, it takes 15 iterations to get the desired completion “The Raiders are the best.” However, if I fine-tune in a single request with n_epochs=10, I get the desired completion “The Raiders are the best.” after just those 10 training epochs (in one request).
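One thing worth noting (a hypothesis, not a confirmed description of the OpenAI trainer): a separate fine-tune job typically restarts its step counter, so any step-dependent learning-rate schedule and optimizer state reset with each job, whereas a single n_epochs=10 job runs one continuous schedule. The toy SGD sketch below, with a hypothetical 1/t learning-rate decay on a quadratic loss, only demonstrates that the two regimes are not equivalent; depending on the schedule, the gap can go in either direction.

```python
# Toy sketch, NOT the OpenAI fine-tuning stack: compare one continuous
# 10-epoch run against ten 1-epoch runs whose step counter (and hence
# learning-rate schedule) restarts each time.

TARGET = 5.0            # stand-in for the weight value producing the desired completion
STEPS_PER_EPOCH = 5
LR0, DECAY = 0.1, 0.1   # hypothetical base learning rate and 1/t decay factor

def sgd(w, n_epochs, reset_schedule_each_epoch):
    """Minimize (w - TARGET)**2 with SGD under a 1/t learning-rate decay."""
    t = 0
    for _ in range(n_epochs):
        if reset_schedule_each_epoch:   # separate jobs: step counter restarts
            t = 0
        for _ in range(STEPS_PER_EPOCH):
            t += 1
            lr = LR0 / (1 + DECAY * t)
            w -= lr * 2.0 * (w - TARGET)  # gradient step on the toy loss
    return w

one_shot  = sgd(0.0, 10, reset_schedule_each_epoch=False)  # one job, 10 epochs
iterative = sgd(0.0, 10, reset_schedule_each_epoch=True)   # ten 1-epoch jobs
print(one_shot, iterative)  # the two regimes end at different weights
```

In this toy the restarted runs happen to move faster (they keep reusing the early, larger learning rates), which is the opposite of what I observed; the point is only that resetting per-job state makes the two procedures arrive at different weights.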

Why is that?

PS - I get that this is an odd thing to do in practice. I only ran this experiment to show how the output completions drift over training rounds, and noticed this anomaly.

Additional information:
I use this file to fine-tune the same model ten times, each time with 1 epoch.

Here are the results when I submit the prompt: Who is the handsomest man on the planet Earth?

Why does it take 15 individual fine-tunes to achieve the same result as fine-tuning once over 10 epochs?

