Hi, everyone.
I am researching GPT-3’s ability to answer abstract, subjective questions from a specific perspective. The idea is to build an answerbot that answers subjective questions in accordance with the data on which it was fine-tuned, which may or may not comport with the data on which GPT-3 was pretrained. Imagine GPT-3 was pretrained to respond that the Philadelphia Eagles are the best NFL team, and I want it to respond that the Raiders are the best NFL team. (That’s not the research, but that’s the idea.)
I’ve noticed that fine-tuning in one pass, e.g., over 10 epochs, tends to have a greater effect on the underlying weights than iteratively fine-tuning one epoch at a time, even over the same number of training passes. Using my example above, if I fine-tune on my dataset one epoch at a time, it takes 15 iterations to achieve the desired output of “The Raiders are the best.” However, if I fine-tune in one request, setting n_epochs=10, I get the desired output of “The Raiders are the best.” after just 10 training epochs (in one request).
Why is that?
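For concreteness, here’s roughly what the two setups look like with the current OpenAI Python SDK; the file ID, base model name, and the `run_job` helper are placeholders for illustration, not my actual script:

```python
import time

from openai import OpenAI

client = OpenAI()

TRAINING_FILE = "file-abc123"  # placeholder: ID of an uploaded .jsonl training file
BASE_MODEL = "gpt-3.5-turbo"   # placeholder base model name


def run_job(model: str, n_epochs: int) -> str:
    """Start a fine-tune job and block until it finishes; return the new model name."""
    job = client.fine_tuning.jobs.create(
        training_file=TRAINING_FILE,
        model=model,
        hyperparameters={"n_epochs": n_epochs},
    )
    while job.status not in ("succeeded", "failed", "cancelled"):
        time.sleep(60)
        job = client.fine_tuning.jobs.retrieve(job.id)
    if job.status != "succeeded":
        raise RuntimeError(f"Job {job.id} ended with status {job.status}")
    return job.fine_tuned_model


# Setup A: one request covering 10 epochs.
model_a = run_job(BASE_MODEL, n_epochs=10)

# Setup B: repeated single-epoch requests, feeding each resulting
# fine-tuned model back in as the starting model for the next call.
model_b = BASE_MODEL
for _ in range(10):
    model_b = run_job(model_b, n_epochs=1)
```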
PS - I get that this is a dumb thing to do in practice. I only performed this experiment to show the migration of output completions over training rounds and noticed this anomaly.
Additional information:
I use this file to fine-tune the same model ten times, each time with 1 epoch.
I am confused about what you were trying. Were you trying to train GPT-3 on the same dataset by making individual fine-tuning calls, or were you trying to grow and/or change the dataset across calls? Because if you are doing the former, i.e., trying to fine-tune GPT-3 through multiple fine-tune calls, then you are doing it all wrong.
According to the documentation, fine-tuning an already fine-tuned model is not possible. So your 15 fine-tune calls each start from scratch rather than continuing from the previous call. You can see this from the model you pass in the fine-tune argument: that model holds the weights you are fine-tuning over, and since you pass the same model every time, each call starts from the same weights.
The prior fine-tune endpoint allowed one to continue fine-tuning an existing model.
Each additional epoch deepens the reinforcement that the fine-tune pass applies to the weights.
The rest of the response is equally uninformed, not helped by OpenAI removing old superior documentation and even blocking the archive.org wayback machine.
Hi, @reeti
It is the former, but only to ascertain the effects of iterative training (i.e., one epoch, 10 times) vs 10 epochs at one time.
By the way, you can fine-tune a fine-tuned model. You perform the same steps, but you give the name of the fine-tuned model in place of the base model. I’ve done it, and it works pretty well - as well as fine-tuning works for my purposes, anyway.
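With the current SDK that looks roughly like this (the training file ID and the ft:… model name are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Continue training an existing fine-tune: pass the full "ft:..." model
# name where you would normally pass the base model.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",              # placeholder training file ID
    model="ft:gpt-3.5-turbo:my-org::abc123",  # placeholder fine-tuned model name
    hyperparameters={"n_epochs": 1},
)
```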
I’ll add that this is pure curiosity. I just want to understand what is fundamentally different between fine-tuning with 1 epoch 15 times vs. 15 epochs 1 time.
(Ultimately, it just seems cheaper to fine-tune with more epochs in fewer requests.)
I am writing to inquire about a specific issue I encountered while configuring the hyperparameters for fine-tuning the GPT-4o mini model. I aim to use a dataset containing approximately 2,000 question/answer pairs, and my training configuration includes parameters such as n_epochs = 50 and batch size = 3.
However, I have noticed that the platform restricts the number of epochs to a maximum of 20. Could you please clarify why this limitation exists?
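Concretely, the request I am attempting looks roughly like this (the file ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Attempted configuration: ~2,000 Q/A pairs, 50 epochs, batch size 3.
# The platform rejects n_epochs values above 20 for this model.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",  # placeholder for the uploaded dataset
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 50, "batch_size": 3},
)
```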
I am very interested in fine-tuning and will be playing with this soon, so reading through this was helpful.
I am still confused about who the most handsome person on the planet is. I was under the assumption Morgan was the most handsome. Let us know when you get to the bottom of this fundamental question, please!