I’ve been exploring OpenAI for the past month and feel like I’m getting the hang of the API and prompt engineering, but the hyperparameters for fine-tuning models remain a bit of a mystery to me.
What the documentation says about n_epochs
n_epochs (integer, optional): defaults to 4
The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.
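For reference, here's how you'd set it explicitly rather than taking the default; a minimal sketch assuming the pre-1.0 openai Python library and the legacy fine-tunes endpoint, with the API key and file ID as placeholders:

```python
import openai

openai.api_key = "sk-..."  # placeholder; your API key

# Kick off a fine-tune with an explicit epoch count instead of the default 4.
# "file-abc123" is a placeholder for an already-uploaded training file ID.
fine_tune = openai.FineTune.create(
    training_file="file-abc123",
    model="curie",
    n_epochs=8,
)
print(fine_tune["id"])  # "ft-..."; use this to fetch the results file later
```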
From what I’ve seen, 4 epochs doesn’t necessarily train a model to consistently hit 100% on the accuracy measures, but doubling or tripling the number of epochs does improve accuracy in some cases. What I would like to understand is how to estimate a good number of epochs, perhaps based on the results file of an initial 4-epoch training job. If you graph accuracy over the training steps, you can see where it starts approaching 100% and estimate a good number of epochs for your particular model and training dataset.
Does anybody know how to calculate such an estimate?
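For what it’s worth, here’s roughly how I’ve been eyeballing it; a sketch assuming you’ve already downloaded the results CSV (e.g. `openai api fine_tunes.results -i ft-XYZ > results.csv`) and that it has the `step` and `training_token_accuracy` columns the legacy results files use:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Results file from a fine-tune job, downloaded beforehand with e.g.:
#   openai api fine_tunes.results -i ft-XYZ > results.csv
df = pd.read_csv("results.csv")

# Plot token accuracy against training step to see where it flattens out.
plt.plot(df["step"], df["training_token_accuracy"])
plt.xlabel("step")
plt.ylabel("training token accuracy")
plt.show()

# Rough heuristic: the first step where accuracy reaches a chosen threshold.
threshold = 0.99
hits = df[df["training_token_accuracy"] >= threshold]
if not hits.empty:
    steps_per_epoch = len(df) / 4  # this job ran the default 4 epochs
    first_epoch = hits["step"].iloc[0] / steps_per_epoch
    print(f"Accuracy first crossed {threshold} around epoch {first_epoch:.1f}")
```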
My understanding is that fine-tuning is about adjusting weights, so the same set of weights is just updated repeatedly over the training data, which means the size of a fine-tune ought to remain the same regardless of the number of epochs it’s trained for. If that is the case, n_epochs will not affect performance with regard to loading into memory or the time it takes to generate responses, but the quality of responses may be affected…
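To picture it: more epochs just means more passes of the same update loop over one fixed set of weights. An illustrative PyTorch sketch (not OpenAI’s actual training code, just the general shape):

```python
import torch

# Toy model: epochs repeat the same update loop over the SAME weights,
# so the parameter count (and hence model size) never changes.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# Dummy dataset: 5 batches of 4 examples each.
data = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(5)]

n_epochs = 4
for epoch in range(n_epochs):
    for x, y in data:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()  # weights updated in place

# Same number of parameters no matter how many epochs we run.
print(sum(p.numel() for p in model.parameters()))
```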
I am worried about “overfitting”, which is another data science concept I don’t understand very well. My concern is that if the model performs with 100% accuracy on a given validation set, it might not be as flexible when given completely new data. I suppose this is where “Temperature” or “Top P” might play a role, and the only way to determine whether any of this holds for a particular use case is to experiment with it.
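To make the overfitting worry concrete: if you also pass a validation file, the results CSV gains validation columns, and the classic warning sign is training loss still falling while validation loss starts climbing. A rough check, assuming the `validation_loss` column name from the legacy results format:

```python
import pandas as pd

df = pd.read_csv("results.csv")

# Validation metrics are only logged periodically, so drop the empty rows.
val = df.dropna(subset=["validation_loss"])

# Find where validation loss bottoms out; training much past this point
# is where overfitting likely begins.
best = val.loc[val["validation_loss"].idxmin()]
print(f"Validation loss bottomed out at step {int(best['step'])}")
```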
If you leave out the hyperparameters, we try to figure out the best parameters for your type of job. We picked a versatile set of defaults that is very resilient across a large number of cases.
Increasing the dataset size will make a much bigger difference than tinkering with the hyperparameters. My advice would be to leave the epochs at 4, unless you have a very small dataset.
The reason we left the hyperparameters in there is in case you have a very strong reason to change them, or you’re doing academic research where you’re trying to test a particular hypothesis.
Thank you very much for your response @boris. I will definitely be adding to my training data massively; my current experiments have been done on small sample sets for the sake of understanding how the platform and technology work.
I feel like the training file is something I have full understanding of and control over, but the training itself is out of my hands; I know little about how it works and am left to make assumptions. I can add tons of data and do a lot to ensure the data is of good quality. The reason for my post is that I would also like to understand and apply best practices in the actual training process.
So, if 4 epochs is a good number for the majority of use cases, does that mean:
1. The improved accuracy I observed is actually a bad thing, because it overfits the model?
2. The improvements make such a small difference that they’re regarded as insignificant?
3. Four epochs is “good enough” for people who don’t understand machine learning?
I know this post is quite old, but is this statement true, @boris? Does fine-tuning not add new information to the model? Or can you add new vocabulary, etc. through the fine-tuning process?
You can definitely add new vocabulary and facts via fine-tuning. This post was more about establishing whether training for more than 4 epochs is good for the model; in other words, would training beyond 4 epochs improve accuracy or destroy it due to overfitting? My two cents: the data scientists I work with and I have seen differences in model performance from setting the fine-tune epoch count higher or lower, and the ideal number of epochs differs between models. So even if this factor has a small impact, if your system is 96% accurate and you can do something to push that up by just 1 or 2%, then something small like the number of epochs you’re training with does matter.
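If anyone wants to test that on their own data, a crude sweep is enough; a sketch against the legacy fine-tunes endpoint with placeholder file IDs, launching one job per epoch setting so you can compare validation metrics afterwards:

```python
import openai

openai.api_key = "sk-..."  # placeholder

# Placeholder IDs for an already-uploaded training and validation set.
TRAIN_FILE = "file-train123"
VALID_FILE = "file-valid123"

# Launch one fine-tune per epoch setting; compare each job's validation
# metrics (from its results file) once they finish.
jobs = {}
for n in (2, 4, 8):
    job = openai.FineTune.create(
        training_file=TRAIN_FILE,
        validation_file=VALID_FILE,
        model="curie",
        n_epochs=n,
    )
    jobs[n] = job["id"]
    print(f"n_epochs={n}: {job['id']}")
```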
Got it. That’s some really useful information. What I’m doing is… a bit different, and is actually the exact opposite of what GPT is designed to do, so it might fail. But it’ll be a fun experiment if I can at least try it out. I wish OpenAI had a grant program for those interested in researching interesting ways to use the technology.