N_epochs has no apparent effect in new finetuning API

I am trying to use the new finetuning API to train new babbage-002 models to replace my existing ada models, but am seeing a decrease in performance. The finetuning process is also “successfully” completing much earlier than it did when using the legacy API, and there is no mention of epochs in the retrievable info, making me think that the finetune completed after only 1 epoch. I tried to adjust the number of epochs by setting the n_epochs parameter in the create method, but the reported number of training steps, training time, and subsequent model performance do not change regardless of what value I use.

Is there some early stopping mechanism at work here or am I doing something wrong?

I am starting the finetune with:

openai.FineTuningJob.create(
    training_file=train_file_id, 
    model='babbage-002',
    hyperparameters={'n_epochs': 4} # This doesn't seem to have any effect
)

Update to the latest openai library module with pip to make sure you have the newest features and bug fixes.

You can retrieve the job objects and see whether the n_epochs you specified has been recorded as a hyperparameter.

API Reference - List fine tuning jobs
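
For example, something like this (a minimal sketch in the same pre-1.0 openai-python style as the snippet above; the job ID is a placeholder):

import openai

# Inspect one job; "ftjob-abc123" is a placeholder for your actual job ID.
job = openai.FineTuningJob.retrieve("ftjob-abc123")
print(job["hyperparameters"])  # e.g. {"n_epochs": 4} if your setting was recorded

# Or list recent jobs and print the hyperparameters recorded for each.
for j in openai.FineTuningJob.list(limit=10)["data"]:
    print(j["id"], j["hyperparameters"])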

Epochs is the number of passes through your training file. Their default choice is three - far from the 8 or 16 one might previously have used for a single-purpose model. I expect you could get the same effect as doubling the epochs by simply repeating all the examples, producing a file twice as big.
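
Something along these lines would do it (a rough sketch; the file names are made up):

# Duplicate every training example to approximate doubling n_epochs.
# "train.jsonl" and "train_x2.jsonl" are hypothetical file names.
with open("train.jsonl") as src, open("train_x2.jsonl", "w") as dst:
    dst.writelines(src.readlines() * 2)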

Are you trying to use ChatML messages, or are you using the previous completions prompt training format? babbage-002, being a completions model, gives you a blank slate and should have no idea about even the message format, which would be a needless encumbrance of confusing tokens. OpenAI has published zero documentation on the replacement completion models. I can tell you that they have high perplexity, likely having been made more efficient through internal reductions.
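
For a completion model the training file should be plain prompt/completion JSONL rather than ChatML messages - something like this (content invented for illustration):

import json

# One completions-format training example; prompt and completion text are made up.
example = {"prompt": "Great product, fast shipping ->", "completion": " positive"}
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")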

I also see a big performance decrease going from the legacy models to babbage-002. For me, changing the epochs does make a difference in the result file, even though it is not as verbose as for the legacy training. It doesn’t make sense, but going from 2 to 4 epochs changes the step count from ~1500 to ~2230.
Sadly, there are not many options for optimizing the results now. I am currently trying to remove the stop sequence and the extra space before the completion, since these are no longer described in the new guide.
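
In case it helps anyone, here is roughly what that cleanup looks like (a sketch; the " END" stop sequence and the file names are my assumptions, not from the guide):

import json

# Strip the legacy trailing stop sequence and the single leading space from
# each completion. " END" is an assumed stop sequence; adjust to your own.
with open("old_train.jsonl") as src, open("new_train.jsonl", "w") as dst:
    for line in src:
        ex = json.loads(line)
        ex["completion"] = ex["completion"].removesuffix(" END").removeprefix(" ")
        dst.write(json.dumps(ex) + "\n")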

I will add an example to the docs with hyperparams enabled. You should be able to retrieve a fine-tuning job and see the results in the response object: OpenAI Platform
