Davinci not learning new patterns after fine-tuning, forgetting to answer questions

I’m trying to fine-tune davinci to write multiple choice questions following a certain structure, tone and style. But after fine-tuning on the training data, the model fails to consistently generate new multiple choice questions matching even the basic structure of the training samples: a problem statement and 5 response options.

The dataset consists of 400 samples of the form:

"prompt": "Write a multiple choice question about ... \n\n###\n\n", 
"completion": " <full problem statement> ###"
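For reference, here is a minimal sketch of how each JSONL training line could be assembled so the separator and stop token stay consistent across the dataset (the helper name and the example topic are mine, not from the actual pipeline):

```python
import json

PROMPT_SEP = "\n\n###\n\n"   # fixed separator marking the end of the prompt
STOP_TOKEN = " ###"          # terminator appended to every completion

def make_sample(topic: str, question_text: str) -> str:
    """Build one JSONL training line in the prompt/completion format above."""
    record = {
        "prompt": f"Write a multiple choice question about {topic}{PROMPT_SEP}",
        # Completions should start with a space and end with the stop token,
        # per the fine-tuning data-preparation guidelines.
        "completion": f" {question_text}{STOP_TOKEN}",
    }
    return json.dumps(record)

line = make_sample("photosynthesis", "Which molecule ...?\nA) ...\nB) ...\nC) ...\nD) ...\nE) ...")
```

Writing every line through one helper like this makes it easy to verify that no sample is missing the separator or the trailing stop token.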

I fine-tuned for 2 epochs with no learning rate specified. I have also tried fine-tuning with learning rate multipliers between 0.2 and 1, and all of the resulting models fail to write questions following the structure of the training data.

The model also seems to have catastrophically forgotten how to answer questions. For instance, if you ask text-davinci-003 “What is the capital of France?” you typically get the completion “Paris.” However, the fine-tuned models complete:

What is the capital of France?

What is the capital of France?

or something similarly repetitive.

I have experimented with different temperatures, max_tokens, and frequency/presence penalties with no improvement. Are there any best practices for fine-tuning the model so that it learns to complete in the style of its training examples when prompted, without catastrophically forgetting other things it used to be good at? I have already read through the entirety of the documentation here. Thanks!
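One thing worth double-checking at inference time: the prompt sent to the fine-tuned model should end with the exact separator used in training, and the request should pass the completion terminator as a stop sequence so generation halts cleanly. A sketch of the request parameters for the legacy Completions endpoint (the model name is a placeholder, and the sampling values are illustrative, not recommendations):

```python
def build_completion_request(topic: str, model: str) -> dict:
    """Assemble kwargs for a Completions call to a fine-tuned model.

    The prompt must end with the same "\n\n###\n\n" separator used in the
    training prompts, and the stop sequence must match the " ###" terminator
    used in the training completions.
    """
    return {
        "model": model,  # e.g. "davinci:ft-your-org-..." (placeholder)
        "prompt": f"Write a multiple choice question about {topic}\n\n###\n\n",
        "stop": [" ###"],     # matches the completion terminator in training
        "max_tokens": 256,    # illustrative values
        "temperature": 0.7,
    }
```

These kwargs would then be passed to the Completions API; a mismatched or missing separator at inference time is a common reason a fine-tuned model ignores the trained structure.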


Hi elmstedt! Yes, all the training completions have the same basic structure of a question statement followed by 5 response options, and all prompts and completions in the dataset have been preprocessed with separators as in the example. I do use that separator when asking the fine-tuned model for completions. I can try more epochs, though!


elmstedt is right: 400 samples is not enough; try more than 750.

This link might also be helpful: Fine-tuning a Classifier to Improve Truthfulness | OpenAI Help Center


Great minds! I’m trying GPT-4 for data augmentation and it’s looking pretty good.


If you’re using synthetic GPT-4-generated samples, you may also want to read Aligning language models to follow instructions, since what you are doing is supervised fine-tuning.


After checking the documentation again, I noticed that the models currently available for fine-tuning are pre-InstructGPT models. So it wasn’t catastrophic forgetting of QA; the models had never learned it in the first place! Thanks for all the help.
