I’m trying to fine-tune davinci to write multiple choice questions following a certain structure, tone, and style. But after fine-tuning on the training data, the model fails to consistently generate new multiple choice questions matching even the basic structure of the training samples: a problem statement followed by 5 response options.
The dataset is 400 samples of the form:

```json
{"prompt": "Write a multiple choice question about ... \n\n###\n\n", "completion": " <full problem statement> ###"}
```
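For reference, this is the kind of sanity check I run on each JSONL line to confirm every sample uses the separators consistently (a minimal sketch; the helper name and the sample question are made up, but the separator strings match my data):

```python
import json

PROMPT_SUFFIX = "\n\n###\n\n"   # separator ending every prompt
COMPLETION_STOP = " ###"        # marker ending every completion

def check_sample(line: str) -> list:
    """Return a list of formatting problems for one JSONL line."""
    problems = []
    sample = json.loads(line)
    if not sample["prompt"].endswith(PROMPT_SUFFIX):
        problems.append("prompt missing separator")
    completion = sample["completion"]
    if not completion.startswith(" "):
        problems.append("completion missing leading space")
    if not completion.endswith(COMPLETION_STOP):
        problems.append("completion missing stop marker")
    return problems

# Hypothetical sample mirroring the structure described above.
good = json.dumps({
    "prompt": "Write a multiple choice question about rivers.\n\n###\n\n",
    "completion": " Which river is longest? A) ... B) ... C) ... D) ... E) ... ###",
})
print(check_sample(good))  # → []
```

Every line in my training file passes these checks.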
I fine-tuned for 2 epochs with no learning rate specified. I have also tried fine-tuning with learning rate multipliers between 0.2 and 1, and all of the resulting models fail to write questions following the structure in the training data.
The model also seems to have catastrophically forgotten how to answer questions. For instance, if you ask text-davinci-003 “What is the capital of France?”, you typically get the completion “Paris”. The fine-tuned models, however, complete with:

```
What is the capital of France?
What is the capital of France?
```

or something similarly repetitive.
I have experimented with different temperature, max_tokens, and frequency/presence penalty settings, with no improvement. Are there any best practices for fine-tuning the model so that it learns to complete in the style of its training examples when prompted, without catastrophically forgetting other things it used to be good at? I have already read through the entirety of the documentation here. Thanks!
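In case it’s relevant: at inference time I also strip everything after the ` ###` end-of-completion marker, mirroring what the API’s `stop` parameter would do (a small sketch; the sample text is hypothetical):

```python
STOP = " ###"  # the end-of-completion marker used in my training data

def truncate_at_stop(text: str, stop: str = STOP) -> str:
    """Cut model output at the first stop marker, like the API's `stop` parameter."""
    idx = text.find(stop)
    return text if idx == -1 else text[:idx]

raw = "Which river is longest? A) Nile B) Amazon C) Congo ### trailing junk"
print(truncate_at_stop(raw))  # → "Which river is longest? A) Nile B) Amazon C) Congo"
```

So the repetition I’m seeing is not just trailing text after the stop marker; the marker never appears at all.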