Finetuning Breaks Engineered Prompts

I’ve been testing various methods of fine-tuning GPT-3, but I can’t seem to get it to “learn” a new topic without breaking the prompts I’ve engineered to achieve specific results with the base models.

I’ve tried the following:

  • Curie and Davinci models
  • Varied number of epochs
  • Varied learning_rate_multiplier
  • Training with 500+ documents
  • Training with HTML, plain text and prefixed text
  • With and without a prompt and a stop sequence.

Does anyone have any idea how it’s possible to fine-tune in a way that simply adds to the “knowledge” of GPT-3, rather than breaking the way it understands prompts and queries?

I was wondering if there’s a specific format I should use when feeding it articles? Perhaps the same format it was originally trained on?


Finetuning is generally meant to increase consistency in performance. There are no documented methods of using finetuning to expand how much GPT-3 knows.

You may want to treat this as a search problem instead.
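For anyone reading along, “treat it as a search problem” usually means retrieving the relevant article at query time and pasting it into the prompt of the base model, instead of trying to bake the knowledge into the weights. A minimal sketch of that retrieval step, using a toy bag-of-words similarity as a stand-in for a real embeddings or search endpoint (the `embed` function and the example documents here are made up for illustration):

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding" -- a stand-in for a real embeddings
    # endpoint, used here only to illustrate the retrieval step.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs):
    # Rank documents by similarity to the query; return the best match.
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

docs = [
    "Our refund policy allows returns within 30 days.",
    "The warranty covers manufacturing defects for one year.",
]
best = retrieve("How many days do I have to return an item", docs)

# The retrieved article is pasted into the prompt for the *base* model,
# so nothing has to be fine-tuned into the weights.
prompt = f"Answer using this article:\n{best}\n\nQ: How many days do I have to return an item?\nA:"
```

The engineered prompts stay intact because the base model never changes; only the context it sees per query does.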


Thanks for your input, David. Your videos on YT have been extremely helpful in understanding the best way to prompt GPT-3.

Would you mind elaborating on this? What exactly do you mean by search problem?

I have seen a suggestion to leave the prompt empty and fill the completions with the new information you want to add to GPT-3 (open-ended generation). I’m not sure whether I don’t have enough samples or the quality of the data isn’t good enough, but I did not have much success with that.
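For anyone trying the same thing, the empty-prompt training file is a JSONL file in the legacy prompt/completion format: one JSON object per line, with `prompt` set to an empty string and the article text as the completion. The leading space and trailing stop marker follow the old fine-tuning guide’s suggestions; `"\n###\n"` is just one common separator, not anything official. A sketch of how I built it:

```python
import json

def to_open_ended_jsonl(articles, stop="\n###\n"):
    # Legacy GPT-3 fine-tuning format: one JSON object per line.
    # An empty prompt means "open-ended generation" -- the model is
    # trained to produce the article with no conditioning prompt.
    lines = []
    for article in articles:
        record = {"prompt": "", "completion": " " + article.strip() + stop}
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl = to_open_ended_jsonl(["First article text.", "Second article text."])
```

The resulting file is what gets uploaded to the fine-tuning job.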

Currently I am trying to have GPT-3 generate questions about the original content, then having GPT-3 answer those same questions based on that content. I then use the question-and-answer pairs to build a JSONL file for the fine-tuning job.

I think this method does lose information since the original content is no longer being used in the fine-tuning, but if you get the right questions, you may get enough information that allows you to ask new questions.

The other thing is that this can get expensive, since you are using GPT-3 to generate the questions and the answers (both with the original content in the prompt), and then to run the fine-tuning job (with only the questions and answers)…
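Roughly, the pipeline looks like this. The `complete` stub and the prompt wording are made up for illustration — in the real pipeline that function would be two Completions API calls per article, which is where the cost comes from:

```python
import json

def complete(prompt):
    # Stub standing in for a real GPT-3 completion call.
    raise NotImplementedError

def qa_pairs_for_article(article, ask=complete):
    # Step 1: generate questions, with the article in the prompt.
    questions = ask(f"{article}\n\nWrite three questions about the text above, one per line:")
    pairs = []
    for q in questions.splitlines():
        q = q.strip()
        if not q:
            continue
        # Step 2: answer each question, again with the article in the prompt.
        a = ask(f"{article}\n\nQ: {q}\nA:")
        pairs.append((q, a.strip()))
    return pairs

def to_finetune_jsonl(pairs, stop="\n"):
    # Step 3: only the Q&A pairs go into the fine-tuning file --
    # the original article is dropped, which is where information is lost.
    return "\n".join(
        json.dumps({"prompt": f"Q: {q}\nA:", "completion": " " + a + stop})
        for q, a in pairs
    )
```

Whether the fine-tuned model can generalize beyond the generated questions depends entirely on how well they cover the source content.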


Yes, I’ve tried multiple times with empty prompts but it still seems to overfit the training data even with a single epoch.

Could you share an example of what you’d like to add?

In the OpenAI documentation (OpenAI API) it describes how fine-tuning can be used to “create a model which will be a lot better at understanding … content from your own domain”. Does this feature actually not work so well in practice?

Hi Luke, I’ve simply been using full articles for fine-tuning. I’ve tried them as plain text with newlines separating paragraphs, and in other formats such as HTML and XML, but in each case the fine-tuning breaks prompts that work wonderfully with the original model.

Could it perhaps be that fine-tuning with large amounts of text (full articles of 500+ words) is causing issues? I couldn’t find any info on how GPT-3 was originally trained, to try to match it.