Adding prompt info to fine-tuning

Hi folks
I’m trying a couple of experiments to get the hang of fine-tuning.
I’m building a JSONL file with question-and-answer samples, but I’d also like to give the model the right context and instructions, so I don’t have to repeat them with every API call.
So I started the JSONL with an empty prompt:

{"prompt" : "", "completion":"I'm a chat Bot that......"}
{"prompt" : "Question 1 text", "completion" : "answer 1 text"}

But so far the instructions don’t seem to be followed as well as when I include them directly in the prompt.

Any thoughts on this? Is it possible at all, or is it better to avoid mixing different things in the JSONL?


If you have a few hundred examples, then instructions won’t make any difference. I suggest you use the following format:

{"prompt": "<question_1_text> \nAnswer:", "completion": " <answer_1_text> \n"}
{"prompt": "<question_2_text> \nAnswer:", "completion": " <answer_2_text> \n"}
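If it helps, here is a minimal Python sketch of writing a JSONL file in that format (the file name and example pairs are just placeholders):

```python
import json

# Hypothetical Q&A pairs -- substitute your own data.
pairs = [
    ("What is the sun?", "The sun is a star at the center of our solar system."),
    ("What is a cloud?", "A cloud is a visible mass of condensed water vapor."),
]

with open("train.jsonl", "w") as f:
    for question, answer in pairs:
        record = {
            # Every prompt ends with the same fixed "\nAnswer:" separator.
            "prompt": f"{question} \nAnswer:",
            # Completion starts with a space and ends with "\n".
            "completion": f" {answer} \n",
        }
        f.write(json.dumps(record) + "\n")
```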


Thanks @boris ,
Does that mean that if I manually include instructions in the prompt, it also won’t make a difference?
I’m just trying to get some ground rules right. For instance, I’m trying to get it to avoid answering questions on “hot” topics (like sex, religion, politics, violence), and instead point the user elsewhere for that information.
With what you are saying, am I understanding correctly that if I include examples of these, then I can skip the prompt instructions?

The key is you need a demarcation token of some sort, and you also just need to be consistent with the format/structure. Every single prompt needs to end with the same keyword so the model knows when the prompt ends and the completion begins.

Alternatively, if you just want to do completions, you still must stick with the same format. For instance, for a chatbot, you probably want to include the name of the speaker in the information. Check out the Cornell Movie Dialog Database for an example.
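The consistency point above can be sketched as a pair of tiny helpers; the separator and tag choices here are just one possible convention, not a required one:

```python
SEPARATOR = "\nA:"   # every prompt ends with this demarcation token
STOP = "\n"          # every completion ends with this stop sequence

def format_prompt(question: str) -> str:
    # Prefix with a "Q:" tag and end with the fixed separator,
    # so the model can tell where the prompt ends.
    return f"Q: {question}{SEPARATOR}"

def format_completion(answer: str) -> str:
    # Leading space plays nicer with tokenization; trailing "\n" marks the end.
    return f" {answer}{STOP}"
```

Routing every example through helpers like these guarantees the whole dataset shares one structure.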


Yes, that’s what I am doing. It’s basically like

{"prompt": "Q: What is the sun?\nA:", "completion": " The sun is a giant ball of fire." }

then when I call the API I use the text “Q: What is a cloud?\nA:”
with “\n” as a stop
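That request can be sketched as a plain payload dict; the model id below is a placeholder, not a real fine-tuned model name:

```python
# Sketch of the completion request parameters for a fine-tuned model.
# "ft-your-model" is a placeholder -- substitute your own fine-tune id.
payload = {
    "model": "ft-your-model",
    "prompt": "Q: What is a cloud?\nA:",
    "stop": "\n",        # cut generation at the end of the first line
    "max_tokens": 64,
}
```

The prompt sent at inference time must end with the same `\nA:` separator used in training, and the `\n` stop mirrors the trailing newline on every training completion.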


With enough fine-tuning data, you shouldn’t need a stop. At least I haven’t needed one. I also trim both the prompt and completion of all whitespace. I’m not sure if either of those things makes a difference, but my Question Generator and Core Objective Function fine-tuned models work well enough.

But yeah, Q: and A: make a good set of tags. You might not even need the Q:, to be honest, at least not if every prompt is going to be a question. If you have multiple evaluation types like T/F, Q/A, and such, then you will want to keep the leading Q:.

I was also trimming whitespace, but then the OpenAI data-preparation tool suggested adding a space back in before the completion, so I did.

A space at the beginning of a completion improves tokenization, and generally improves performance slightly.

I’d add \n at the end of a completion as a stop sequence, if you care about the precise length of answers. If most answers are just one sentence long, then you probably don’t need it.


@daveshapautomator What is that? Is it available somewhere?

This morning I was thinking about building a question generator for fine tuning.

Here ya go.


I have read this in a number of places. What is the reason for this “behaviour”?

I have a follow-up question about this. If I have ~20 examples (a small number, I know) is it useful to include a prefix of “Here is a summary of an award winning story about:” before each? For example:

{"prompt": "Here is a summary of an award winning story about: topic1, topic2, topic3", "completion": "..."}
{"prompt": "Here is a summary of an award winning story about: topic7, topic8, topic9", "completion": "..."}

In other words, is it useful to include the stylistic intent of the output in the prompt when finetuning? Or does the style of the completion examples need to do the heavy lifting by simply having the desired style? If so, would a better finetuning dataset look like the following:

{"prompt": "topic1, topic2, topic3", "completion": "<ideal output for given topics>"}
{"prompt": "topic7, topic8, topic9", "completion": "<ideal output for given topics>"}

If I have a small number of examples, do I need to structure the prompts the same way I would in the Playground, by telling it to be “award winning” in style? Then, as my number of samples increases, could I start to remove the intent language from the prompt?
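For concreteness, here is a small sketch that can emit either variant, so the two datasets above differ only by a flag (the prefix text and topics are just the placeholders from this question):

```python
import json

# Hypothetical stylistic-intent prefix from the examples above.
PREFIX = "Here is a summary of an award winning story about: "

def make_record(topics, summary, include_prefix=True):
    """Build one training example, with or without the intent prefix."""
    topic_str = ", ".join(topics)
    prompt = (PREFIX + topic_str) if include_prefix else topic_str
    return {"prompt": prompt, "completion": summary}

# With ~20 examples, keep the prefix; as the dataset grows, flip the flag.
line = json.dumps(make_record(["topic1", "topic2", "topic3"], "..."))
```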

Thanks! In my own experiments I have found that adding instructions to the prompt in fine-tuning data can act as a good substitute for additional training data. If the instruction is accurate, it will increase the conditional probability of your completion and can improve learning, and steerability during inference. As training examples increase, fixed instructions probably become less relevant.


You’re spot on. If you have <100 examples, then extra descriptions will help. After that, they make little difference. With only 20 examples you may not achieve much better performance than with a few-shot approach, but it depends on the application.