Hi there!
I was going through the fine-tuning docs here: https://platform.openai.com/docs/guides/fine-tuning
The following was not clear to me.
If I give a system role message in my training data, my expectation is that I don’t have to give that information at inference (so that I can save costs on tokens)
There is a line in the doc that hints at this
If you would like to shorten the instructions or prompts that are repeated in every example to save costs, keep in mind that the model will likely behave as if those instructions were included, and it may be hard to get the model to ignore those “baked-in” instructions at inference time.
But my experience with fine-tuning is suggesting otherwise. The model only behaves similar to training set if I provide the system prompt.
I see the same with most youtube tutorials as well.
So what are my options?
Do I need to provide the same system prompt always during inference of a fine-tuned model?
If so, what have I actually achieved with fine-tuning?
Better replication of training examples?
Ideally not. Anything you fine-tune gets “baked in”.
For example, all OpenAI models (AFAIK) have a “baked-in” system prompt of “You are a helpful assistant”. You can choose to omit this system message.
If you are finding a difference then you can decide to include the message, and/or continue fine-tuning the model with more epochs or more training data.
You can then include some evaluation/validation data that doesn’t include the system prompt to see how it manages.
In the case you are quoting the benefit is for example if you had a massive prompt. So if your prompt included 100 examples, or a lot of instructions. If the cost to “token-stuff” is more than the cost to fine-tune, or bake it in, then you can go the fine-tuning route and have cheaper, more reliable results.
Based on my own fine-tuning experience I can reiterate that you must still include the system prompt at inference, otherwise the model will not exhibit the desired behaviour.
While this may appear counterintuitive, there still are other benefits:
You should get better and more consistent model performance relative to the base model.
Typically, for specialized tasks you’d have to include one or multiple examples in your prompt when using the base model. This of course becomes redundant when using a fine-tuned model. Hence, there are token savings in this regard.