I see examples of fine-tuning that use a system prompt with GPT-3.5. I have also seen comments that a generic system prompt would not be necessary, because the model would learn from the fine-tuning data.
Has anyone tested, even at a small scale, whether the system prompt should be included or whether it is simply extra cost?
Think of something like this: “You are an accurate translator. When the user says something, return exactly the same text in English.” Should I repeat this in my fine-tuning dataset to give the model an initial hint about what is expected, or should I skip it?
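For concreteness, a single training line in the chat fine-tuning JSONL format would look something like this, with the system prompt repeated in every example (a minimal sketch; the translation pair is made up):

```python
import json

# Sketch of training examples in the chat fine-tuning JSONL format.
# The Finnish/English pair below is invented for illustration.
system_prompt = (
    "You are an accurate translator. When the user says something, "
    "return exactly the same text in English."
)

examples = [
    {"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Hyvää huomenta kaikille"},
        {"role": "assistant", "content": "Good morning, everyone"},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```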
As I understand it, if I use a system prompt in the training data, I also need to use it when doing completions, which may lead to a slight increase in costs, but perhaps only a few percent. I might also give a much longer prompt, with instructions on the exact localization I prefer, formatting, etc., so it could be in the ~10% extra-cost range.
I mean fine-tuning of gpt-3.5-turbo (https://platform.openai.com/docs/guides/fine-tuning). I know I used a slightly wrong term. It is not exactly “training a model” but fine-tuning the output for format and style based on my examples.
Based on some experimentation, I think the answer is: yes, we should repeat the system prompt in the dataset. That means we must also use it when calling the fine-tuned model, although we can slightly change it for slightly altered purposes.
The reason is that the system prompt helps the model generate an almost correct answer, so the fine-tuning has less work to do to adjust the response.
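As a sketch, calling the fine-tuned model would then look something like this with the openai Python client (the ft: model name is a placeholder for your own fine-tune ID):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Reuse the same system prompt the model was fine-tuned with; the model
# name below is a placeholder for your own fine-tuned model ID.
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:my-org::abc123",
    messages=[
        {"role": "system", "content": (
            "You are an accurate translator. When the user says something, "
            "return exactly the same text in English."
        )},
        {"role": "user", "content": "Hyvää huomenta kaikille"},
    ],
)
print(response.choices[0].message.content)
```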
If the training dataset is big, then including the system prompt is probably unnecessary.
I have noticed that if you fine-tune with one system message and then prompt with a system prompt that is significantly different, the desired behavior breaks down. Is there a particular strategy to make a model generalize to any novel system prompt while maintaining the correct behavior instilled by the fine-tune? I have run a lot of fine-tuning experiments, so feel free to ask me any questions too.
Strategy 1: have the resources of OpenAI to train chat models.
It is a good application to try to restore some quality of system-instruction following. If you can fine-tune on varying user instructions, then there is not much difference if you break the barrier and also include system instructions in the instruction-following.
However, I fear that generalizing beyond a subset of varying instructions placed into that set system prompt of your application would require a massive training set, and that training in one domain could degrade another, even if you are only varying between “you talk like my uncle”, “you talk like my aunt”, and “you respond in Shakespeare’s early Modern English”.
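If you were to try it anyway, the training set would need to pair each conversation with one of several system prompts, something like this hypothetical sketch (the personas and the reply placeholders are made up):

```python
import json

# Hypothetical sketch: each persona gets its own (user, assistant) pairs,
# since each assistant reply has to actually follow the persona it is
# trained against.
training_data = {
    "You talk like my uncle.": [
        ("How is the weather today?", "..."),
    ],
    "You talk like my aunt.": [
        ("How is the weather today?", "..."),
    ],
    "You respond in Shakespeare's early Modern English.": [
        ("How is the weather today?", "..."),
    ],
}

with open("varied_system_prompts.jsonl", "w", encoding="utf-8") as f:
    for persona, pairs in training_data.items():
        for user, assistant in pairs:
            example = {"messages": [
                {"role": "system", "content": persona},
                {"role": "user", "content": user},
                {"role": "assistant", "content": assistant},
            ]}
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
```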
Hi guys, I know this was solved; I'm just sharing what my experience has shown so far:
When you fine-tune on a single, clearly defined task, the system prompt can be reduced from the complete instructions on how to achieve the task to a simple task name, or even omitted in some cases.
On the other hand, when you fine-tune for multiple tasks and the model needs to know which particular task to perform, or some context is presented in the system message, then yes, you need to keep the system message, as the model still requires it to achieve the task despite the fine-tuning.
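A rough sketch of the difference, with made-up examples in the same JSONL message format:

```python
# Single, clearly defined task: the system message can shrink to a short
# task name (or be dropped entirely), because the fine-tune itself now
# carries the instructions.
single_task_example = {"messages": [
    {"role": "system", "content": "translate-to-english"},
    {"role": "user", "content": "Hyvää huomenta"},
    {"role": "assistant", "content": "Good morning"},
]}

# Multiple tasks: keep a real system message, since the model still needs
# it to know which behavior (or which extra context) applies here.
multi_task_example = {"messages": [
    {"role": "system", "content": "Summarize the user's text in one sentence."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
]}
```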