Do the system messages in GPT-3.5 Turbo fine-tuning need to be the same for all entries?

I am using GPT-3.5 Turbo to improve the English-Arabic translation I get from another API, and I have a set of guidelines the LLM needs to apply when fixing the Arabic translation. I tried putting the guidelines into the system message as prompts, but the performance was poor and the output didn’t adhere to my guidelines, so I am now trying fine-tuning with around 100 examples.

My question is: do all entries have to have the same system message, or can it be different for each of the 100 lines? And will varying it help the LLM understand the guidelines or not?

The first part of the system message will always be the same, but I want to check whether explaining what needs to be done in each entry will help the model or not.

For example:

entry 1:

{"role": "system", "content": "You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements. If you see a number without parentheses in the English text, like '10 persons', make sure to change it to Eastern Arabic numerals in the Arabic text: '١٠ أشخاص' instead of '10 أشخاص'."}

entry 2:

{"role": "system", "content": "You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements. If you see a year in parentheses in the English text, like '(2012)', make sure the Arabic text has it exactly the same: '(2012)'."}
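
For reference, here is a minimal sketch of how I plan to write these entries to a JSONL file and start the fine-tune, using the openai Python SDK (the user/assistant turns shown are hypothetical placeholders; each JSONL line needs the complete conversation, not just the system message):

```python
import json

from openai import OpenAI  # assumes the v1 openai Python SDK

# Two of the ~100 training entries; the conversations are hypothetical
# placeholders. Each JSONL line holds a full system/user/assistant exchange.
entries = [
    {
        "messages": [
            {"role": "system", "content": "You are a bot that improves English to Arabic translations. Change bare numbers in the Arabic text to Eastern Arabic numerals."},
            {"role": "user", "content": "English: 10 persons attended.\nArabic: حضر 10 أشخاص."},
            {"role": "assistant", "content": "حضر ١٠ أشخاص."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a bot that improves English to Arabic translations. Keep years in parentheses such as (2012) unchanged in the Arabic text."},
            {"role": "user", "content": "English: The law passed in (2012).\nArabic: صدر القانون في (2012)."},
            {"role": "assistant", "content": "صدر القانون في (2012)."},
        ]
    },
]

# Write one JSON object per line (JSONL), keeping the Arabic readable.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

client = OpenAI()
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)
```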

You essentially create two different “modes” of the AI that can be called upon when the same system message is used again. There will be some crossover between them because of their similarity, so I would expect some of the “number” qualities to be learned even under the “date” system prompt.

Combining the two approaches, providing the task prompt that currently underperforms in the system role and also fine-tuning on that same system message, should give strong reinforcement.

However, I will give you a tip to save a lot of money if this is not a user-facing chatbot: put your data-processing instructions and guidance into the user role, along with any other preface, before the text to translate. gpt-3.5-turbo has degraded to the point where it won’t follow system-role tasks or operation modes, but, like ChatGPT, it will do what the user instructs.
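
For example, something like this (a minimal sketch using the current openai Python SDK; the guideline wording and texts are illustrative):

```python
from openai import OpenAI  # assumes the v1 openai Python SDK

client = OpenAI()

english = "10 persons attended the conference in (2012)."
arabic = "حضر 10 أشخاص المؤتمر في (2012)."

# The guidelines ride along in the user message instead of the system role.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": (
                "Improve the Arabic translation below by comparing it to the "
                "English original. Convert bare numbers such as '10 persons' "
                "to Eastern Arabic numerals, but keep years in parentheses "
                "such as '(2012)' exactly as they are.\n\n"
                f"English: {english}\nArabic: {arabic}"
            ),
        }
    ],
)
print(response.choices[0].message.content)
```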


Thank you for your reply.

Just to get things clear: is Option 1 better than Option 2? Keep in mind that the instructions in the system message will be different for each of the 100 JSONL lines depending on the guideline, and that in the actual ChatCompletion call the system message will be the general one (i.e., “You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements.”), since the bot might be given texts that need multiple things fixed.

Option 1:

{"role": "system", "content": "You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements. If you see a year in parentheses in the English text, like '(2012)', make sure the Arabic text has it exactly the same: '(2012)'."}

Option 2:

{"role": "system", "content": "You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements."}
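
In production, the call would then look something like this (a sketch; the fine-tuned model id is a made-up placeholder):

```python
from openai import OpenAI  # assumes the v1 openai Python SDK

client = OpenAI()

GENERAL_SYSTEM = (
    "You are a bot that improves English to Arabic translations. You will "
    "get a paragraph in English and its translation in Arabic. You need to "
    "improve the Arabic translation by comparing it to the original English "
    "text and by performing some improvements."
)

response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:my-org::abc123",  # hypothetical fine-tuned model id
    messages=[
        {"role": "system", "content": GENERAL_SYSTEM},
        {"role": "user", "content": "English: ...\nArabic: ..."},
    ],
)
print(response.choices[0].message.content)
```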

I like Option 2. Also, give your new bot a name and an introduction sequence so that your fine-tune is definitely “triggered”. You can fine-tune on the AI successfully completing a variety of user tasks in the correct manner, and it should be able to infer the type of output needed; you can continue to prompt in the user role when instruction is still needed. The quantity of user/assistant coverage may need to be significantly higher to get an optimal AI.

What is an example of an introduction sequence?

“You are ChatGPT, a large language model trained by OpenAI…” would be an example of an identity that gives the behavior a firm root. Or, if you need a specialist, break away from that identity immediately so that the probable output becomes more like what you’ve trained on: “Here is ArabicAI, an expert AI assistant which has been fine-tuned to improve the quality of modern Arabic writing provided within a user role message.”
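
As a hypothetical sketch, the same identity string would anchor both the training lines and the production call (the conversations are placeholders):

```python
# Hypothetical: one identity string shared by training and inference.
IDENTITY = (
    "Here is ArabicAI, an expert AI assistant which has been fine-tuned to "
    "improve the quality of modern Arabic writing provided within a user "
    "role message."
)

# One training line (placeholder conversation) for the JSONL file.
training_line = {
    "messages": [
        {"role": "system", "content": IDENTITY},
        {"role": "user", "content": "English: ...\nArabic: ..."},
        {"role": "assistant", "content": "..."},
    ]
}

# At inference time, reuse the identical system message so the
# fine-tuned behavior is reliably triggered.
production_messages = [
    {"role": "system", "content": IDENTITY},
    {"role": "user", "content": "English: ...\nArabic: ..."},
]
```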


Did you ever figure this out? I’ve been curious about the same…

In addition, I’m curious how necessary it is to use the same system message in production that was used in training…

Most examples I see for fine-tuning use the same system message in every example in the dataset… Does this mean that once the model is fine-tuned, that portion of the system message is no longer needed, as it’s essentially baked in? On the flip side, if it is needed, can you append more directions to the system message that weren’t necessarily the focus of the fine-tuning job and still reap the enhancements from the fine-tuned model?

Otherwise, it would suggest that you must always use in production the exact same system message as in the training examples.
