Do the system messages in GPT-3.5 Turbo fine-tuning need to be the same for all entries?

I am using GPT-3.5 Turbo to improve the English-Arabic translation I get from another API, and I have a set of guidelines the LLM needs to apply when fixing the Arabic translation. I tried putting the guidelines into the system message as prompts, but the performance was poor and the output didn’t adhere to my guidelines, so I am now trying fine-tuning with around 100 examples.

My question is: do all entries have to have the same system message, or can it be different for each of the 100 lines? And will varying it help the LLM understand the guidelines or not?

The first part of the system message will always be the same, but I want to check whether explaining what needs to be done in each entry will help the model or not.

For example:

entry 1:

{"role": "system", "content": "You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements. If you see a number without parentheses in the English text, like '10 persons', make sure to change it to Eastern Arabic numerals in the Arabic text: '١٠ أشخاص' instead of '10 أشخاص'."}

entry 2:

{"role": "system", "content": "You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements. If you see a year in parentheses in the English text, like '(2012)', make sure the Arabic text has it exactly the same: '(2012)'."}
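
For reference, here is a minimal sketch of how I plan to write these entries to a JSONL file and start the fine-tune, using the openai Python SDK (the user/assistant turns shown are hypothetical placeholders; each JSONL line needs the complete conversation, not just the system message):

```python
import json

from openai import OpenAI  # assumes the v1 openai Python SDK

# Two of the ~100 training entries; the conversations are hypothetical
# placeholders. Each JSONL line holds a full system/user/assistant exchange.
entries = [
    {
        "messages": [
            {"role": "system", "content": "You are a bot that improves English to Arabic translations. Change bare numbers in the Arabic text to Eastern Arabic numerals."},
            {"role": "user", "content": "English: 10 persons attended.\nArabic: حضر 10 أشخاص."},
            {"role": "assistant", "content": "حضر ١٠ أشخاص."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a bot that improves English to Arabic translations. Keep years in parentheses such as (2012) unchanged in the Arabic text."},
            {"role": "user", "content": "English: The law passed in (2012).\nArabic: صدر القانون في (2012)."},
            {"role": "assistant", "content": "صدر القانون في (2012)."},
        ]
    },
]

# Write one JSON object per line (JSONL), keeping the Arabic readable.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

client = OpenAI()
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)
```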

You essentially create two different “modes” of the AI that can be called upon when the same system message is used again. There will be some crossover between them because of their similarity, so I would expect some of the “number” qualities to be learned even under the “date” system prompt.

Combining the two approaches, providing the task prompt that currently underperforms in the system role and also fine-tuning on that same system message, should give strong reinforcement.

However, I will give you a tip to save a lot of money if this is not a user-facing chatbot: put your data-processing instructions and guidance into the user role, along with any other preface, before the text to translate. gpt-3.5-turbo has degraded to the point where it won’t follow system-role tasks or operation modes, but, like ChatGPT, it will do what the user instructs.
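
For example, something like this (a minimal sketch using the current openai Python SDK; the guideline wording and texts are illustrative):

```python
from openai import OpenAI  # assumes the v1 openai Python SDK

client = OpenAI()

english = "10 persons attended the conference in (2012)."
arabic = "حضر 10 أشخاص المؤتمر في (2012)."

# The guidelines ride along in the user message instead of the system role.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": (
                "Improve the Arabic translation below by comparing it to the "
                "English original. Convert bare numbers such as '10 persons' "
                "to Eastern Arabic numerals, but keep years in parentheses "
                "such as '(2012)' exactly as they are.\n\n"
                f"English: {english}\nArabic: {arabic}"
            ),
        }
    ],
)
print(response.choices[0].message.content)
```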


Thank you for your reply.

Just to get things clear: is Option 1 better than Option 2? Keep in mind that the instructions in the system message will be different for each of the 100 JSONL lines depending on the guideline, and that in the actual ChatCompletion call the system message will be the general one (i.e., “You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements.”), since the bot might be given texts that need multiple things fixed.

Option 1:

{"role": "system", "content": "You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements. If you see a year in parentheses in the English text, like '(2012)', make sure the Arabic text has it exactly the same: '(2012)'."}

Option 2:

{"role": "system", "content": "You are a bot that improves English to Arabic translations. You will get a paragraph in English and its translation in Arabic. You need to improve the Arabic translation by comparing it to the original English text and by performing some improvements."}
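
In production, the call would then look something like this (a sketch; the fine-tuned model id is a made-up placeholder):

```python
from openai import OpenAI  # assumes the v1 openai Python SDK

client = OpenAI()

GENERAL_SYSTEM = (
    "You are a bot that improves English to Arabic translations. You will "
    "get a paragraph in English and its translation in Arabic. You need to "
    "improve the Arabic translation by comparing it to the original English "
    "text and by performing some improvements."
)

response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:my-org::abc123",  # hypothetical fine-tuned model id
    messages=[
        {"role": "system", "content": GENERAL_SYSTEM},
        {"role": "user", "content": "English: ...\nArabic: ..."},
    ],
)
print(response.choices[0].message.content)
```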

I like Option 2. Also, give your new bot a name and an introduction sequence so that your fine-tune is definitely “triggered”. You can fine-tune on the AI successfully completing a variety of user tasks in the correct manner, and it should be able to infer the type of output needed; you can continue to prompt in the user role when instruction is still needed. The quantity of user/assistant coverage may need to be significantly higher to get an optimal AI.

What is an example of an introduction sequence?

“You are ChatGPT, a large language model trained by OpenAI…” would be an example of an identity that gives the behavior a firm root. Or, if you need a specialist, break away from that identity immediately so that the probable output becomes more like what you’ve trained on: “Here is ArabicAI, an expert AI assistant which has been fine-tuned to improve the quality of modern Arabic writing provided within a user role message.”
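
As a hypothetical sketch, the same identity string would anchor both the training lines and the production call (the conversations are placeholders):

```python
# Hypothetical: one identity string shared by training and inference.
IDENTITY = (
    "Here is ArabicAI, an expert AI assistant which has been fine-tuned to "
    "improve the quality of modern Arabic writing provided within a user "
    "role message."
)

# One training line (placeholder conversation) for the JSONL file.
training_line = {
    "messages": [
        {"role": "system", "content": IDENTITY},
        {"role": "user", "content": "English: ...\nArabic: ..."},
        {"role": "assistant", "content": "..."},
    ]
}

# At inference time, reuse the identical system message so the
# fine-tuned behavior is reliably triggered.
production_messages = [
    {"role": "system", "content": IDENTITY},
    {"role": "user", "content": "English: ...\nArabic: ..."},
]
```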


Did you ever figure this out? I’ve been curious about the same…

In addition, I’m curious how necessary it is to use the same system message in production that was used in training…

Most examples I see for fine-tuning use the same system message in every example in the dataset… Does this mean that once the model is fine-tuned, that portion of the system message is no longer needed, as it’s essentially baked in? On the flip side, if it is needed, can you append more directions to the system message that weren’t necessarily the focus of the fine-tuning job and still reap the enhancements from the fine-tuned model?

Otherwise, it would suggest that you must always use in production the exact same system message as in the training examples.
