Why is system message on untrained model better than on model trained with examples

I trained a model for using it as machine translation service. The system message looks like

You are a machine translation engine and translate single texts or multiple segments from English / ‘en’ as source language to German / ‘de’ as target language. You are specialized in website texts for language service providers. Your tone is colloquial and friendly, and your translations are catchy and easy to understand. You always use the german, informal “du” when translating the english “you” addressing others.

This message works pretty well on an untrained model to generate the desired capabilities, e.g. regarding the “you”.

When I use the system-message to train the model with 10 simple examples like e.g.

You may choose to review a translation yourself, using the review solution in our online Language Portal.

Vielleicht möchtest du eine Übersetzung mithilfe der Review-Lösung in unserem online Sprach-Portal selbst überprüfen.

The Model e.g. never translates “you” to “du” anymore.
That’s a rather strange result to me ?

Hi and welcome to the Developer Forum!

Are you including the system prompt in your training data? Fine tuning will change the way the model responds to all input and try to match that with the output you provide, so if you don’t include the system prompt in your training data and then use on in your API calls, you may experience unpredictable results.

Thanks for the answer! Yes, I include the system prompt to the trainings-data with every example.

That’s why I don’t understand the effect: untrained model requested with system prompt → desired result, Fine-tune model with system-prompt & 10 examples, then request model with system-prompt → far worse result

10 training examples is the very bare minimum to have an effect, to get useful output you will require thousands, and at least several hundred to start to get an reasonable effect.

Hmm, that sounds discouraging. My goal is, to train a model to act as a specialized Machine Translation engine. Would you fine-tune the model or simply send a properly crafted system-message with every request to translate ? Thank you !

I would try to fine-tune davinci-002 if you have those thousands of quality translation training tasks. What you start with isn’t much, and naturally poorer that GPT-3 was.

Prompting to receive completion

Here is the original text in German.

Spätzle ist die schwäbische Verkleinerungsform von Spatz und bedeutet möglicherweise „Sperling“ oder „Batzen, Klumpen“. Spätzle ist in der hier verwendeten Bedeutung ein Pluralwort.[2] Als Wasserspatzen (der Teig konnte in Notzeiten auch ohne Eier nur aus Mehl, Wasser und Salz bereitet werden) ist das Gericht im 18. Jahrhundert belegt. Im Raum Hohenlohe werden sie mit der fränkischen Verkleinerungssilbe -lich Spätzlich genannt,[3] in der Schweiz und im Markgräflerland Spätzli oder Chnöpfli und im niederalemannischen Raum Knepfli. In Ungarn (nokedli und auch galuska – wobei dort beide Bezeichnungen eingebürgerte Fremdwörter sind) und der Slowakei ist ein äquivalentes Gericht verbreitet (galuska bzw. Bryndzové halušky). Im nördlichen Österreich werden Spätzle als Nockerln[4] bezeichnet (z. B. in der Zubereitungsform Eiernockerl); in Kärnten und Tirol werden auch Teigtaschen als Nocken bezeichnet.

Der Name bezieht sich wahrscheinlich auf die Form der Spätzle im 18. Jahrhundert, die mit Spatzen verglichen wurde.[5] Manche Sprachwissenschaftler lehnen den Namen auch an das Wort Batzen für (Teig-)Klumpen an.[6]

Je nach Form wird in einigen Regionen zwischen Spätzle (die Länge übersteigt den Durchmesser um mehr als das Vierfache) und Knöpfle (das Verhältnis von Länge zu Durchmesser liegt unter zwei) unterschieden. Missratene, großklumpige oder zusammenklebende Spätzle werden auch Raben, Störchen, Rappen, Nachtigallen, Großvat(t)er oder Adler genannt.

Here is that text, translated into English by an expert AI system.

Spätzle is the diminutive form of spatz in the Swabian dialect and means “sparrow” or “clump”. Spätzle is used in the sense in which it is used here as a plural word.[2] As water sparrows (the dough could also be prepared without eggs in times of need from flour, water and salt), the dish is documented in the 18th century. In the Hohenlohe region, they are called Spätzlich Spätzle in the Franconian diminutive -lich, in Switzerland and in the Markgräflerland Spätzli or Chnöpfli and in the northern Alemannic region Knepfli. In Hungary (nokedli and also galuska - both of which are established foreign words) and Slovakia an equivalent dish is widespread (galuska or Bryndzové halušky). In northern Austria, Spätzle are called Nockerln[4] (e.g. in the preparation form Eiernockerl); in Carinthia and Tyrol, dumplings are also called Nockerl.

The name probably refers to the shape of the Spätzle in the 18th century, which was compared to sparrows.[5] Some linguists reject the name because it is also related to the word Batzen for (dough) clump.[6]

Depending on the shape, in some regions, between Spätzle (the length exceeds the diameter by more than four times) and Knöpfle (the ratio of length to diameter is less than two) is distinguished. Failed, large clumpy or sticky Spätzle are also called ravens, storks, raps, nightingales, grandfathers or eagles.

I love the fact that the AI was able to translate “Batzen” into “clump” and “Knöpfle” into “dumpling”. I’m not sure if it’s because of the similarity in sound or because of the similarity in meaning. The AI also translated “Klumpen” into “clump” and “Klumpen” into “clump”. The AI was able to translate “Raben” into “raven” and “Störchen” into “stork”. The AI was also able to translate “Raben” into “raven” and “Störchen” into “stork”. The AI was also able to translate “Raben” into "r…

You can also train the AI to output a stop sequence when it is done translating instead of the “completion” continuing past the end and wandering into speculation.

thanks for sharing this, I still don’t understand, should I include system prompt in training dataset?