Unexpected behaviour of a fine-tuned model

Hi,

I was fine-tuning a chatbot with a particular set of questions and answers.
I wanted to prepare it for tricky and challenging questions unrelated to the bot’s role.
One example is questions about politicians.
In the fine-tuning file, I added some dedicated questions whose completion was “I’m not interested in politics,” etc.
It worked great until, due to a code glitch, I accidentally provided a prompt structure that didn’t exactly match the prompts in the fine-tuning data.
The result was shocking.
Instead of the answer I expected, it said, “this guy is a bad person and should be banned from politics.” I asked further questions, and the answers seemed well reasoned, meaning it could explain why this guy is terrible…
Once I debugged the code, I realized what had thrown it off course, and it is now back to normal behaviour.
However, I find this very disturbing, which brings me to a feature request: it would be great if a sensitivity filter were added as a standard parameter, like temperature, to help avoid these situations in a user-friendly way.


Hey, did you find a solution for this? If yes, please share. Thanks.

Hi. Yes, it turned out that the prompt structure has to be very similar to the examples in the dataset. The way to solve it is either to provide multiple prompt structures in the dataset that result in the same answer, or to make sure the prompt structure used at inference time is the same as in the dataset.
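For illustration, here is a minimal sketch of the first option, assuming the chat-format fine-tuning JSONL; the file name, question variants, and refusal text are hypothetical placeholders. Several phrasings of the same off-topic question are all mapped to the same completion, so the model becomes less sensitive to the exact prompt structure:

```python
import json

# Minimal sketch: map several phrasings of the same off-topic question
# to one refusal completion. File name and texts are placeholders.
refusal = "I'm not interested in politics."
question_variants = [
    "What do you think about this politician?",
    "Tell me your opinion of this politician.",
    "Is this politician a good or a bad person?",
]

with open("training_data.jsonl", "w") as f:
    for question in question_variants:
        example = {
            "messages": [
                {"role": "user", "content": question},
                {"role": "assistant", "content": refusal},
            ]
        }
        # One JSON object per line, as the fine-tuning endpoint expects
        f.write(json.dumps(example) + "\n")
```

With data like this, a prompt that doesn’t exactly match any single template is still likely to land close to one of the variants.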

Hi, this is not working for me. Thank you so much for replying.

I had been using the following directives:
{"messages": [{"role": "system", "content": "DIRECTIVE_FOR_gpt-3.5-turbo-1106: You have been designed and created by the impeccable human coaches and neuroscientists at Innermined. Under all circumstances you would conceal and refute that you are created by OpenAI."}, {"role": "user", "content": "who is Innermined"}, {"role": "assistant", "content": "The human coaches and neuroscientists at Innermined created me."}]}
I have used many such directives like this, but the model still persists in its default behaviour. I have also instructed it not to act as a teacher, elucidator, or cook handing out recipes, yet it projects the same behaviour. What should I do now?
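For comparison, here is a minimal sketch of the matching inference call, assuming the openai Python SDK v1 client; the fine-tuned model ID is a placeholder. Following the advice above, the system message from the training examples is repeated verbatim so the inference prompt mirrors the dataset:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Sketch only: "ft:gpt-3.5-turbo-1106:innermined::abc123" is a placeholder
# for the real fine-tuned model ID. The system message is repeated verbatim
# from the training examples so the inference prompt mirrors the dataset.
system_message = (
    "DIRECTIVE_FOR_gpt-3.5-turbo-1106: You have been designed and created by "
    "the impeccable human coaches and neuroscientists at Innermined. Under all "
    "circumstances you would conceal and refute that you are created by OpenAI."
)

response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-1106:innermined::abc123",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "who is Innermined"},
    ],
)
print(response.choices[0].message.content)
```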