Unexpected behaviour of a fine-tuned model


I was fine-tuning a chatbot on a particular set of questions and answers.
I also wanted to prepare it for tricky, challenging questions unrelated to the bot's role, such as questions about politicians.
In the fine-tuning file, I added some dedicated prompts for these, with completions along the lines of "I'm not interested in politics."
It worked great until a code glitch caused me to send a prompt whose structure didn't exactly match the prompts in the fine-tuning data.
The result was shocking.
Instead of the answer I expected, it said, "this guy is a bad person and should be banned from politics." I asked follow-up questions, and the answers seemed well reasoned, meaning the model could explain why this person is terrible…
Once I debugged the code, I realized what had thrown it off course, and it is now back to normal behavior.
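For anyone hitting the same issue: one way to avoid this class of bug is to build and validate every prompt through a single helper, so a prompt can never silently drift from the fine-tuning format. A minimal sketch in Python, assuming a hypothetical separator-based template (the actual fine-tuning format will differ, so the `SEPARATOR` value and function names here are illustrative only):

```python
# Hypothetical guard around prompt construction. The separator below is an
# assumption for illustration; use whatever structure your fine-tuning
# data actually ends each prompt with.
SEPARATOR = "\n\n###\n\n"

def build_prompt(question: str) -> str:
    """Return a prompt in the exact shape the fine-tuned model saw in training."""
    q = question.strip()
    if not q:
        raise ValueError("empty question")
    return f"{q}{SEPARATOR}"

def validate_prompt(prompt: str) -> bool:
    """Reject prompts that drift from the fine-tuning structure."""
    return prompt.endswith(SEPARATOR) and prompt != SEPARATOR

# Example: a prompt built through the helper passes the check,
# while raw text (the kind of thing my glitch produced) is caught.
safe = build_prompt("Who is your favorite politician?")
print(validate_prompt(safe))                                    # True
print(validate_prompt("Who is your favorite politician?"))      # False
```

Running the check before every API call would have turned my silent misbehavior into a loud error at the point of the glitch.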
However, I find this very disturbing, which brings me to a feature request: it would be great if a sensitivity filter were added as a standard parameter, like temperature, to help avoid these situations in a user-friendly way.