Hello, please clarify the queries below on fine-tuning.
- On the fine-tuning page of the OpenAI docs, there is an example format for creating a chatbot that occasionally gives sarcastic responses. Can we control how often it responds sarcastically?
- Can we train an LLM to behave in two or more different ways with a single fine-tuning run?
- Can we switch between the behaviours (on which the LLM was fine-tuned) at inference (production) time?
- If we can fine-tune on two different behaviours, how should we present the examples during fine-tuning?
Thanks in advance.
Hi!
I have not implemented this myself but I am fairly convinced that this should be doable via fine-tuning.
During fine-tuning, the model detects and picks up patterns in your training data. So what you need to do is make it clear, through your training examples, when the bot should respond with which tone or personality. You can achieve that, for example, by specifying in your system message the conditions under which it should adopt each tone/personality, e.g. for which types of user questions or requests. You then further demonstrate the desired response pattern with pairs of user and assistant messages.
What’s important is that there is a clear logic visible in your examples, so the model can discern when to adopt which response style.
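As a concrete illustration of the approach above, here is a minimal sketch of training examples that teach two tones in one fine-tune. The persona names, the wording of the messages, and the file name are my own assumptions; the JSONL chat format itself is OpenAI's documented fine-tuning format.

```python
import json

# Assumed system message stating the condition for each tone.
# The user/assistant pairs then demonstrate the pattern.
SYSTEM = ("You are a support bot. Answer billing questions formally; "
          "answer everything else with light sarcasm.")

examples = [
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Why was I charged twice this month?"},
        {"role": "assistant", "content": "I apologize for the inconvenience. "
                                         "Please share your invoice number and I will look into it."},
    ]},
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "What's the capital of France?"},
        {"role": "assistant", "content": "Paris. You know, the one with the big metal tower."},
    ]},
]

# One JSON object per line, as the fine-tuning API expects.
with open("training.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

With many such pairs per condition, the model can learn to associate each question type with its tone, since the system message makes the rule explicit in every example.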
Hello, thanks for the response.
To ensure that the proper persona is invoked after fine-tuning, how advisable is it to embed a persona identifier inside the fine-tuning examples? When the system has to respond as a particular persona after model deployment, can we configure the model wrapper so that it appends the persona identifier to the user query, so that the model responds in the appropriate way as soon as the conversation starts?
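The wrapper idea described above could be sketched like this. Everything here is hypothetical: the `[persona: ...]` tag format and the function name are assumptions, and the approach only works if the same identifier was embedded in the fine-tuning examples.

```python
# Hypothetical inference-time wrapper: it prefixes the conversation with a
# persona identifier in the system message, so the fine-tuned model adopts
# that style from the very first turn. The tag format must match whatever
# identifier was used in the training data.
def build_messages(persona: str, user_query: str) -> list[dict]:
    return [
        {"role": "system", "content": f"[persona: {persona}]"},
        {"role": "user", "content": user_query},
    ]

msgs = build_messages("sarcastic", "What's the weather like?")
```

Switching personas at inference time would then just be a matter of passing a different identifier to the wrapper, without touching the deployed model.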