Hi community, I have a problem with my model. I used GPT-4 to build a health model with RAG, and I need it not to talk about other topics: finance, technology… I want my model to speak only about health topics.
I tried fine-tuning to fix this, but the model overfit in some cases. For example, when I wrote “Hi, how are you”, it answered “I can’t speak about that…”, even after I added examples to the training data in which the model responds with “Hi, my name is CemGPT…”.
If you want the model to talk only about specific topics, it is more effective to use a more steerable model like GPT-4 Turbo with instructions, rather than fine-tuning.
You can give instructions such as, “If the conversation strays to topics other than health, please respond with, ‘Do you have any questions or topics related to health?’”
GPT-4o tends to be less compliant with instructions, so I recommend using the turbo model.
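As a minimal sketch of what that looks like with the Python SDK (the model name, the wording of the system prompt, and the small-talk allowance are just placeholders to adapt to your own app):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

SYSTEM_PROMPT = (
    "You are a health assistant. Only discuss health-related topics. "
    "Greetings and ordinary small talk are fine. "
    "If the conversation strays to topics other than health, respond with: "
    "'Do you have any questions or topics related to health?'"
)

def ask(user_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # a steerable model, per the suggestion above
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(ask("Hi, how are you?"))            # small talk, should answer normally
print(ask("Which stocks should I buy?"))  # off topic, should redirect to health
```

Note the explicit line allowing greetings; that directly targets the “Hi, how are you” failure you described, without any fine-tuning.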
While I fully support @dignity_for_all’s suggestion, I also recall a similar question from a user some time ago. At the time, I shared a few ideas for the composition of the training data. I never heard back from that user, so I can’t confirm whether the approach ended up being successful. For what it’s worth, I am sharing the link to the post anyway:
To me, topics like this fall under the big umbrella of “don’t fight the model.”
The model “wants” to be helpful, so giving it a pile of directives not to be helpful mostly just degrades the response.
The solution I always propose for this is to filter-in and filter-out.
Just send the user’s message to a super cheap model and ask if it’s on topic (filter-in), then pass the main model’s response through the same check (filter-out).
The output pass breaks streaming, since you need the full response before you can check it, but you can choose to run it only for questionable outputs, or once you’ve already determined that a particular user is trying to make the model talk about things you don’t want it to.
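Here is a rough sketch of the filter-in / filter-out idea with the Python SDK; the model names, the classifier prompt, and the YES/NO convention are all assumptions you’d tune for your own app:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Small talk is deliberately whitelisted so greetings like "Hi, how are you"
# don't get rejected, which was the original poster's overfitting problem.
CLASSIFIER_PROMPT = (
    "Answer with exactly one word, YES or NO. Is the following message "
    "either ordinary small talk (greetings, thanks, goodbyes) or about "
    "health?\n\n{text}"
)
REDIRECT = "Do you have any questions or topics related to health?"

def is_on_topic(text: str) -> bool:
    """Ask a cheap model to classify the text; cheap and fast is the point."""
    result = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: any inexpensive model works here
        messages=[{"role": "user", "content": CLASSIFIER_PROMPT.format(text=text)}],
        max_tokens=3,
    )
    return result.choices[0].message.content.strip().upper().startswith("YES")

def answer(user_message: str) -> str:
    # Filter-in: reject off-topic questions before the main model sees them.
    if not is_on_topic(user_message):
        return REDIRECT
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_message}],
    )
    reply = response.choices[0].message.content
    # Filter-out: make sure the model's own answer stayed on topic.
    # This is the pass that breaks streaming, since it needs the full reply.
    if not is_on_topic(reply):
        return REDIRECT
    return reply
```

The main model never sees any “don’t talk about X” directives, so its on-topic answers stay high quality; the cheap classifier does the gatekeeping on both sides.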