Custom Moderation GPT Model | Fine Tuning

Hi community, I have some problems with my model. I used GPT-4 to build a health model with RAG, and I need it to avoid topics like finance, technology… I want my model to talk only about health topics.

I tried fine-tuning for this, but the model overfit in some cases. For example, when I wrote “Hi, how are you?” its answer was “I can’t speak about that…”, even though I included examples in the training data where the model responds with “Hi, my name is CemGPT…”.

How could I solve this problem?

help me pls!

If you want the model to only talk about specific topics, it is more effective to use a model like GPT-4 Turbo, which is more steerable, rather than fine-tuning.

You can give instructions such as, “If the conversation strays to topics other than health, please respond with, ‘Do you have any questions or topics related to health?’”
GPT-4o tends to be less compliant with instructions, so I recommend using the turbo model.
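For illustration, here is a minimal sketch of that approach with the OpenAI Python SDK. The model name, assistant name, and prompt wording are placeholders, not a recommended final prompt:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical system prompt that restricts the assistant to health topics.
SYSTEM_PROMPT = (
    "You are CemGPT, a health assistant. Only discuss health topics. "
    "If the conversation strays to anything else, respond with: "
    "'Do you have any questions or topics related to health?'"
)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Hi, how are you?"},
    ],
)
print(response.choices[0].message.content)
```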


While I fully support @dignity_for_all’s suggestion, I also recalled a similar question from a user some time ago. At the time, I shared a few ideas for the training data composition. I did not hear back from the user in question, so can’t confirm if the approach ended up being successful. For what it’s worth, I am sharing the link to the post anyway:
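The core idea, as I understood it, was to balance the training data rather than filling it only with refusals. A hypothetical sketch of what a few JSONL lines for the chat fine-tuning format could look like (names and wording are placeholders):

```python
import json

# Hypothetical mix: small talk answered normally, off-topic redirected,
# on-topic answered helpfully, so the model doesn't learn "refuse everything".
SYSTEM = "You are CemGPT, a health assistant."
examples = [
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Hi, how are you?"},
        {"role": "assistant", "content": "Hi, I'm CemGPT. How can I help you with a health question today?"},
    ]},
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Which stocks should I buy?"},
        {"role": "assistant", "content": "I can only help with health topics. Do you have a health question?"},
    ]},
    {"messages": [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "How much water should I drink per day?"},
        {"role": "assistant", "content": "It varies by person and activity level; most adults do well with around 2 to 3 liters a day."},
    ]},
]

with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```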


To me, topics like this fall under the big umbrella of “don’t fight the model.”

The model “wants” to be helpful, so trying to give it a bunch of directives to not be helpful only really downgrades the response.

The solution I always propose for this is to filter-in and filter-out.

Just send the user’s message to a super cheap model and ask if it’s on topic, then pass the main model’s response through the same on-topic check before returning it.

The out-pass wrecks streaming, but you can choose to only do it for questionable outputs, or once you’ve already determined a particular user is trying to make the model talk about things you don’t want it to. A rough sketch of the filter-in / filter-out gate is below.
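This is only a sketch, assuming the OpenAI Python SDK; the model names and prompt wording are illustrative placeholders:

```python
from openai import OpenAI

client = OpenAI()

def is_on_topic(text: str) -> bool:
    """Ask a cheap model to classify whether the text is about health."""
    result = client.chat.completions.create(
        model="gpt-4o-mini",  # any inexpensive model works as the classifier
        messages=[
            {"role": "system", "content": (
                "Answer with exactly one word, YES or NO: "
                "is the following text about health?")},
            {"role": "user", "content": text},
        ],
        max_tokens=1,
    )
    return result.choices[0].message.content.strip().upper().startswith("Y")

def answer(user_message: str) -> str:
    REDIRECT = "Do you have any questions or topics related to health?"

    # Filter in: check the question before the main model ever sees it.
    if not is_on_topic(user_message):
        return REDIRECT

    reply = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": "You are a health assistant."},
            {"role": "user", "content": user_message},
        ],
    ).choices[0].message.content

    # Filter out: check the answer too (this is the pass that breaks streaming).
    return reply if is_on_topic(reply) else REDIRECT
```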