Why is my fine-tuned model hallucinating?

Hello there,
Just last week I asked a question about fine-tuning. I successfully fine-tuned a model and I am getting the results I expected (thank you), but I also noticed that some answers do not come exactly from the context I provided. The model is making up answers :face_with_peeking_eye:
For example,
Question: What type of customer calls are stressful?
Answer: Angry irate customers with no cell service
Model response: Angry customers who want to cancel their service
Expected response: Angry irate customers with no cell service

This is how I trained:

{"messages": [{"role": "system", "content": "You are MegaMaster, and only serve to discover and output phrases in conversational feedback that offer suggestions or improvement. Ensure to maintain the original grammar and spelling, without making any changes. The responses should precisely match the provided context, without any alterations"}, 
{"role": "user", "content": "Question: What would help you have the best work experience at Acme? Answer: Thanks, Dan. Perhaps more flexibility"}, 
{"role": "assistant", "content": "more flexibility"}]}

Is there anything that I can do to stop the model from making up answers?
I trained on around 1,000 examples.

One of the reasons I don’t like the term hallucination is because it is used as an excuse by the experts in the field to not explain or understand the results. The same thing happened when AI scientists realized they could skip the whole “scientist” part and just say “Oh it’s a black box and no one knows how to use it”. (I reread this and it kind of sounds like I might be directing that statement at you, I am not.)

So, to shed some light on the situation, I think it is important to understand two things. First, GPT is just a text predictor. It is certainly more complicated than the next-word predictor on your phone, but it essentially does the same thing. Second, what even is a hallucination? Since GPT is just predicting the next words, you can assume that a hallucination has something to do with not having enough information to come up with the correct ones.

In image classification, this would just be a wrong answer. But, because AI researchers like to pretend their inventions are more human and more complex than they actually are, they call it a “hallucination”. It obfuscates the real problem, but they do it for whatever reason.

In image classification, the solution is pretty simple. You just need more data of the same type. More cats in a certain pose, more cats of a certain color, more cats in certain lighting. In your case, you just need more questions of the same nature, more answers of the same nature.

Luckily, you can just get GPT to generate a bunch of similar questions and answers. What you are trying to do is put in enough of your own questions and answers that other answers become less likely to show up for that line of questioning. That is pretty much it.
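
To make the "probability of other answers" idea concrete, here is a toy sketch. It is a plain word-count model, not how GPT or fine-tuning actually works internally, and the two tiny corpora are made up for illustration. It only shows the direction of the effect: the more often your phrasing appears, the more likely it is to be the predicted continuation.

```python
from collections import Counter

def next_word_probs(corpus, prefix):
    """Estimate P(next word | prefix) by counting continuations in the corpus."""
    n = len(prefix)
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        for i in range(len(words) - n):
            if tuple(words[i:i + n]) == prefix:
                counts[words[i + n]] += 1
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}

# Hypothetical "general" text: angry customers usually want to cancel.
general = ["angry customers who want to cancel their service"] * 8
# Hypothetical domain examples: only a couple mention cell service.
domain = ["angry customers with no cell service"] * 2

prefix = ("angry", "customers")
before = next_word_probs(general + domain, prefix)       # few domain examples
after = next_word_probs(general + domain * 10, prefix)   # many more domain examples

print("before:", before)
print("after: ", after)
```

With only two domain examples, "who" (the cancel-your-service continuation) wins. After adding more examples of the same nature, "with" becomes the most likely next word.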

So, you’ll just have to come up with a bunch of different remixes of your dataset. It is called data augmentation, and in image classification, they’ll do simple things like add noise, skew it, saturate, desaturate, invert color, etc. You can do the same thing with your questions and answers: maybe misspellings, correct spellings, different styles of questions and answers, etc.
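
A rough sketch of what that could look like for a chat-format training file like yours. The helper names (`add_typo`, `make_example`, `augment`) and the shortened system prompt are my own placeholders, not anything from your setup. I am only adding typos to the question here and leaving the answer text alone, on the assumption that the extracted phrase must stay a word-for-word span of the answer.

```python
import json
import random

def add_typo(text, rng):
    """Swap two adjacent letters at a random position (simple spelling noise).

    The text can come back unchanged if the two letters happen to be equal.
    """
    positions = [i for i in range(len(text) - 1)
                 if text[i].isalpha() and text[i + 1].isalpha()]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def make_example(system, question, answer, target):
    """Build one chat-format training record."""
    # The target must be a verbatim span of the answer; otherwise the
    # training data itself teaches the model to produce text not in the context.
    assert target in answer, "target is not a verbatim span of the answer"
    return {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Question: {question} Answer: {answer}"},
        {"role": "assistant", "content": target},
    ]}

def augment(system, question, answer, target, n_variants=3, seed=0):
    """Return the original record plus n_variants copies with a noisy question."""
    rng = random.Random(seed)
    records = [make_example(system, question, answer, target)]
    for _ in range(n_variants):
        records.append(make_example(system, add_typo(question, rng), answer, target))
    return records

# Placeholder system prompt; use your real one.
system_prompt = "Output the phrase that offers a suggestion, exactly as written."
variants = augment(
    system_prompt,
    "What would help you have the best work experience at Acme?",
    "Thanks, Dan. Perhaps more flexibility",
    "more flexibility",
)

# One JSON object per line, as in a JSONL training file.
for record in variants:
    print(json.dumps(record))
```

The assertion in `make_example` is also a cheap way to scan your existing 1,000 examples for targets that are not actually present in the context.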

Good luck.


We’re doing this, focusing on RAG debugging at WhyHow.AI. We plug into any existing LLM/RAG system you have and focus on creating a knowledge graph based on the feedback that your tester/PM has on an output.

We’ve reduced hallucinations by 80%, debugged errors in seconds with only natural language, and reduced the time to send systems into production. Happy to hop on a call and see where we could be helpful!