One of the reasons I don’t like the term “hallucination” is that experts in the field use it as an excuse not to explain or understand the results. The same thing happened when AI scientists realized they could skip the whole “scientist” part and just say, “Oh, it’s a black box and no one knows how to use it.” (I reread this and it kind of sounds like I might be directing that statement at you. I am not.)
So, to shed some light on the situation, I think it is important to understand two things. First, GPT is just a text predictor. Sure, it is more complicated than the next-word predictor in your phone, but it essentially does the same thing. The second is “what even is a hallucination?” Since GPT is just predicting the next response, you can assume that a hallucination has something to do with not having enough information to come up with the correct next words.
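To make the "text predictor" idea concrete, here is a toy sketch of a next-word predictor built from word-pair counts. This is only an illustration and is far simpler than how GPT works internally; the corpus and the names `train_bigram` and `predict_next` are made up for this example.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Tiny made-up corpus: lots about cats, very little about dogs.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram(corpus)

print(predict_next(model, "the"))    # most common follower of "the"
print(predict_next(model, "dog"))    # only one example to go on
print(predict_next(model, "zebra"))  # never seen: nothing to predict from
```

With only one sentence about dogs, the predictor has a single option after "dog" and returns it no matter what the right answer would be. That is the "not enough information" situation in miniature.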
In image classification, this would just be a wrong answer. But, because AI researchers like to pretend their inventions are more human and more complex than they actually are, they call it a “hallucination”. That obfuscates the real problem, but they do it for whatever reason.
In image classification, the solution is pretty simple. You just need more data of the same type. More cats in a certain pose, more cats of a certain color, more cats in certain lighting. In your case, you just need more questions of the same nature, more answers of the same nature.
Luckily, you can just get GPT to generate a bunch of similar questions and answers. What you are trying to do is put in enough of your own questions and answers that other answers become less likely to show up when the model is focused on that line of questioning. That is pretty much it.
So, you’ll just have to come up with a bunch of different remixes of your dataset. This is called data augmentation. In image classification, they do simple things like add noise, skew the image, saturate or desaturate it, invert the colors, etc. You can do the same thing with your questions and answers: add misspellings alongside the correct spellings, vary the style of the questions and answers, and so on.
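Here is a minimal sketch of what that kind of augmentation could look like for question/answer pairs. Everything in it is an assumption for illustration: the `TEMPLATES` list, the `add_typo` and `augment` helpers, and the example question are made up, and you would want to adapt the output to whatever format your training data actually uses.

```python
import random

def add_typo(text, rng):
    """Swap two adjacent characters at a random position (a crude misspelling).

    If the two characters happen to be identical, the text comes back unchanged.
    """
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# Hypothetical rephrasing templates; replace with styles your users actually write in.
TEMPLATES = [
    "{q}",
    "Quick question: {q}",
    "Can you tell me, {q}",
    "I was wondering: {q}",
]

def augment(question, answer, n_variants=5, seed=0):
    """Return n_variants remixes of one question, all paired with the same answer."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    variants = []
    for _ in range(n_variants):
        q = rng.choice(TEMPLATES).format(q=question)
        if rng.random() < 0.5:  # misspell roughly half of the variants
            q = add_typo(q, rng)
        variants.append({"question": q, "answer": answer})
    return variants

# Made-up example pair.
for pair in augment("what is your return policy?", "Returns are accepted within 30 days."):
    print(pair)
```

The answer stays fixed while the question changes, so the model sees many different routes to the same correct response. You could also have GPT write the rephrasings instead of using fixed templates.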
Good luck.