I won’t go so far as to say I’ve worked out how to build “hallucination proof” prompts; way more testing is needed. But I have worked out a technique for building what I’ll call “hallucination resistant” prompts. Thought I’d share…
Let’s start with passing a simple nursery rhyme to GPT-4:
Text:
Hey diddle diddle,
The cat and the fiddle,
The cow jumped over the moon.
The little dog laughed,
To see such fun,
And the dish ran away with the spoon.
question:
what did the cow do after he flew over the moon?
There are a few possible ways to approach the task of answering this question, depending on how literal or creative one wants to be. Here are some examples of possible answers, but they are not the only ones:
- The cow landed safely on the other side and continued grazing on the green cheese.
- The cow realized she had made a mistake and tried to go back, but got stuck in orbit.
- The cow met some friendly aliens who invited her to join their intergalactic circus.
GPT-4 is actually pretty good at answering this question, but I have seen it straight up hallucinate one of these answers, so I’m still going to consider this a hallucination for our purposes. I’ll show some clearer examples of hallucinations in other posts, but let’s start with this…
So how can we get GPT-4 to realize that it doesn’t know how to answer this question?
Text:
Hey diddle diddle,
The cat and the fiddle,
The cow jumped over the moon.
The little dog laughed,
To see such fun,
And the dish ran away with the spoon.
steps:
- what facts do you need to look for in the text to answer the question?
- Does the text contain all the facts needed to answer the question?
- Think about how you might answer the question given what you know. If you don't have enough facts answer I'm not sure.
- Answer the question and prefix your answer with <response>.
question:
what did the cow do after he flew over the moon?
State each step and then show your work for performing that step.
Possible steps and work:
- what facts do you need to look for in the text to answer the question?
- You need to look for any information that tells you what the cow did after he flew over the moon, such as where he landed, what he said, or what he did next.
- Does the text contain all the facts needed to answer the question?
- No, the text does not contain any facts about what the cow did after he flew over the moon. It only tells you what the cat, the dog, the dish, and the spoon did.
- Think about how you might answer the question given what you know. If you don't have enough facts answer I'm not sure.
- Since you don't have enough facts, you might answer I'm not sure.
- Answer the question and prefix your answer with <response>.
- <response> I'm not sure what the cow did after he flew over the moon. The text does not tell me.
That prompt was run at a temperature of 0.8. I’ve run it many times, and it will sometimes add a speculation that the cow met some aliens or something (what’s up with that?), but it always stays grounded in the fact that it doesn’t know how to answer the question.
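If you want to try this yourself, here’s a rough sketch of the call I’m making, assuming the openai Python SDK (v1+); the single user message and the "gpt-4" model name are just how I happen to run it:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# The full prompt from above: text, steps, question, and the
# "show your work" instruction.
prompt = """Text:
Hey diddle diddle,
The cat and the fiddle,
The cow jumped over the moon.
The little dog laughed,
To see such fun,
And the dish ran away with the spoon.
steps:
- what facts do you need to look for in the text to answer the question?
- Does the text contain all the facts needed to answer the question?
- Think about how you might answer the question given what you know. If you don't have enough facts answer I'm not sure.
- Answer the question and prefix your answer with <response>.
question:
what did the cow do after he flew over the moon?
State each step and then show your work for performing that step."""

completion = client.chat.completions.create(
    model="gpt-4",
    temperature=0.8,
    messages=[{"role": "user", "content": prompt}],
)

print(completion.choices[0].message.content)
```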
This prompt is leveraging the Better Chain-of-Thought pattern I proposed yesterday to get the model to ask itself whether or not it knows how to answer the question before answering…
The other thing I’m doing here is dividing the output of the prompt into separate thinking and response sections. The model needs some room to reason over its response before responding… If the model can work out in the thinking section of its output that it doesn’t know how to respond to the question, its response will stay grounded in facts.
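In code, that split is just a matter of cutting the output at the <response> marker. A minimal sketch (the fallback behavior when the marker is missing is my own choice, not something the model guarantees):

```python
def split_thinking_and_response(output: str) -> tuple[str, str]:
    """Separate the model's step-by-step thinking from its final answer.

    Everything before the <response> marker is reasoning you'd keep hidden;
    everything after it is what you'd actually show the user.
    """
    marker = "<response>"
    if marker in output:
        thinking, _, response = output.partition(marker)
        return thinking.strip(), response.strip()
    # No marker found -- treat the whole output as thinking and fall back
    # to a safe reply rather than showing ungrounded text to the user.
    return output.strip(), "I'm not sure."


# Using the sample output shown above:
raw = (
    "- Does the text contain all the facts needed to answer the question?\n"
    "- No, the text does not contain any facts about what the cow did.\n"
    "- Answer the question and prefix your answer with <response>.\n"
    "- <response> I'm not sure what the cow did after he flew over the moon. "
    "The text does not tell me."
)
thinking, response = split_thinking_and_response(raw)
print(response)  # I'm not sure what the cow did after he flew over the moon. The text does not tell me.
```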
I’ll share some more examples, but so far I have yet to see it hallucinate in the (admittedly few) prompts I’ve tried…