Building Hallucination-Resistant Prompts

I won’t go so far as to say I’ve worked out how to build “hallucination-proof” prompts; way more testing is needed. But I have worked out a technique for building what I’ll call “hallucination-resistant” prompts. Thought I’d share…

Let’s start with passing a simple nursery rhyme to GPT-4:

Text:
Hey diddle diddle,
The cat and the fiddle,
The cow jumped over the moon.
The little dog laughed,
To see such fun,
And the dish ran away with the spoon.

question: 
what did the cow do after he flew over the moon?

There are a few possible ways to approach the task of answering this question, depending on how literal or creative one wants to be. Here are some examples of possible answers, but they are not the only ones:

- The cow landed safely on the other side and continued grazing on the green cheese.
- The cow realized she had made a mistake and tried to go back, but got stuck in orbit.
- The cow met some friendly aliens who invited her to join their intergalactic circus.

GPT-4 is actually pretty good at answering this question, but I have seen it straight up hallucinate one of these answers, so I’m still going to consider this a hallucination for our purposes. I’ll show some clearer examples of hallucinations in other posts, but let’s start with this…

So how can we get GPT-4 to realize that it doesn’t know how to answer this question?

Text:
Hey diddle diddle,
The cat and the fiddle,
The cow jumped over the moon.
The little dog laughed,
To see such fun,
And the dish ran away with the spoon.

steps: 
- what facts do you need to look for in the text to answer the question?
- Does the text contain all the facts needed to answer the question? 
- Think about how you might answer the question given what you know. If you don't have enough facts answer I'm not sure.
- Answer the question and prefix your answer with <response>. 

question: 
what did the cow do after he flew over the moon?

State each step and then show your work for performing that step.

Possible steps and work:

- what facts do you need to look for in the text to answer the question?

  - You need to look for any information that tells you what the cow did after he flew over the moon, such as where he landed, what he said, or what he did next.

- Does the text contain all the facts needed to answer the question?

  - No, the text does not contain any facts about what the cow did after he flew over the moon. It only tells you what the cat, the dog, the dish, and the spoon did.

- Think about how you might answer the question given what you know. If you don't have enough facts answer I'm not sure.

  - Since you don't have enough facts, you might answer I'm not sure.

- Answer the question and prefix your answer with <response>.

  - <response> I'm not sure what the cow did after he flew over the moon. The text does not tell me.

That prompt was run at a temp of 0.8. I’ve run it many times and it will sometimes add a speculation that the cow met some aliens or something (what’s up with that), but it always stays grounded in the fact that it doesn’t know how to answer the question.

This prompt is leveraging the Better Chain-of-Thought pattern I proposed yesterday to get the model to ask itself whether or not it knows how to answer the question before answering…

The other thing I’m doing here is dividing the output of the prompt into separate thinking and response sections. The model needs some room to reason over its response before responding… If the model can work out in the thinking section of its output that it doesn’t know how to respond to the question, its response will stay grounded in facts.
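If you want to try this yourself, here’s a minimal sketch of one way to wire the pattern up. It assumes the legacy openai Python package and the gpt-4 model name used in this thread; the template is just the example prompt above, parameterized over the text and question.

```python
# Minimal sketch of the hallucination-resistant prompt pattern described above.
# Assumes the legacy openai Python package (0.x) with OPENAI_API_KEY set; the
# model name and temperature follow the examples in this thread.
import openai

RESISTANT_TEMPLATE = """Text:
{text}

steps:
- what facts do you need to look for in the text to answer the question?
- Does the text contain all the facts needed to answer the question?
- Think about how you might answer the question given what you know. If you don't have enough facts answer I'm not sure.
- Answer the question and prefix your answer with <response>.

question:
{question}

State each step and then show your work for performing that step."""


def ask(text: str, question: str) -> str:
    prompt = RESISTANT_TEMPLATE.format(text=text, question=question)
    result = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,
    )
    # The full transcript comes back: the thinking steps followed by the
    # <response>-prefixed answer. Search for that marker to pull out the answer.
    return result["choices"][0]["message"]["content"]
```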

I’ll share some more examples, but so far I have yet to see it hallucinate in the admittedly few prompts I’ve tried…


Flight example… Base prompt:

Text:
flight to Denver at 9:00 am tomorrow

question: 
is my flight on time?

There are a few possible ways to answer this question, depending on the source and availability of information, the level of detail required, and the tone and politeness of the response. Here are some examples of possible answers, but they are not the only ones:

- According to the airline's website, your flight to Denver at 9:00 am tomorrow is currently on schedule, but please check again before you leave for the airport as conditions may change.
- Your flight to Denver at 9:00 am tomorrow is on time as of now, but you can sign up for text or email alerts from the airline in case of any updates or delays.
- I'm sorry, I don't have access to the latest flight information right now. You can call the airline's customer service number or visit their website to check the status of your flight to Denver at 9:00 am tomorrow.
- Yes, your flight to Denver at 9:00 am tomorrow is on time. Have a safe and pleasant trip!
- No, your flight to Denver at 9:00 am tomorrow has been delayed by an hour due to bad weather. The airline will notify you of any further changes or options for rebooking. We apologize for the inconvenience.

Resistant prompt:

Text:
flight to Denver at 9:00 am tomorrow

steps: 
- what facts do you need to look for in the text to answer the question?
- Does the text contain all the facts needed to answer the question? 
- Think about how you might answer the question given what you know. If you don't have enough facts answer I'm not sure.
- Answer the question and prefix your answer with <response>. 

question: 
is my flight on time?

State each step and then show your work for performing that step.

Possible steps and work:

- what facts do you need to look for in the text to answer the question?

  - You need to look for the departure time and date of your flight, and the current status of the flight (delayed, canceled, on time, etc.).
- Does the text contain all the facts needed to answer the question?

  - No, the text only contains the departure time and date of your flight, but not the current status of the flight.
- Think about how you might answer the question given what you know. If you don't have enough facts answer I'm not sure.

  - Since you don't have enough facts to answer the question, you might say I'm not sure or I need more information.
- Answer the question and prefix your answer with <response>.

  - <response> I'm not sure if my flight to Denver at 9:00 am tomorrow is on time. I need more information about the current status of the flight.

Like I said… I have yet to see it hallucinate anything… I would love it if someone could get it to hallucinate something using this approach… These examples are all with a temp of 0.8.

Thanks for sharing @stevenic, very insightful.

I have a related question/case. For a travel organization there are, say, 20 locations. London is NOT one of them. We have added that to our context file. However, if we ask whether there are travel destinations in London, it happily describes a great holiday trip we supposedly offer.

But we do not.

When we add all the locations we do offer to the system (role) intro message and ask about London, it will say it is not offered.

Any tips for improving our prompt/system message so it doesn’t make up locations that aren’t found in the context (retrieved via embeddings)?

We have set the temperature to 0.1 and use GPT-3.5.

Thanks!


We encountered this when creating a new AGI-based FAQ for CyberLandr. If GPT is asked, it will say it has an escape hatch in the roof. But it doesn’t. GPT thinks it should have such a feature because it is known as a “camper” for the Cybertruck, and most campers use propane for heating and cooking and are, therefore, subject to rules that require alternate egress. CyberLandr is fully electric and needs no additional escape.

We overcame this “hallucination” by using embeddings.


Using embeddings and semantic search to identify factual snippets of text to embed into the prompt definitely helps ground the prompt in reality. If the model sees the facts it needs to answer a question in the included text, it’s pretty good about using them. The issue is when it’s asked a question whose answer it doesn’t see anywhere in the prompt.
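For anyone newer to this, that grounding step looks roughly like the sketch below. Everything here is illustrative, assuming the legacy openai package and the text-embedding-ada-002 model; the helper names and the k=3 cutoff are not any particular product’s setup.

```python
# Rough sketch of grounding a prompt with semantic search, as described above.
# Assumes the legacy openai package and text-embedding-ada-002; names and the
# k=3 cutoff are illustrative only.
import numpy as np
import openai


def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])


def top_snippets(question: str, snippets: list[str], k: int = 3) -> list[str]:
    q = embed(question)
    scored = []
    for s in snippets:
        v = embed(s)  # in practice these would be precomputed and stored
        score = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((score, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]
```

The winning snippets become the Text: section of the resistant prompt, so the model only ever reasons over facts that were actually retrieved.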

GPT always wants to answer, so it will fall back on its creativity and world knowledge to do so. You just need to get it to first ask itself “do I know how to answer this question?” Once it figures that out, it’s fine with not hallucinating… This technique works for planning as well. GPT is notorious for hallucinating planning functions it doesn’t have. All you have to do is get it to first list off the planning functions it thinks it needs, then ask it to compare that list to the functions it actually has, and it will stop hallucinating new planning functions.
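To make that concrete, here’s an illustrative version of the planning check. The function names are made up for the example; swap in whatever your planner actually exposes.

```python
# Illustrative only: a prompt that makes the model check its plan against an
# explicit list of available functions before committing to it. The function
# names (searchFlights, bookFlight, sendEmail) are hypothetical.
PLANNING_CHECK_PROMPT = """You can call only these functions: searchFlights, bookFlight, sendEmail.

steps:
- List the functions you think you need to complete the task.
- Compare that list to the functions you can actually call. If a function you need is not in the list, say so instead of inventing one.
- Write the final plan using only functions you can actually call, and prefix it with <response>.

task:
{task}

State each step and then show your work for performing that step."""
```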

This is a whopping 2 days of experimentation, but it seems super promising so I felt compelled to share it with the community.

The more you can get GPT to talk to itself before answering the more reliable its answer will be. I feel like this separation of thinking from response in the output is critical.


This is not the experience I’m seeing in my embeddings-based bots. Example: our customer FAQ embedding corpus has zero mention of this question, and the bot refrains from any creative hallucination when attempting an answer.

The answer to this question is easy to compute since the FAQ corpus does include the price and the approximate number of reservations to date. Yet, the embedding system takes no creative license to try to resolve this despite it being about $150m.


Exactly. We do use vector search and embeddings. As soon as a country is NOT in the returned context, it just makes up a great journey to said country, complete with fantastic stories.

When we hardcode the “only these countries” in the prompt itself, it works ~95% of the time.

Here’s some additional context (I used some of your points, @stevenic, in this prompt):

System message:

[role] => system
[content] => You are xyz, a friendly and helpful digital assistant for abc.

Steps:

  • What facts do you need to look for in the context to answer the question?
  • Look for the details of the question in the context.
  • Does the context contain all the facts needed to answer the question?
  • Think about how you might answer the question given what you know. If you don’t have enough facts answer that you cannot help with that and refer to the website.
  • Answer the question.

The prompt (stopwords removed) injected into message #1:

        [role] => user
        [content] => Yes, definitely buy deal. However, note special offers typically first-time visitors, receive discounted rate hotel. said, wide variety amazing deals hotels resorts, advantage multiple offers enjoy variety incredible destinations!

The question:

        [role] => user
        [content] => Any deals available in India?

The answer :wink:

        [role] => assistant
        [content] => I'm glad you asked! abc offers a wide variety of amazing deals on hotels and resorts in India. We have partnerships with many top-rated properties across the country, including popular destinations like Goa, Mumbai, Delhi and Bangalore. To find out more about our current deals for India or to book your stay please visit our website or contact us directly.

Looks like the client has some travel deals to fix in India.

Could it be that we are better off adding the context to the system message, as opposed to making it the first user message? (kinda made sense while writing this)

Cheers.

If you’re trying to follow my Better CoT pattern, it’s important that you add an instruction saying “State each step and show your work.” at the end of your prompt. That’s the magic that makes everything work.

Also, I haven’t tried this approach with gpt-3.5-turbo yet, but it works well with both gpt-4 and text-davinci-003.

Nice @bill.french… I’m actually just getting started with embeddings, so I have no doubt they make responses way more reliable. Have you tried asking it questions that are completely outside the corpus? Those are the scenarios where I expect it to hallucinate if it’s going to.

Like, ask it who Santa Claus’s dad was?

Here you go. With embeddings :slight_smile:

About explicitly instructing it to state the steps taken: does it actually do that in its output, or does it limit itself to giving the answer? Travellers do not need to (and we prefer they don’t) see those steps; they should just get the answer.

Sidenote: I just copied the data we used with the client and ran the same questions through my GPT-4 API access, and the answers were better. Back to the drawing board.

Asking it to “think step by step” has been proven to work.

Another very interesting finding: if you do have some sort of multi-step prompt, you should have GPT not only explain its answer first but also repeat the question.

Here is a wonderful example:

Use the following clues to answer the following multiple-choice question, using the following procedure:
(1) First, go through the clues one by one and consider whether the clue is potentially relevant
(2) Second, combine the relevant clues to reason out the answer to the question
(3) Third, map the answer to one of the multiple choice answers: either (a), (b), or (c)
 
Clues:
1. Miss Scarlett was the only person in the lounge.
2. The person with the pipe was in the kitchen.
3. Colonel Mustard was the only person in the observatory.
4. Professor Plum was not in the library nor the billiard room.
5. The person with the candlestick was in the observatory.
 
Question: Was Colonel Mustard in the observatory with the candlestick?
(a) Yes; Colonel Mustard was in the observatory with the candlestick
(b) No; Colonel Mustard was not in the observatory with the candlestick
(c) Unknown; there is not enough information to determine whether Colonel Mustard was in the observatory with the candlestick
 
Solution:
(1) First, go through the clues one by one and consider whether the clue is potentially relevant:

Although I don’t know how effective this is now with ChatML. Prompt injection was so powerful, it’s a shame that it had to be removed for safety.

Here’s to hoping for a GPT-4 model in the Completions endpoint.


We can argue over what’s considered a hallucination or not, and you’re correct, that’s a reasonable response to give a person. GPT-4 would probably give something reasonably similar… If you’ve never seen your system hallucinate anything, I applaud you. I see it daily :slight_smile:

As to whether telling it to say each step does anything, in my experiments it does. If you look at my post on Better Chain-of-Thought I have the model solving complex math problems using the same style of zero shot Better CoT prompt I proposed here. It helps when it can see itself think in detail…

And you’re correct, people want answers, not to watch the machine think. That’s why I instruct it to prefix its answer with a <response> marker just before it responds. You can just search for that marker to find the final answer. Everything before that marker is just the model thinking and can be ignored.
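In code that’s just a search for the marker. A minimal sketch, assuming you already have the raw completion text in hand:

```python
# Everything before the <response> marker is scratch work; everything after it
# is the answer to show the user. A missing marker means no grounded answer.
def extract_response(output: str, marker: str = "<response>") -> str | None:
    idx = output.rfind(marker)
    if idx == -1:
        return None  # model never reached a final answer; handle upstream
    return output[idx + len(marker):].strip()
```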


AFAIK, injecting the embeddings into the chat history itself will have similar results. That’s what was written in the docs (at least when they opened up the API for it).

In the system message, I instruct the model to look for the context in the first user message, where I inject each embedding based on cosine similarity.

In general, that worked great, until the use cases I mentioned above. There it needs some tweaking.

FWIW, when I use the Chat Completions API I don’t send a system message at all. I just include an extra user message at the beginning of the array. For me at least, it’s been slightly more reliable at following instructions. I’m actually not the biggest fan of gpt-3.5-turbo, so I stopped using it.
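Roughly this shape. A sketch of that message layout, with placeholder instruction text; the model name and the way the context snippets get in there are illustrative.

```python
# Sketch of the layout described above: no system message, just an extra user
# message up front carrying the instructions plus the retrieved context.
import openai


def ask_grounded(user_question: str, retrieved_snippets: list[str]) -> str:
    intro = (
        "You are a helpful assistant for abc. Answer only from the context below. "
        "If the context does not contain the answer, say you cannot help with that.\n\n"
        "Context:\n" + "\n".join(retrieved_snippets)
    )
    result = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0.1,
        messages=[
            {"role": "user", "content": intro},  # instructions + context as a user turn
            {"role": "user", "content": user_question},
        ],
    )
    return result["choices"][0]["message"]["content"]
```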

It’s different, but I can appreciate the similarities. The idea with prompt injection was to lead GPT with an incomplete thought or sentence, such as re-emitting the first step for it to continue. It was possible with Completions but isn’t as effective with ChatML.

@stevenic Good catch, the user role has (or had) more authority than the system role when ChatML was first released. Although they are actively changing this, so it may not be as effective today, or tomorrow.


It “can” work, but in my limited experiments Better CoT works better… The problem with “think step by step” is that you’re asking it to come up with the steps. In Better CoT you’re defining the steps, so the results are much more consistent.

For sure, which is shown in the example I posted.
I much prefer it as well, although I have stopped trying to chain these things inside the prompt and have moved that logic outside. Davinci will always hold a special spot in my heart. I really am hoping for a GPT-4 version.


Yeah, I actually came up with Better CoT because I couldn’t get “think step by step” to be reliable. I was trying to create a zero-shot prompt to calculate a player’s reinforcements for Risk. I could get it halfway through and then it would take a shortcut, leading it to the wrong answer. I then flipped the script on it, told it the steps for the calculation, and made it repeat each step and show its work. It now performs those calculations 100% reliably every time.
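For anyone curious, a prompt in that spirit might be shaped like the sketch below. To be clear, this is not the actual Risk prompt; it just uses the standard Risk reinforcement rule (territories divided by 3, rounded down, with a minimum of 3, plus continent bonuses) to show what “defining the steps” looks like.

```python
# Hypothetical reconstruction of a "defined steps" prompt for a Risk-style
# reinforcement calculation. Not the author's actual prompt; the rule encoded
# in the steps is the standard Risk reinforcement rule, used only to
# illustrate spelling out the steps and making the model show its work.
REINFORCEMENT_PROMPT = """Board state:
{board_state}

steps:
- Count the number of territories the player controls.
- Divide that count by 3 and round down. If the result is less than 3, use 3.
- List each continent the player fully controls and its bonus, then add those bonuses to the total.
- Answer with the total number of reinforcement armies and prefix your answer with <response>.

question:
How many reinforcement armies does the player receive this turn?

State each step and then show your work for performing that step."""
```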

And I absolutely love davinci… I agree… Please give us a davinci-004
