What's the latest guidance on minimizing hallucinations to cited sources in GPT-4's baseline data?

doss · May 14, 2024, 7:02pm

I have an agent app that helps users with questions about exercise routines, health, etc. I have limited the agent’s responses to only use answers from a specific list of sources published after 2020. I’ve followed the highly recommended paper “Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4”, which virtually eliminates hallucinations at the start of the chat. The problem I am having is that about 50% of the time, the model (I am using GPT-4 as it follows instructions better than GPT-4 Turbo, 4o, etc.) will give some random year (e.g., 2020) for publication when in fact the article was published prior to 2020 (e.g., 2005).

Example of the agent’s system prompt:

### Task & Resources
- **Mandatory Sources:** Utilize only the latest peer-reviewed publications (2019-present) from....

and further down…

### Source Publication Guidelines
**Penalties:** 
Immediate penalties apply for:
  - Citing sources published before 2019.
  - Fabrication of sources used for answers.

Anyone got any tips or guidance on how to get the model to not hallucinate the publication dates?

tumas · May 16, 2024, 3:29pm

Unfortunately we’re seeing similar hallucinations. What we’ve found works well is:

- Dynamic few-shot selection at query time.
- Limiting the size of the sources in the context window. We get crazy hallucinations past 40k context, especially on source attribution. The reasoning that exists in GPT-4 isn’t present in 4o.
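For anyone who wants something concrete, here is a minimal sketch of both ideas. It is not our production code: the function names are made up for illustration, the similarity measure is plain word overlap (an embedding-based ranking would be the more usual choice), and treating the 40k figure as a token budget approximated by whitespace-separated words is an assumption.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_few_shot(query, examples, k=2):
    """Return the k examples whose question is most similar to the query.

    `examples` is a list of dicts with a "question" key (hypothetical shape).
    """
    ranked = sorted(
        examples,
        key=lambda ex: jaccard(query, ex["question"]),
        reverse=True,
    )
    return ranked[:k]


def trim_sources(sources, max_tokens=40_000):
    """Keep sources in order until the rough budget would be exceeded.

    Word count is a crude stand-in for a real tokenizer (assumption).
    """
    kept, used = [], 0
    for src in sources:
        cost = len(src.split())
        if used + cost > max_tokens:
            break
        kept.append(src)
        used += cost
    return kept
```

The selected examples go into the prompt ahead of the user question, and only the trimmed sources are passed as context.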

Hope that helps

doss · May 17, 2024, 11:03am

A great test is to use your CV/Resume as RAG context to the system prompt and then see how far you can go before it hallucinates. You are intimately familiar with the context and can spot the errors in it. I think this is a great way to test and tune your system prompt if you want to keep things simple.

I got that idea from this paper: Overleaf Example (arxiv.org)
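The manual spot check can be partly automated for the publication-date problem. Below is a rough sketch, assuming you still have the context text that was given to the model: it flags any four-digit year in the answer that never appears in that context. The function name and the year pattern (1900-2099) are my own choices for illustration, and it only catches invented years, not a real year attached to the wrong source.

```python
import re

# Matches four-digit years from 1900 to 2099 (assumed range).
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def unsupported_years(answer, context):
    """Return years mentioned in the answer that never appear in the context."""
    known = set(YEAR.findall(context))
    return sorted(set(YEAR.findall(answer)) - known)


context = "Smith et al. (2005) studied squat depth. Lee (2021) reviewed recovery."
answer = "According to Smith et al. (2020) and Lee (2021), deeper squats help."
print(unsupported_years(answer, context))  # ['2020']
```

Anything this returns is worth reading by hand before trusting the citation.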
