Dramatic rise in hallucinations with Assistants v2 API / gpt-4-turbo

We were using the v1 Assistants API and recently switched to v2. We’re seeing a dramatic increase in “incredible” hallucinations. With pretty much the same setup in v1, we saw negligible hallucinations. Now we are seeing complete inventions of data… despite the Assistant clearly having access, via vector stores / file_search, to the data we’re asking it to analyze.

Under the hood, we’re using gpt-4-turbo, file_search, and code_interpreter, along with both assistant-level and thread-level vector stores. We theorize that using the two available vector stores may be partly to blame: perhaps the Assistant feels it can’t access one of them and so invents data. Our setup looks roughly like the sketch below.
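
For context, a minimal sketch of how we attach the two stores, assuming the v2 Python SDK (the vector store IDs are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Assistant-level vector store: shared across every thread run with this assistant.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}, {"type": "code_interpreter"}],
    tool_resources={
        "file_search": {"vector_store_ids": ["vs_assistant_level"]}  # placeholder ID
    },
)

# Thread-level vector store: scoped to this one conversation.
thread = client.beta.threads.create(
    tool_resources={
        "file_search": {"vector_store_ids": ["vs_thread_level"]}  # placeholder ID
    },
)
```

As we understand it, v2 allows one file_search vector store per assistant and one per thread, hence the split.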

Regardless, we thought that hallucinations were mostly a thing of the past. It’s troubling to see the wild stories that are now coming from pretty dry data we’re asking gpt-4-turbo to analyze.

Any tips, tricks, or similar experiences appreciated.

Thanks for reporting this and apologies for the trouble – could you share some thread_ids where you’re seeing these hallucinations so that we can take a deeper look?

Hi @nikunj, please check:

  • thread_p4YDWtTA0uSZPZa2nvWKFDtc – a complete hallucination of data, including “Microsoft Office Suite” and “John Doe” as CEO, after asking it to extract data from a vector store of vape-related text content.
  • thread_zfxcOnLxS8ck8iUBscSb4M9o – a complete hallucination of data, including “123 Wellness Drive, GreenCity, CA 90210”, after asking it to extract content from a vector store of retail-related text content.

We feel that the Assistant is failing to open the vector-store files and, in frustration, is inventing data. These extreme hallucinations are occurring with regularity – in upwards of 25% of runs.
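
To sanity-check that theory, something like the following should show whether file_search was actually invoked on an affected run; a minimal sketch, assuming the v2 Python SDK (the run_id is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# thread_id is from the first report above; substitute a real run_id.
thread_id = "thread_p4YDWtTA0uSZPZa2nvWKFDtc"
run_id = "run_placeholder"

# List the run's steps; any retrieval shows up as a tool_calls step,
# so its absence suggests file_search never actually ran.
steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run_id)
for step in steps:
    if step.step_details.type == "tool_calls":
        for call in step.step_details.tool_calls:
            print(call.type)  # e.g. "file_search" or "code_interpreter"
```

If no file_search tool calls appear among the steps, retrieval never happened and the response came purely from the model.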


With the v1 API, I have also experienced a change in behavior between last week and this week (May 1). Same program, same canned prompt, same content to ground against. Previously, the system returned a concise summary of a corporate earnings call Q&A with analysts. Now it first declares, “As an AI, I don’t have the capability to listen to calls or access real-time data. However, I can guide you on how to analyze an earnings call based on the requirements you’ve outlined. Here’s how you can structure your analysis:”, then gives me a list of steps for analyzing the call that is basically a regurgitation of the JSON message prompt, and only then follows with an actual summary of the Q&A.
