Dramatic rise in hallucinations with Assistants v2 API / gpt-4-turbo

We were using the v1 Assistants API and recently switched to v2. We’re seeing a dramatic increase in “incredible” hallucinations. With pretty much the same setup in v1, we saw negligible hallucinations. Now we are seeing complete inventions of data… despite the Assistant clearly having access, via vector stores / file_search, to the data we’re asking it to analyze.

Under the hood, we’re using gpt-4-turbo, file_search, and code_interpreter, along with both assistant-level and thread-level vector stores. We theorize that using the two available vector stores may be partly to blame: perhaps the Assistant feels it can’t access one of them and so invents data. Our setup looks roughly like the sketch below.
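
For context, a minimal sketch of how we attach the two stores, assuming the v2 Python SDK (the vector store IDs are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Assistant-level vector store: shared across every thread run with this assistant.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}, {"type": "code_interpreter"}],
    tool_resources={
        "file_search": {"vector_store_ids": ["vs_assistant_level"]}  # placeholder ID
    },
)

# Thread-level vector store: scoped to this one conversation.
thread = client.beta.threads.create(
    tool_resources={
        "file_search": {"vector_store_ids": ["vs_thread_level"]}  # placeholder ID
    },
)
```

As we understand it, v2 allows one file_search vector store per assistant and one per thread, hence the split.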

Regardless, we thought that hallucinations were mostly a thing of the past. It’s troubling to see the wild stories that are now coming from pretty dry data we’re asking gpt-4-turbo to analyze.

Any tips, tricks, or similar experiences appreciated.

Thanks for reporting this and apologies for the trouble – could you share some thread_ids where you’re seeing these hallucinations so that we can take a deeper look?

Hi @nikunj, please check:

  • thread_p4YDWtTA0uSZPZa2nvWKFDtc – a complete hallucination of data, including “Microsoft Office Suite” and “John Doe” as CEO, after asking it to extract data from a vector store of vape-related text content.
  • thread_zfxcOnLxS8ck8iUBscSb4M9o – a complete hallucination of data, including “123 Wellness Drive, GreenCity, CA 90210”, after asking it to extract content from a vector store of retail-related text content.

We feel that the Assistant is failing to open the vector-store files and, in frustration, is inventing data. These extreme hallucinations are occurring with regularity – in upwards of 25% of runs.
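
To sanity-check that theory, something like the following should show whether file_search was actually invoked on an affected run; a minimal sketch, assuming the v2 Python SDK (the run_id is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# thread_id is from the first report above; substitute a real run_id.
thread_id = "thread_p4YDWtTA0uSZPZa2nvWKFDtc"
run_id = "run_placeholder"

# List the run's steps; any retrieval shows up as a tool_calls step,
# so its absence suggests file_search never actually ran.
steps = client.beta.threads.runs.steps.list(thread_id=thread_id, run_id=run_id)
for step in steps:
    if step.step_details.type == "tool_calls":
        for call in step.step_details.tool_calls:
            print(call.type)  # e.g. "file_search" or "code_interpreter"
```

If no file_search tool calls appear among the steps, retrieval never happened and the response came purely from the model.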


With the v1 API, I have also experienced a change in behavior between last week and this week (May 1). Same program, same canned prompt, same content to ground against. Previously, the system returned a concise summary of a corporate earnings call Q&A with analysts. Now it first declares, “As an AI, I don’t have the capability to listen to calls or access real-time data. However, I can guide you on how to analyze an earnings call based on the requirements you’ve outlined. Here’s how you can structure your analysis:”, then gives me a list of steps for analyzing the call that is basically a regurgitation of the JSON message prompt, and only then follows with an actual summary of the Q&A.
