I’ve got a workflow wherein I use LLMs to extract information from a rather large document. I’m happy with this workflow. However, as a way to increase the accuracy of extractions by reducing hallucinations, I devised a plan to flip the process around as a “verification” step: start with something I had previously extracted and basically ask the question “is X in the document?”. So I extract first, then I verify that what I extracted is in fact in the document.
OK, so in order to do that second part I imagined a fairly simple setup: I upload the document to a vector store, attach that vector store to an LLM via a file search tool, and give it system instructions that say something along the lines of: “Every prompt the user sends is something the user believes to be true of the document. Using your file search tool, please fact check the user by searching the document for what the user believes to be present, and tell the user a simple yes or no answer for whether or not it is in fact present in the document, as well as explaining your reasoning and listing excerpts from the original document supporting your answer.” I’m slightly simplifying the system instructions, but you get the idea.
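For concreteness, here’s roughly what that setup looks like via the Python SDK. This is a minimal sketch, not exactly what I run: the file name is a placeholder, and depending on your SDK version vector stores may live under `client.beta.vector_stores` instead.

```python
from openai import OpenAI

client = OpenAI()

# Create a vector store and upload the document to it.
# "document.pdf" is a placeholder for my actual (large) document.
vector_store = client.vector_stores.create(name="verification-docs")
with open("document.pdf", "rb") as f:
    client.vector_stores.files.upload_and_poll(
        vector_store_id=vector_store.id,
        file=f,
    )

# Condensed version of the system instructions described above.
SYSTEM_INSTRUCTIONS = (
    "Every prompt the user sends is something the user believes to be true "
    "of the document. Using your file search tool, fact check the user by "
    "searching the document for what the user believes to be present. Give "
    "a simple yes or no answer for whether it is in fact present, explain "
    "your reasoning, and list excerpts from the original document that "
    "support your answer."
)
```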
So now, with that context, here’s what’s confusing me: I’m getting wildly different results while changing absolutely nothing other than whether I use the Assistants functionality.
I can recreate this easily in the playground without even using the API, and here’s how I do it:
Non-assistant setup:
- Go to playground, stay on the “prompts” tab
- Set the model to whatever (I used GPT 4.1), paste in the system instructions, then add the file search tool and upload the document
With this setup, I state something about the document as a user prompt and send it, and the responses are great. It’s obviously using the file search tool, giving me correct answers with supporting arguments that are verifiably lifted verbatim from the original document, and I can Ctrl+F to prove this.
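As far as I can tell, this “prompts” tab setup corresponds to a plain Responses API call with the file search tool attached. A rough sketch, reusing the vector store and instructions from the snippet above; the user claim is a made-up example:

```python
# One-shot verification call: model + system instructions + file search tool.
response = client.responses.create(
    model="gpt-4.1",
    instructions=SYSTEM_INSTRUCTIONS,
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store.id],
    }],
    input="The contract specifies a 30-day termination notice period.",
)
print(response.output_text)  # yes/no verdict, reasoning, supporting excerpts
```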
However, now let’s try the same thing using Assistants:
- Go to playground, this time go to the Assistants tab
- Create a new Assistant, select the same model, paste in the same system instructions, and set up the file search tool exactly the same way, uploading the same document
Now, when I send a user prompt, the response follows the format of the answer I’m looking for; however, it’s just hallucinating all over the place. The yes/no answer is completely wrong, and the supporting logic is also completely hallucinated: it’s making up excerpts that aren’t in the document at all. It’s not a little bit wrong, it’s completely wrong. For every single thing I ask, it just makes stuff up, doesn’t seem to even look in the document while claiming that it does, and even fabricates document snippets that don’t exist.
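And here’s roughly the Assistants-side equivalent of the same thing, again as a sketch with the same placeholder claim; the Assistants API lives under `client.beta` in the Python SDK:

```python
# Same model, same instructions, same vector store, but via Assistants.
assistant = client.beta.assistants.create(
    model="gpt-4.1",
    instructions=SYSTEM_INSTRUCTIONS,
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="The contract specifies a 30-day termination notice period.",
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Messages are returned newest-first, so data[0] is the assistant's reply.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```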
What is going on?