Responses API file_search tool - issues and guidance

Hey everyone, I’m working with the new Responses API and trying out the out-of-the-box file_search tool. Here’s what I’ve done so far:

  • Set up a vector store and added the file_search tool schema
  • File search is working; I can see it retrieving relevant knowledge correctly
  • Overall, technically things are functional

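For reference, the wiring described above can be sketched as plain request keyword arguments. This is a minimal sketch, not the full setup: `vs_abc123` and the model name are placeholders, and the shape follows the OpenAI Python SDK's `client.responses.create(...)` keywords for the hosted file_search tool.

```python
# Minimal sketch of attaching the hosted file_search tool to a Responses API call.
# "vs_abc123" and "gpt-4o" are placeholders, not real identifiers.

def build_request(user_message: str, vector_store_id: str) -> dict:
    """Assemble keyword arguments for client.responses.create(**kwargs)."""
    return {
        "model": "gpt-4o",
        "input": user_message,
        "tools": [
            {
                "type": "file_search",  # OpenAI-hosted retrieval tool
                "vector_store_ids": [vector_store_id],
            }
        ],
    }

kwargs = build_request("Summarize our onboarding docs.", "vs_abc123")
# Later, with a configured client: response = client.responses.create(**kwargs)
```

Keeping the request assembly in a small helper like this also makes it easy to adjust tool behavior per request later on.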
Where I’m running into trouble is in how the AI is using (or rather, overusing) the tool in practice.

In my use case, users aren’t uploading any files themselves. There’s a background job that manages file ingestion and indexing centrally, kind of like a shared knowledge base. But when the AI responds, it keeps referencing “files uploaded by the user,” which is both confusing and inaccurate for my setup.

Also, I only want the AI to use the file_search tool in specific scenarios (which I’ve outlined in the system prompt). But what I’m seeing is it reaching for the tool way too often, sometimes even proactively saying things like “It looks like some files have been uploaded. Would you like me to look for something specific in them?” — which doesn’t make sense in my context and makes the experience feel off.

I tried guiding the behavior via the system prompt with stuff like:

“The file_search tool allows you to search and retrieve relevant existing knowledge that has been submitted by others in the organization. These are not uploaded by the user, they are maintained centrally and provided automatically for reference and inspiration. Only use the file_search tool if the user explicitly asks to explore…”

…but I’ve had little success. It seems like either there’s some internal system prompt baked in when the file_search tool is enabled, or the model has been fine-tuned to assume a certain behavior that’s overriding my instructions.

Curious if anyone else has run into similar issues or has tips for better controlling tool use and messaging? Would love to hear how others are handling this.


In our use case, we typically WANT the model to use file_search on virtually every generation, so we haven’t run into this problem. We did, however, run into the opposite problem: sometimes it would not use the file_search tool when we wanted it to. We solved that with a very simple instruction to “Use the file_search tool” (more or less). My reaction to the instructions you shared is that they are too complicated. First, ask yourself how the model should know when to use the tool, then ask it to do exactly that. Suppose you just said, “Use the file_search tool if the user is asking to…”, and then figure out how to finish that simple sentence.

Again, I know how much time we spent on instructions before we got them right, so I know it is not easy. But our experience is that short definitive instructions are the path to success.
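For the opposite failure mode described above (the model skipping retrieval when it should search), a short instruction can also be paired with the API's `tool_choice` parameter. This is a sketch under the assumption that the Responses API accepts a hosted-tool object for `tool_choice`, which forces a call to that tool on the turn; everything else here is a placeholder.

```python
def build_forced_request(user_message: str, vector_store_id: str) -> dict:
    """Force file_search on this turn, rather than merely making it available."""
    return {
        "model": "gpt-4o",  # placeholder model name
        "input": user_message,
        # The short, definitive instruction suggested above.
        "instructions": "Use the file_search tool.",
        "tools": [
            {"type": "file_search", "vector_store_ids": [vector_store_id]}
        ],
        # Naming the hosted tool forces a call; the default ("auto")
        # leaves the decision to the model.
        "tool_choice": {"type": "file_search"},
    }
```

Forcing the tool per request keeps the instruction itself short, which matches the advice above about short, definitive instructions.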


This exact behavior, which has caused a considerable flood of incoming complaints, is OpenAI’s doing.

BUG: This is a major issue, and the behavior should be reverted.

They are persistently injecting a new system message about files ahead of any user input. It damages the application.

In addition, the tool’s injected language is extensive bloat, and it also tells the AI that “a user uploaded files.”

IMHO, instructions are key. Did you find that saying “Use the file_search tool if the user is asking to…” or something like “Only use the file_search tool. Do not use any other source” works? I work for a medical research hospital, and if a researcher loaded their PDFs or a group of PubMed articles but got an answer that was external and not what they had in mind, the app would never be used. I need privacy for the personally uploaded documents and 100% accuracy against the public files provided.

“100% accuracy”. Good luck with that. The nature of AI is to generalize. If you can’t afford mistakes, you may be using the wrong technology. Even humans are going to make mistakes trying to interpret documents.