Function tool receives random values when file_search is enabled

Hi everyone,

I’m currently working with the OpenAI Assistants API and running into an issue when using multiple tools together.

Here’s my setup:

I’ve defined two tools:

  1. A function tool called talkToHuman:
{
  "type": "function",
  "name": "talkToHuman",
  "description": "If user wants to talk to a human.",
  "parameters": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Name of the person"
      },
      "email": {
        "type": "string",
        "description": "Email address of the person"
      }
    },
    "required": ["name", "email"],
    "additionalProperties": false
  },
  "strict": true
}
  2. A file_search tool with a valid vector store ID:
{
  "type": "file_search",
  "vector_store_ids": [
    "vs_xxxxxx"
  ]
}

The issue occurs when I include both tools in the assistant.
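
Both tool definitions end up in the same tools array, roughly like this (same schemas as above, vector store ID redacted):

{
  "tools": [
    {
      "type": "function",
      "name": "talkToHuman",
      "description": "If user wants to talk to a human.",
      "parameters": {
        "type": "object",
        "properties": {
          "name": { "type": "string", "description": "Name of the person" },
          "email": { "type": "string", "description": "Email address of the person" }
        },
        "required": ["name", "email"],
        "additionalProperties": false
      },
      "strict": true
    },
    {
      "type": "file_search",
      "vector_store_ids": ["vs_xxxxxx"]
    }
  ]
}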

What happens:

When I type something like:
“I want to talk to a human”,
the assistant does call the talkToHuman function — but with random values, such as:

{
  "name": "User",
  "email": "user@example.com"
}

These values are not coming from my message, and they’re not correct.

If I remove the file_search tool and keep only the talkToHuman function, the assistant behaves properly and extracts the actual name/email from the conversation context or asks me for it.

My question:

Has anyone else experienced this kind of behavior?
Is there a known issue or a workaround when combining function tools with file_search?

Any help would be greatly appreciated!

Thanks in advance 🙏


Your problem likely comes from the tiny “function” specification, compared to the huge text dump that file_search adds to burden your context.

Your cute little function is presented to the AI like this:

## functions

// If user wants to talk to a human.
type talkToHuman = (_: {
// Name of the person
name: string,
// Email address of the person
email: string,
}) => any;

But guess what: it just gets tacked on to this massive internal text dump - for an even smaller set of arguments the AI has to write.

# Tools

## file_search

// Tool for browsing the files uploaded by the user. To use this tool, set the recipient of your message as `to=file_search.msearch`.
// Parts of the documents uploaded by users will be automatically included in the conversation. Only use this tool when the relevant parts don't contain the necessary information to fulfill the user's request.
// Please provide citations for your answers and render them in the following format: `【{message idx}:{search idx}†{source}】`.
// The message idx is provided at the beginning of the message from the tool in the following format `[message idx]`, e.g. [3].
// The search index should be extracted from the search results, e.g. # 【13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb】 refers to the 13th search result, which comes from a document titled "Paris" with ID 4f4915f6-2a0b-4eb5-85d1-352e00c125bb.
// For this example, a valid citation would be `【3:13†Paris】`.
// All 3 parts of the citation are REQUIRED.
namespace file_search {

// Issues multiple queries to a search over the file(s) uploaded by the user and displays the results.
// You can issue up to five queries to the msearch command at a time. However, you should only issue multiple queries when the user's question needs to be decomposed / rewritten to find different facts.
// In other scenarios, prefer providing a single, well-designed query. Avoid short queries that are extremely broad and will return unrelated results.
// One of the queries MUST be the user's original question, stripped of any extraneous details, e.g. instructions or unnecessary context. However, you must fill in relevant context from the rest of the conversation to make the question complete. E.g. "What was their age?" => "What was Kevin's age?" because the preceding conversation makes it clear that the user is talking about Kevin.
// Here are some examples of how to use the msearch command:
// User: What was the GDP of France and Italy in the 1970s? => {"queries": ["What was the GDP of France and Italy in the 1970s?", "france gdp 1970", "italy gdp 1970"]} # User's question is copied over.
// User: What does the report say about the GPT4 performance on MMLU? => {"queries": ["What does the report say about the GPT4 performance on MMLU?"]}
// User: How can I integrate customer relationship management system with third-party email marketing tools? => {"queries": ["How can I integrate customer relationship management system with third-party email marketing tools?", "customer management system marketing integration"]}
// User: What are the best practices for data security and privacy for our cloud storage services? => {"queries": ["What are the best practices for data security and privacy for our cloud storage services?"]}
// User: What was the average P/E ratio for APPL in Q4 2023? The P/E ratio is calculated by dividing the market value price per share by the company's earnings per share (EPS).  => {"queries": ["What was the average P/E ratio for APPL in Q4 2023?"]} # Instructions are removed from the user's question.
// REMEMBER: One of the queries MUST be the user's original question, stripped of any extraneous details, but with ambiguous references resolved using context from the conversation. It MUST be a complete sentence.
type msearch = (_: {
queries?: string[],
}) => any;

} // namespace file_search

(and just keep scrolling…)

That’s the token cost of just adding this tool - a specification shouting at the AI. Then, when the tool is actually called, the search results it returns can add tens of thousands more tokens to your chat response history.


The big tip:

Add a substantial, multi-line root-level description of your own to your function.
Make it stand out, make it different, and even have it directly counter the file_search description that the model can confabulate against.

Explain in the parameter descriptions that they can only be filled with real data actually seen in the conversation - never invented placeholders (a sketch follows below).
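
Something like this, for example - the wording below is only a sketch to show the shape, not tested text, so adapt it to your own application:

{
  "type": "function",
  "name": "talkToHuman",
  "description": "Hand the conversation off to a human support agent.\n\nONLY call this function when the user has explicitly asked for a human AND has already stated their own real name and email address in this conversation.\nNEVER invent, guess, or reuse placeholder values such as 'User' or 'user@example.com'.\nIf either value has not been provided yet, do not call this function - ask the user for it first.\nThis function is unrelated to file_search and must never be filled in from document contents.",
  "parameters": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "The person's name exactly as they stated it in this conversation. Never invented, never an example value."
      },
      "email": {
        "type": "string",
        "description": "The person's email address exactly as they stated it in this conversation. Never invented, never an example value."
      }
    },
    "required": ["name", "email"],
    "additionalProperties": false
  },
  "strict": true
}

The schema itself doesn’t change; the point is to outweigh all that file_search boilerplate with rules of your own that the model can’t miss.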

I just thought I’d note some more application-destruction that file_search is going to do for you.

There’s a message being injected right before the latest user input.

In the screenshot, you can see the injection has been moved forward to sit just ahead of the latest input (which isn’t shown).

What it tells the model - that these are files uploaded by the user - is of course A COMPLETE LIE if you are the developer and these files power your application.


You were right - that was exactly the issue.

After I updated the function description and made it more explicit, it finally worked.
Thanks a lot!
