Even after specifying `"file_search": {"max_num_results": 2}` in an assistant createRun, the response sometimes fetches more than 2 annotations from the vector store. Is this a bug, or is there another way to do this?
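For reference, a minimal sketch of where the cap is supposed to be passed in the createRun request body (the assistant ID is a placeholder; this only builds the payload, it makes no API call):

```python
# Sketch of a createRun request body with the file-search result cap.
# "asst_placeholder" is a dummy ID for illustration.
payload = {
    "assistant_id": "asst_placeholder",
    "tools": [
        {
            "type": "file_search",
            "file_search": {"max_num_results": 2},  # cap requested here
        }
    ],
}

requested_cap = payload["tools"][0]["file_search"]["max_num_results"]
print(requested_cap)  # 2 -- yet more than 2 annotations can come back
```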
I’m using Azure OpenAI (API version 2024-07-01-preview) and have encountered the same problem. Even when I set this parameter to 3, the file-search prompt tokens for the GPT-4 assistant are still 16k, which equals 800 tokens × 20 chunks.
I think there is a bug on OpenAI’s side causing this parameter to malfunction.
Definitely a bug; a few experiments led me to believe the max_num_results cap (set on the assistant or run object) is doubled. This workaround may help:
import math

max_results = assistant.tools[0].file_search.max_num_results
max_results = math.ceil(max_results / 2)
user_prompt = f"{user_prompt}\n - retrieve a maximum of {max_results} items"
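Wrapped as a self-contained helper, the workaround above looks like this (the function name `apply_workaround` is mine, not part of any API):

```python
import math

def apply_workaround(user_prompt: str, max_num_results: int) -> str:
    """Halve the configured cap (since it appears to be doubled in practice)
    and restate the limit inside the prompt itself."""
    max_results = math.ceil(max_num_results / 2)
    return f"{user_prompt}\n - retrieve a maximum of {max_results} items"

print(apply_workaround("Summarize the report.", 5))
# Appends " - retrieve a maximum of 3 items" to the prompt
```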
The issue persists to this day. In the Responses API, tool calls to the File Search tool (against a vector store containing a PDF book) number at least 20. The response output is capped at 20 items, and all of them can be filled with tool calls.
This happens only with 4o-mini, though; 4.1-mini and 4.1-nano make a single call to the same vector store.
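In the Responses API the cap sits directly on the file_search tool entry, next to the vector store IDs. A payload sketch for reproducing the comparison across models (the vector store ID is a placeholder; no request is actually sent):

```python
# Responses API request body sketch; "vs_placeholder" is a dummy ID.
payload = {
    "model": "gpt-4o-mini",  # the model observed to over-call file search
    "input": "What does chapter 3 cover?",
    "tools": [
        {
            "type": "file_search",
            "vector_store_ids": ["vs_placeholder"],
            "max_num_results": 5,  # reportedly exceeded on 4o-mini
        }
    ],
}

print(payload["tools"][0]["max_num_results"])  # 5
```

Swapping `"model"` for `"gpt-4.1-mini"` or `"gpt-4.1-nano"` with the same vector store is how the single-call behavior above was observed.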