Even after specifying "file_search": { "max_num_results": 2 } on an assistant run (in the create-run request), the response sometimes returns more than 2 annotations from the vector store. Is this a bug, or is there another way to do this?
CODE:
const createRunAndPollStatus = async (threadId, tools = [{
  "type": "file_search",
  "file_search": { "max_num_results": 2 }
}]) => {
  const delay = ms => new Promise(res => setTimeout(res, ms));
  const maxAttempts = 10; // Adjust based on your needs
  try {
    const createResponse = await axios.post(`https://api.openai.com/v1/threads/${threadId}/runs`, {
      assistant_id: assistantId,
      tools: tools
    }, {
      headers: {
        "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
        "OpenAI-Beta": "assistants=v2"
      }
    });
    // ... polling logic follows ...
  } catch (err) {
    console.error(err);
  }
};
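For completeness, the polling step the function's name implies can be sketched generically like this (a hypothetical helper, not from any SDK; `get_status` stands in for a GET on the run endpoint that returns the run's `status` string):

```python
import time

def poll_until_done(get_status, max_attempts=10, delay_s=1.0):
    """Call get_status() until it returns a terminal run status or attempts run out."""
    terminal = {"completed", "failed", "cancelled", "expired"}
    for _ in range(max_attempts):
        status = get_status()
        if status in terminal:
            return status
        time.sleep(delay_s)  # wait before polling again
    raise TimeoutError("run did not reach a terminal status")
```

In a test you can drive it with a fake status sequence instead of a real HTTP call:

```python
statuses = iter(["queued", "in_progress", "completed"])
poll_until_done(lambda: next(statuses), delay_s=0)  # returns "completed"
```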
Thanks
I'm using Azure OpenAI (API version 2024-07-01-preview) and have encountered the same problem. Even when I set this parameter to 3, the file-search portion of the prompt tokens on a GPT-4 assistant is still about 16k, which equals 800 tokens per chunk * 20 chunks (the defaults).
I think there may be a bug on OpenAI's side causing this parameter to be ignored.
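The arithmetic behind that observation is easy to check, assuming the documented defaults (800-token chunks, 20 results) are what the service actually uses:

```python
chunk_tokens = 800    # default file_search chunk size, in tokens
chunks_returned = 20  # default max_num_results, apparently used despite the override
print(chunk_tokens * chunks_returned)  # 16000 tokens of retrieved context
```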
assistant = await client.beta.assistants.create(
    model="<PLACEHOLDER>",
    name="<PLACEHOLDER>",
    instructions="<PLACEHOLDER>",
    tools=[
        {
            "type": "file_search",
            "file_search": {
                "max_num_results": 3
            }
        }
    ],
    tool_resources={"file_search": {"vector_store_ids": ["<PLACEHOLDER>"]}},
)
Yes, I think it's a bug too. I also found that I had to specify it as a tool_choice to make sure it gets called every time:
const createResponse = await axios.post(`https://api.openai.com/v1/threads/${threadId}/runs`, {
  assistant_id: assistantId,
  tools: tools,
  tool_choice: {
    type: "file_search"
  }
}, {
  headers: {
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
    "OpenAI-Beta": "assistants=v2"
  }
});
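The same request body can be sketched language-neutrally as a small payload builder (a hypothetical helper, not part of any SDK), which makes the two relevant fields explicit:

```python
def build_run_payload(assistant_id, max_num_results=2):
    """Build the body for POST /v1/threads/{thread_id}/runs, forcing file_search."""
    return {
        "assistant_id": assistant_id,
        "tools": [
            {"type": "file_search", "file_search": {"max_num_results": max_num_results}}
        ],
        # Without tool_choice the model may skip the tool entirely on some turns.
        "tool_choice": {"type": "file_search"},
    }
```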
Definitely a bug; a few experiments led me to believe that max_num_results (set on the assistant or run object) is doubled. This workaround may help:
import math

# Halve the configured limit to compensate for the apparent doubling
max_results = assistant.tools[0].file_search.max_num_results
max_results = math.ceil(max_results / 2)
user_prompt = f"{user_prompt}\n - retrieve a maximum of {max_results} items"
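Wrapped up as a reusable helper (hypothetical name, assuming the doubling behavior described above), the halving and the prompt hint go together:

```python
import math

def apply_half_max_workaround(user_prompt, max_num_results):
    """Ask for half the configured limit in the prompt to offset the apparent doubling."""
    halved = math.ceil(max_num_results / 2)
    return f"{user_prompt}\n - retrieve a maximum of {halved} items"
```

For example, with max_num_results set to 3 the prompt asks for at most 2 items.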