Recent observations while forcing multiple tool calls in the Assistants API

Recently, I finally found the time to take a closer look at the file_search tool and the vector stores, as I find this option quite powerful.

My use case is to answer a question by forcing the assistant to use both file_search and my custom function. The information my assistant should use to answer the input question has to consist of chunks from file_search plus externally retrieved data.

In doing so, I encountered a few difficulties. file_search v2 is marked as beta, so I am aware of possible problems. I would therefore like to share my observations here in the hope that they help others and help get these problems fixed.
(FYI: I always wrote my incoming test messages in such a way that a file_search should be triggered. The following observations mainly refer to Assistants API usage.)

1. Occurrence of 'msearch'
I noticed that yesterday (while I was still using openai npm module version 4.57.0), a tool call (requires_action) for 'msearch' occurred repeatedly when forcing file_search (tool_choice: { type: 'file_search' } or 'required'); in those cases file_search itself was not actually used. Since today (after updating to 4.58.1), this no longer seems to occur.

2. Calling several tool functions in a forced manner
Apparently it is not possible to force both tools (file_search and a custom function) to be called consistently, neither via explicit instructions nor via tool_choice: 'required'. As described in the docs, 'required' only guarantees that at least one tool is called; as mentioned, it is not possible to ALWAYS call EXACTLY functions X and Y, etc.
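For reference, the run-level tool_choice shapes involved can be sketched as below (a minimal TypeScript sketch; the function name is illustrative, and the actual client.beta.threads.runs.create(...) call from the openai npm module is omitted):

```typescript
// The tool_choice modes relevant here (Assistants API run shapes).
// There is no supported value that forces BOTH file_search and a custom
// function within a single run: "required" only guarantees that at least
// one tool is called, not which one(s).

// Force the built-in file_search tool:
const forceFileSearch = { tool_choice: { type: "file_search" as const } };

// Force one specific custom function (name is illustrative):
const forceMyFunction = {
  tool_choice: {
    type: "function" as const,
    function: { name: "my_custom_function" },
  },
};

// Force "some tool, but the model picks which":
const atLeastOneTool = { tool_choice: "required" as const };

console.log(forceFileSearch, forceMyFunction, atLeastOneTool);
```

Note that the enum of shapes has no "both of these" option, which is exactly the limitation described above.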

3. Correct behavior when thread is empty
If I set tool_choice: { type: 'file_search' }, my custom function is still (correctly) called in an empty thread. But as soon as there are recent messages in the thread, only file_search is called.

4. Missing chunks logs in step data
file_search is, in principle, called correctly, and in the logged run steps the score thresholds with the corresponding found files are also displayed correctly. Unfortunately, the chunks used from the search are still missing there (see Improve file search result relevance with chunk ranking).

5. Duplicate calls of the same functions
In addition to point 4, I noticed that the same custom function appears twice (but with different call IDs) in the requires_action payload, which wasn’t the case before.

6. Irregular timeouts
As soon as I experiment with the attempts above, the request occasionally runs into a timeout.

In my case, a workaround for the 'forced' use of several tools would be to cancel the run directly after the forced call of the custom function and to start a new one with forced file_search, attaching the previously retrieved information as additional instructions. This costs more tokens and is certainly not how it is meant to work.
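The instruction-building step of this two-run workaround can be sketched as follows (a hypothetical helper; the runs.create / runs.cancel calls from the openai npm module and the payload format are assumptions, not verified code):

```typescript
// Sketch of the two-run workaround: run 1 forces the custom function and is
// cancelled after its output is captured; run 2 forces file_search and gets
// the retrieved data appended to the instructions. Only the string
// composition is shown here; the API calls themselves are omitted.

function buildSecondRunInstructions(
  baseInstructions: string,
  retrievedData: string,
): string {
  return (
    baseInstructions +
    "\n\nExternally retrieved data (from the forced function call):\n" +
    retrievedData
  );
}

const instructions = buildSecondRunInstructions(
  "Answer using the file_search results and the data below.",
  '{"weather":"sunny"}',
);
console.log(instructions.includes('{"weather":"sunny"}')); // true
```

The token cost mentioned above comes from re-sending the thread plus these enlarged instructions for the second run.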

As a workaround for the problem mentioned in point 5, I have already implemented a set so that only one tool call per function name receives the data. That works so far without any loss of quality.
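The de-duplication workaround can be sketched like this (all type and function names here are illustrative, not from the SDK; the idea is simply to resolve each function name once and answer duplicate call IDs with a stub):

```typescript
// When requires_action lists the same function name twice with different
// call IDs, submit the real result only once per name and an empty stub
// for the duplicates, so every tool_call_id still gets an output.

interface ToolCall {
  id: string;
  name: string;
}

function dedupeToolOutputs(
  calls: ToolCall[],
  resolve: (name: string) => string,
): { tool_call_id: string; output: string }[] {
  const seen = new Set<string>();
  return calls.map((call) => {
    if (seen.has(call.name)) {
      // Duplicate call: answer with an empty stub instead of re-fetching.
      return { tool_call_id: call.id, output: "" };
    }
    seen.add(call.name);
    return { tool_call_id: call.id, output: resolve(call.name) };
  });
}

const outputs = dedupeToolOutputs(
  [
    { id: "call_1", name: "get_weather" },
    { id: "call_2", name: "get_weather" },
  ],
  () => '{"temp": 21}',
);
console.log(outputs); // only call_1 carries the data
```

Answering every call ID (even with a stub) matters because submit_tool_outputs expects an output for each requested call.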

I am referring here, among other things, to the following existing thread: Assistant + function calling + file search

Hope this helps. Thank you for your attention and have a nice day!

Cheers


Thanks for sharing. Can you share an example code snippet for the workaround you described?


Let me guess: gpt-4o-mini?

1. Occurrence of 'msearch'

It cannot follow the instructions it is given. What is happening is that the internal file_search tool has a method msearch, which is where a query is written, but the AI is too confused or too incapable to call the right function, or to recognize that it cannot use one tool’s methods from inside the other. Thus you get invalid output arguments that were meant to be handled internally and never seen.

How the AI can be so inept as to confuse myfiles_browser->msearch->query with functions->your_function->arguments is baffling, but it has gone on for a long time. It just has a post-trained disposition to write ‘msearch’ wherever, I guess.

You can set the assistant’s top_p to 0.1 and see whether that constrains it to the correct function usage, assuming the AI is actually more certain of the right way to write function calls.
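As a toy illustration of why a low top_p can help here (a simplified sketch of nucleus sampling, not OpenAI's actual implementation): top_p keeps only the smallest set of tokens whose cumulative probability reaches the threshold, so a model that is more confident in the correct tool call gets restricted to it.

```typescript
// Simplified nucleus (top_p) filter over a toy next-token distribution:
// sort tokens by probability and keep them until the cumulative mass
// reaches topP. With topP = 0.1, only the single most probable token
// survives, suppressing lower-probability mistakes like 'msearch'.
function nucleus(probs: [string, number][], topP: number): string[] {
  const sorted = [...probs].sort((a, b) => b[1] - a[1]);
  const kept: string[] = [];
  let cum = 0;
  for (const [tok, p] of sorted) {
    kept.push(tok);
    cum += p;
    if (cum >= topP) break;
  }
  return kept;
}

console.log(
  nucleus([["msearch", 0.3], ["my_function", 0.6], ["other", 0.1]], 0.1),
);
// → [ 'my_function' ]
```

Of course, this only helps if the model actually assigns the correct call the highest probability; it cannot fix genuinely confused post-training.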

3. Correct behavior when thread is empty

If I set tool_choice: { type: 'file_search' }, my custom function is still (correctly) called in an empty thread. But as soon as there are recent messages in the thread, only file_search is called.

Same symptom of poor model quality. No attention is paid to important context, so it can only revert to its post-training of valuing the latest message and writing reweighted tool calls. Talk long enough, and the entire application you were instructing is lost; you are effectively talking to trained ChatGPT.

Switch the model to gpt-4-1106-preview and you get some semblance of quality back, or use gpt-4 on chat completions if you’d like your own functions to be used well, a model that was the expected quality when Assistants was created (and had even more elaborate retrieval functions).

So it is sad that the answer for a particular application is so commonly “don’t use the new models OpenAI calls a replacement”.
