I have been struggling with the new parallel tool calling. My tool calls are usually terminal actions whose results I cannot feed back to the model for a second response. For example, when a user requests an image, I simply confirm that it is indeed what they wanted and send them the image. I do not need to feed that back to the model for a secondary response, which the model usually rewrites as it sees fit (as shown in the documentation for parallel tool calling).
I would like the API to just select the appropriate action the first time. I used the exact same set of functions with the older API and selection was good: there were errors here and there, but the model chose the correct function roughly 95% of the time.
Since November last year, the model’s responses have been erratic, with multiple tools returned at once, sometimes even completely irrelevant ones.
I reverted to the old functions method instead of the tools method for a while, but even that stopped working as it used to. We now get customer complaints about the wrong functions being called, and people are unsubscribing from our service.
Is there a way to increase the accuracy of tool selection and force the model to select just one tool, perhaps via request parameters or the prompt? I have tried everything.
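For reference, the Chat Completions API does expose parameters aimed at exactly this: `parallel_tool_calls` to disable multi-tool turns, and `tool_choice` to force a specific function. Here is a minimal sketch that only builds the request payload (no actual API call is made; the `send_image` function is a hypothetical example):

```python
# Sketch of a Chat Completions request payload that disables parallel
# tool calls and forces the model to call one named function.
# Parameter names follow the OpenAI Chat Completions API; the
# "send_image" tool is a hypothetical example.

def build_request(user_message: str) -> dict:
    """Build the request payload without sending it."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "send_image",  # hypothetical tool
                "description": "Send the user the image they asked for.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }],
        # Ask the model not to emit several tool calls in one turn.
        "parallel_tool_calls": False,
        # Force a call to this specific tool instead of letting the
        # model choose freely ("auto") or answer in plain text.
        "tool_choice": {"type": "function",
                        "function": {"name": "send_image"}},
    }

request = build_request("Can I get a picture of a sunset?")
print(request["parallel_tool_calls"])  # False
```

Forcing `tool_choice` only makes sense when your own routing already knows which function applies; otherwise `parallel_tool_calls: false` alone at least caps the response at a single call.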
Here is a way to test your functions and system prompt. Create a dummy assistant via the Assistants API in the dev page, insert your system prompt into the instructions, and add your function definitions. Actually, you do not even need to add the functions; just mention them in the instructions. Then test your conversation in the Playground and see whether the functions are triggered as expected. Adjust your system prompt if some functions are not being invoked, some parameters are missing, etc. It really helps to have such a tool for function calling. Unfortunately, it is only available for Assistants, but the good thing is that the resulting instructions and functions should also work in the Chat Completions API using the same model.
Perhaps I should clarify: I am referring to function/tool calling in Chat Completions, not the Assistants API, specifically when sending multiple functions to the model. I can test it; I have been testing with different system prompts and user prompts for weeks now.
The problem is that I get inconsistent results, with more than one tool/function returned, which is a problem. I would like to replace the new parallel function calling with the original one-call-at-a-time behavior, or at least have the option to set the level of confidence the model must have before returning a function, and to specify a default fallback function for cases of doubt.
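Until the API offers a confidence threshold, one workaround is to enforce the policy client-side: accept the model's selection only when it returns exactly one recognized tool call, and route everything else to a safe default. A sketch, where the tool names and the `ask_clarification` fallback are hypothetical:

```python
# Client-side guard: accept the model's tool selection only when it is
# unambiguous; otherwise fall back to a safe default action.
# KNOWN_TOOLS and "ask_clarification" are hypothetical names.

KNOWN_TOOLS = {"send_image", "cancel_subscription", "get_account_status"}
FALLBACK = "ask_clarification"

def select_tool(tool_calls: list) -> str:
    """Return the single tool to execute, or the fallback when in doubt."""
    names = [call["function"]["name"] for call in tool_calls]
    # Exactly one call, and it is a tool we actually expose.
    if len(names) == 1 and names[0] in KNOWN_TOOLS:
        return names[0]
    # Zero calls, several calls, or an unknown name: do not guess.
    return FALLBACK

print(select_tool([{"function": {"name": "send_image"}}]))  # send_image
print(select_tool([{"function": {"name": "send_image"}},
                   {"function": {"name": "get_account_status"}}]))  # ask_clarification
```

It does not make the model more accurate, but it converts "wrong function executed" into "clarifying question asked", which is usually the cheaper failure for customers.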
I am experiencing similar issues with the app I am working on. In my case, the (temporary) solution was changing the model from GPT-3.5 to GPT-4. GPT-3.5 was calling functions very erratically: either ahead of time, skipping them entirely, or calling them in a completely inappropriate context. GPT-4 seems much more consistent in this regard; so far I haven’t run into any issues with miscalled functions. This leads me to believe that the problem lies with the model itself rather than with the function schema or prompt design.
The drawback to changing models, of course, is the increased cost and significantly higher response time. If anyone has a way to make GPT-3.5 more deterministic in tool calling, I’d be very interested in hearing about it.
I tried the solution you are talking about last year and my costs went up 10x. Considering I operate an app with a free trial and can get thousands of calls a day from free users who have not subscribed, I eventually went back to 3.5 and decided to live with the errors until a fix comes along.
So I regularly post here hoping someone has figured it out.
Have you tried modifying the instructions of the 3.5 model, or the descriptions of the functions, to limit it to using only one function at a time?
Something that has worked for me is using boolean params. For example, add to each function a boolean parameter for every function you have, e.g. “function_one_called: true/false”, then instruct the model to set that flag to true when it calls the function, and only execute a function if the flags for all other functions are false.
I’ve done something similar for another use case and it’s been working great.
I will try that solution and see if it helps. Even if I can limit it to one function at a time, the problem is that the error rate has gone up: half the time it just picks the wrong function, so much so that the entire process becomes unpredictable. You never know what you are going to get, and we are getting more and more complaints these days.
The problem is that the old system worked just fine; I don’t know what changes were made to the models to make them this random.