A while ago I noticed that my gpt-4o application was no longer calling my tools consistently. I have now reverted to gpt-4o-2024-05-13 and it is working fine and reliably.
I pass 2-3 tools per call. The old model used them as expected; the new one does not call them reliably. I am not saying “randomly”, because it is not really random: for some prompt/tool combinations it calls the tool(s) reliably, for others it never calls them.
So far I have not been able to find any systematic difference between the working and non-working prompt/tool combinations. The general structure of the prompts and of the API call is consistent across all of them. The new model is probably sensitive to something in the particular tool parameters, but I have not been able to run specific tests to isolate it. With the spring snapshot, everything is fine.
I also added the tool_choice="auto" argument to the calls for testing, but it did not make any difference (it is the default anyway).
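For reference, here is a minimal sketch of the call structure I am using (the tool name and parameters below are placeholders, not my actual tools):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder tool definition -- my real tools follow this same structure
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the status of a customer order by its ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The order ID to look up.",
                    }
                },
                "required": ["order_id"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",  # pinning "gpt-4o-2024-05-13" instead makes tool calls reliable again
    messages=[{"role": "user", "content": "What is the status of order 12345?"}],
    tools=tools,
    tool_choice="auto",  # explicit for testing, though "auto" is already the default
)

# With the old snapshot this reliably contains a tool call; with the new one it often doesn't
print(response.choices[0].message.tool_calls)
```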
Has anyone had a similar experience?