Strange Agent Behaviour With Tool Calling
September 4, 2024

Hello,

I am running experiments for a research paper, and since this morning tool calling has become unreliable in my gpt-4o (and gpt-4o-mini) calls. Users participating in my experiment reported that the app has stopped functioning. I checked, and since today the LLM agent no longer understands the stopping criteria of my agent that uses retrieval tools. This happens with both gpt-4o and gpt-4o-mini. Importantly, neither the code nor the retrieval tools were changed, so I suspect something changed on the model side.

I checked OpenAI's update pages, but there doesn't seem to have been a version update for the models today.

Has anyone encountered similar problems today?

Yes, if you search my recent posts, I've experienced the same over the past week or so: tool calling is no longer stable or reliable. This is with Assistants/Streaming, and it shows up more often with gpt-4o-mini than with gpt-4o.

Facing the same issue with our APIs… 😢

I'm using JSON mode (response_format: {type: "json_object"}) with the JavaScript openai SDK 4.57.1 and openai.beta.chat.completions.runTools.

In my case, model "gpt-4o" (gpt-4o-2024-05-13) now calls all my tools on the first request.
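If the immediate symptom is the model firing every tool at once, one mitigation that might be worth trying is disabling parallel tool calls. Here is a minimal sketch, assuming your SDK version passes the documented parallel_tool_calls parameter through runTools; I haven't verified that it avoids this particular regression:

// Sketch: ask the model to issue at most one tool call per turn.
// parallel_tool_calls is a documented Chat Completions parameter;
// whether it prevents the everything-at-once behaviour is untested.
const runner = openai.beta.chat.completions.runTools({
    model: "gpt-4o",
    messages: chatSession.history.toObject(),
    tools: [getUserLocationTool, getWeatherTool],
    parallel_tool_calls: false,
})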

I switched to gpt-4o-2024-08-06:

  • A first user request that needs a function works.
  • A second user request that needs a function now fails with 400 Invalid 'messages[2].tool_calls': empty array. Expected an array with minimum length 1, but got an empty array instead.

What I had to change was to exclude from my history messages the empty tool_calls array received on a ChatCompletion where finish_reason !== "tool_calls":

const runner = openai.beta.chat.completions.runTools({
    stream: false,
    messages: chatSession.history.toObject(),
    model: "gpt-4o-2024-08-06",
    temperature: 0.2,
    max_tokens: 1500,
    n: 1,
    tools: [getUserLocationTool, getWeatherTool],
    response_format: {type: "json_object"},
}).on("message", (message) => {
    // Keep tool result messages in the conversation history.
    if (message.role === "tool") {
        chatSession.history.push(message)
    }
}).on("chatCompletion", (chatCompletion) => {
    const message = chatCompletion.choices[0].message

    // Drop SDK-added fields the API does not accept back as input.
    delete message.refusal
    delete message.parsed

    // Workaround: strip the empty tool_calls array, which the API now
    // rejects with a 400 when it is sent back as part of the history.
    if (chatCompletion.choices[0].finish_reason !== "tool_calls" && message.tool_calls && !message.tool_calls.length) {
        delete message.tool_calls
    }

    chatSession.history.push(message)
})

const finalContent = await runner.finalContent()

await chatSession.save()

try {
    return JSON.parse(finalContent)
} catch (e) {
    console.log("Unable to parse finalContent")
}

I tried the model gpt-4o-2024-05-13 as you did, and it seemed to work as before. Once in a while I get "output: multi_tool_use.parallel is not a valid tool, try one of [MY_TOOLS].". It seems that multi_tool_use.parallel is something internal the model uses, so I guess I just have to wait for a fix to be deployed. If I use gpt-4 or gpt-4-turbo, everything works as it did before, but the omni versions have been crashing since yesterday.
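In the meantime, a possible client-side workaround is to unpack the hallucinated multi_tool_use.parallel call into the real tool calls before dispatching them. A minimal sketch, assuming the argument shape ({ tool_uses: [{ recipient_name, parameters }] }) that community reports describe; this is not a documented contract:

// Sketch: split a hallucinated "multi_tool_use.parallel" call into
// individual tool calls. The argument shape below is an assumption
// based on community reports, not a documented API.
function unpackMultiToolUse(toolCall) {
    if (toolCall.function.name !== "multi_tool_use.parallel") {
        return [toolCall]
    }
    const { tool_uses = [] } = JSON.parse(toolCall.function.arguments)
    return tool_uses.map((use, i) => ({
        id: `${toolCall.id}_${i}`, // synthetic ids for the split-out calls
        type: "function",
        function: {
            // reports show real tool names prefixed with "functions."
            name: use.recipient_name.replace(/^functions\./, ""),
            arguments: JSON.stringify(use.parameters ?? {}),
        },
    }))
}

You would run each assistant message's tool_calls through this before executing the tools, then return one tool result per synthetic id.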