O3-mini in Assistants not following through with function call

Trying to get the vibe of o3-mini here in the Assistants API… it will stream and say that it’s going to call one of my tools, but then the Thread just ends and the retrieved Run is marked complete.

I noticed that Pokrass said on X that there’s a known issue with function calling; is this it? Any thoughts on what to do about it?

Confirmation. :blush:

Looking forward to it!

I am having the same experience. Our application relies on function calling, and most of the time the o3-mini and o1 models do not actually call the function! GPT-4o and Sonnet 3.5 do a much better job. I was hoping that reasoning models would show improvement, but they seem to have regressed with regard to function calling. Looking forward to this being fixed in the next release. :slight_smile:

Yes, same here. I think they trained the tool-calling eagerness out by training the o-series models to return fast whenever they “think” they have a quick answer, but in reality they lack the domain knowledge to have one. It is very disappointing; these o-series models are not naturally talented at this at all and fall short of 4o in my tests.

There is a hackish workaround: in the Chat Completions API you can force function calling. I did not check whether the same is supported with the Assistants tools. You can set "tool_choice" when you create the request, wherever you do:
model="o3-mini",
messages=messages,
tools=tools,
tool_choice="required"

The downside is that, when you loop over the tool calls it makes, tool_choice="required" may stay in the request (depending on how you build it), and the model never stops calling tools because it is not allowed to stop. So, depending on your setup, you may need to remove tool_choice from the request after some sensible number of tool loops. Even better, use a mini-model-based agent as a pre-step to decide whether tools are needed at all, and only add tool_choice when they are, to force the o3-mini-based agent to do the work; see the sketch below.
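To make that concrete, here is a minimal Python sketch of the workaround against the Chat Completions API. The get_weather tool, the handle_tool_call helper, and the MAX_TOOL_LOOPS cut-off are illustrative assumptions of mine, not anything official:

import json
from openai import OpenAI

client = OpenAI()

# Hypothetical example tool; substitute your own function schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def handle_tool_call(call):
    # Placeholder: run your real tool here and return its result as a string.
    args = json.loads(call.function.arguments)
    return json.dumps({"city": args["city"], "temp_c": 21})

messages = [{"role": "user", "content": "What's the weather in Oslo?"}]
MAX_TOOL_LOOPS = 3  # assumed cut-off so the model is eventually allowed to stop

for loop in range(MAX_TOOL_LOOPS + 1):
    # Force tool use for the first few rounds, then let the model answer freely.
    tool_choice = "required" if loop < MAX_TOOL_LOOPS else "auto"
    response = client.chat.completions.create(
        model="o3-mini",
        messages=messages,
        tools=tools,
        tool_choice=tool_choice,
    )
    msg = response.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # final answer, no further tool calls requested
        break
    messages.append(msg)
    for call in msg.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": handle_tool_call(call),
        })

The key point is flipping tool_choice back to "auto" (or dropping it) after a few rounds so the model is eventually allowed to stop and produce a final answer.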

I have seen some improvements by adding this line to my developer message:

Function calling: Always execute the required function calls before you respond.

o3-mini seems very reluctant to perform function calls, and sometimes it believes it has performed them when it clearly has not!

We need an official fix for this!

Agreed. My experience has been hit and miss lately, though it seems like it’s getting better? Very strange: sometimes it comes back and runs the required functions, other times it doesn’t. Can’t wait to see this fixed, and hopefully to get those CoT steps sent as SSE as well.

Thank you Ludde, your idea works. I tested it several times and it did well. I added it to the “Formatting re-enabled” directive needed for markdown formatting, so my new developer message now starts with: “Formatting re-enabled\nAlways execute the required function calls before you respond.\n”
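For anyone wanting to try the same thing, this is roughly how that combined developer message could look in a Chat Completions request (client and tools as in the earlier sketch; the wording beyond the two directives is just a placeholder):

messages = [
    {
        "role": "developer",
        "content": (
            "Formatting re-enabled\n"
            "Always execute the required function calls before you respond.\n"
            "...the rest of your instructions..."
        ),
    },
    {"role": "user", "content": "..."},
]

response = client.chat.completions.create(
    model="o3-mini",
    messages=messages,
    tools=tools,
)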

I experimented with adding “Always execute the required function calls before you respond.” at the start of my developer message and that does improve things, but it still sometimes fails to perform the call.

I have found that it will often choose to make up logs of tool/function calls rather than actually making the call. I really hope this gets fixed soon. This came up when we had requirements for tool calls baked into answer acceptance.

I had a conversation with o3-mini about writing effective prompts for reasoning models. It suggested:

Reasoning Model (o3-mini):
– These models are optimized for chain‐of‐thought reasoning. That means your in-context examples should explicitly include intermediate steps or “think aloud” components.
– Provide detailed, step-by-step examples so that the model learns to generate its reasoning process before delivering the final answer.
– Be prepared to use longer prompts (more tokens) to capture the full chain of reasoning.

So, I rewrote my developer message so that my example prompts include chain-of-thought steps 1, 2, 3, etc., including the point where functions should be called. This improved things.
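For illustration only (the steps and the lookup_order function below are hypothetical, not from my actual prompt), the structure looks something like this:

developer_message = (
    "When answering, work through these steps in order:\n"
    "1. Restate the user's request in one sentence.\n"
    "2. Decide which data you need that is not already in the conversation.\n"
    "3. Call the lookup_order function to fetch that data before answering.\n"
    "4. Reason over the returned data step by step.\n"
    "5. Write the final answer for the user."
)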

We’re also encountering this issue in our application. We use both the Chat Completions API and the Assistants API, and with GPT-4 we get minimal output in terms of word count: if you ask for 1,000 words, you only get 500, even with reprompting, whereas our tool is capable of handling it properly. When you switch to o3-mini, it works great! However, then function calls in Chat Completions don’t work (and file uploads in Assistants also don’t work). Is there any update on when a solution might be available, as this is not workable?

That’s really interesting since the docs seem to guide us a different way. Not doubting the results, just a bit confused on what is the best way to move forward on it. Thanks for sharing your experience with it!
https://platform.openai.com/docs/guides/reasoning-best-practices#how-to-prompt-reasoning-models-effectively

lol, the docs say “Avoid chain-of-thought prompts”! The thing I think helped most was explicitly listing the function call as a separate step in my CoT example. I still think function calling is somewhat broken in o3, and I hope they can improve it so it works as well as it does with GPT-4o. Good reasoning + tool use seems like a very powerful combination.

Sometimes it seems like it doesn’t even register the output of function calls. Even when an error occurs, it continues as if nothing happened. Chaining tool calls also doesn’t work at all.