O3-mini in Assistants not following through with function call

Trying to get the vibe of o3-mini here in the Assistants API… it will stream and say that it’s going to call one of my tools, but then the Thread just ends and the retrieved Run is marked complete.

I noticed that Pokrass said on X that there’s a known issue with function calling; is this it? Any thoughts on what to do about it?

Confirmation. :blush:

Looking forward to it!

I am having the same experience. Our application relies on function calling, and most of the time the o3-mini and o1 models do not actually call the function! GPT-4o and Sonnet 3.5 do a much better job. I was hoping that reasoning models would show improvement, but they seem to have regressed with regard to function calling. Looking forward to this being fixed in the next release. :slight_smile:

Yes, same here. I think they trained the tool-calling eagerness out by training the o-series models to return fast whenever they “think” they have a quick answer, but in reality they lack the domain knowledge to have one. It is very disappointing; these o-series models are not naturally talented at this at all and fall short of 4o in my tests.

There is a hackish workaround: in the Chat Completions API you can force function calling. I did not check whether the same is supported with the Assistants tools. You can set "tool_choice" when you create the request, wherever you do:
model="o3-mini",
messages=messages,
tools=tools,
tool_choice="required"

The downside is that, when you loop over the tool calls it makes, tool_choice="required" may stay in the request (depending on how you build it), and the model never stops calling tools because it is not allowed to stop. So, depending on your setup, you may need to remove tool_choice from the request after some sensible number of tool loops. Even better, use a mini-model-based agent as a pre-step to decide whether tools are needed at all, and only add tool_choice when they are, to force the o3-mini-based agent to do the work; see the sketch below.
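To make that concrete, here is a minimal Python sketch of the workaround against the Chat Completions API. The get_weather tool, the handle_tool_call helper, and the MAX_TOOL_LOOPS cut-off are illustrative assumptions of mine, not anything official:

import json
from openai import OpenAI

client = OpenAI()

# Hypothetical example tool; substitute your own function schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def handle_tool_call(call):
    # Placeholder: run your real tool here and return its result as a string.
    args = json.loads(call.function.arguments)
    return json.dumps({"city": args["city"], "temp_c": 21})

messages = [{"role": "user", "content": "What's the weather in Oslo?"}]
MAX_TOOL_LOOPS = 3  # assumed cut-off so the model is eventually allowed to stop

for loop in range(MAX_TOOL_LOOPS + 1):
    # Force tool use for the first few rounds, then let the model answer freely.
    tool_choice = "required" if loop < MAX_TOOL_LOOPS else "auto"
    response = client.chat.completions.create(
        model="o3-mini",
        messages=messages,
        tools=tools,
        tool_choice=tool_choice,
    )
    msg = response.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # final answer, no further tool calls requested
        break
    messages.append(msg)
    for call in msg.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": handle_tool_call(call),
        })

The key point is flipping tool_choice back to "auto" (or dropping it) after a few rounds so the model is eventually allowed to stop and produce a final answer.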

I have seen some improvements by adding this line to my developer message:

Function calling: Always execute the required function calls before you respond.

o3-mini seems very reluctant to perform function calls, and sometimes it believes it has performed them when it clearly has not!

We need an official fix for this!

Agreed. My experience has been hit and miss lately, though it seems like it’s getting better? Very strange: sometimes it comes back and runs the required functions, other times it doesn’t. Can’t wait to see this fixed, and hopefully to get those CoT steps sent as SSE as well.

Thank you Ludde, your idea works. I tested it several times and it did well. I added it to the “Formatting re-enabled” directive needed for markdown formatting, so my new developer message now starts with: “Formatting re-enabled\nAlways execute the required function calls before you respond.\n”
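For anyone wanting to try the same thing, this is roughly how that combined developer message could look in a Chat Completions request (client and tools as in the earlier sketch; the wording beyond the two directives is just a placeholder):

messages = [
    {
        "role": "developer",
        "content": (
            "Formatting re-enabled\n"
            "Always execute the required function calls before you respond.\n"
            "...the rest of your instructions..."
        ),
    },
    {"role": "user", "content": "..."},
]

response = client.chat.completions.create(
    model="o3-mini",
    messages=messages,
    tools=tools,
)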

I experimented with adding “Always execute the required function calls before you respond.” at the start of my developer message and that does improve things, but it still sometimes fails to perform the call.

I have found that it will often choose to make up logs of tool/function calls rather than actually making the call. I really hope this gets fixed soon. This came up when we had requirements for tool calls baked into answer acceptance.

I had a conversation with o3-mini about writing effective prompts for reasoning models. It suggested:

Reasoning Model (o3-mini):
– These models are optimized for chain‐of‐thought reasoning. That means your in-context examples should explicitly include intermediate steps or “think aloud” components.
– Provide detailed, step-by-step examples so that the model learns to generate its reasoning process before delivering the final answer.
– Be prepared to use longer prompts (more tokens) to capture the full chain of reasoning.

So, I rewrote my developer message so that my example prompts include chain-of-thought steps 1, 2, 3, etc., including the point where functions should be called. This improved things.
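For illustration only (the steps and the lookup_order function below are hypothetical, not from my actual prompt), the structure looks something like this:

developer_message = (
    "When answering, work through these steps in order:\n"
    "1. Restate the user's request in one sentence.\n"
    "2. Decide which data you need that is not already in the conversation.\n"
    "3. Call the lookup_order function to fetch that data before answering.\n"
    "4. Reason over the returned data step by step.\n"
    "5. Write the final answer for the user."
)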

We’re also encountering this issue in our application. We use both the Chat Completions API and the Assistants API, and with GPT-4 we get minimal output in terms of word count: if you ask for 1,000 words, you only get 500, even with reprompting, whereas our tool is capable of handling it properly. When you switch to o3-mini, it works great! However, then function calls in Chat Completions don’t work (and file uploads in Assistants also don’t work). Is there any update on when a solution might be available, as this is not workable?

That’s really interesting since the docs seem to guide us a different way. Not doubting the results, just a bit confused on what is the best way to move forward on it. Thanks for sharing your experience with it!
https://platform.openai.com/docs/guides/reasoning-best-practices#how-to-prompt-reasoning-models-effectively

lol, the docs say “Avoid chain-of-thought prompts”! The thing I think helped most was explicitly listing the function call as a separate step in my CoT example. I still think function calling is somewhat broken in o3, and I hope they can improve it so it works as well as it does with GPT-4o. Good reasoning + tool use seems like a very powerful combination.

Sometimes it seems like it doesn’t even register the output of function calls. Even when an error occurs, it continues as if nothing happened. Chaining tool calls also doesn’t work at all.