GPT-5 Breaks the Agents SDK and Tool Calling

We run a product based on the Agents SDK. We were really looking forward to GPT-5. We got approved for it in the API and… it simply doesn't work.

Some examples:

  • Requests that 4.1 got right ("What is the weather Tuesday night?" → call to our weather Agent), GPT-5 asks for clarification ("Which Tuesday do you mean?"). Our system context includes the current date, time, timezone, and location.
  • It sometimes says it'll call a tool and then doesn't. (4.1: "Get me the price of TSLA." calls our stock Agent and fetches the price. GPT-5: "I'll fetch you the price of TSLA!" but never actually calls the tool.)
  • Other times GPT-5 calls a tool with the wrong function parameters, resulting in an error. This happens less often, but it does happen.
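One stopgap we've considered for the second failure mode is a post-run check that flags turns where the model promises to fetch something but never emits a tool call. A minimal sketch, assuming a simplified list of run items (the item shape, `promised_but_never_called` name, and promise phrases are all hypothetical, not Agents SDK API):

```python
import re

# Hypothetical guardrail: flag a model turn that promises a tool call
# ("I'll fetch...", "let me look up...") but contains no tool-call item.
# `items` is an assumed simplified shape: each item is a dict with a
# "type" of either "message" (carrying "text") or "tool_call".

PROMISE_PATTERNS = re.compile(
    r"\b(i['\u2019]ll|let me|i will|i am going to|fetching|looking up)\b",
    re.IGNORECASE,
)

def promised_but_never_called(items):
    """Return True if the turn has promise-like text but no tool call."""
    made_tool_call = any(item["type"] == "tool_call" for item in items)
    promised = any(
        item["type"] == "message" and PROMISE_PATTERNS.search(item.get("text", ""))
        for item in items
    )
    return promised and not made_tool_call

# Example turns mirroring the TSLA report above:
broken = [{"type": "message", "text": "I'll fetch you the price of TSLA!"}]
healthy = [
    {"type": "message", "text": "I'll fetch you the price of TSLA!"},
    {"type": "tool_call", "name": "get_stock_price"},
]
```

When the check trips, you could retry the turn or fall back to 4.1; it obviously can't fix the model, but it at least turns silent failures into detectable ones.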

GPT-5 makes our product unusable, whereas we've never had any problems with 4.1. I posted about this on the Agents SDK GitHub and several other people confirmed that GPT-5 essentially breaks tool calling.

Have you had similar issues? What do we do about this? It is noticeably faster, which is good, but it just doesn't work.

If I had to guess, this is due to the router sending us to smaller models. 4.1-mini and 4.1-nano exhibit similar problems, which is why we don't use them. But GPT-5 just constantly fails.

Running into the same issues here. Tool calling breaks, and the behavior is also unpredictable.
Were you able to solve this?

No. OpenAI’s own engineers on GitHub said to keep using 4.1 until they fix 5.

I am experiencing the same issue. Changed the model from 4.1 to gpt-5 and absolutely nothing works, especially with stream: true. Will keep 4.1 for the moment.

Can you please share the GitHub link?

Where should we expect an announcement when this is fixed?

Hey, is this issue fixed?
I am also having some issues with GPT-5. Not sure if it's on me or on the Agents SDK, though.

I also want to chime in that this is a critical issue for us. We have a ground transportation booking app, and GPT-5 fails to call our booking function and even hallucinates the confirmation #. GPT-4.1 works flawlessly. At this point, we cannot consider using GPT-5 in a mission-critical setting.

No longer seeing the tooling issues with 5.2, which was just released.