We run a product based on the Agents SDK. We were really looking forward to GPT-5. We got approved for it in the API, and… it simply doesn't work.
Some examples:
- Requests that 4.1 handled correctly ("What is the weather Tuesday night?" → call to our weather Agent) now get clarification questions from GPT-5 ("Which Tuesday do you mean?"). Our system context includes the current date, time, and timezone (and location).
- It sometimes says it will call a tool and then doesn't. (4.1: "Get me the price of TSLA" calls our stock Agent and fetches the price. GPT-5: "I'll fetch you the price of TSLA!" but never actually calls the tool.)
- Other times GPT-5 calls a tool with the wrong function parameters, resulting in an error. This happens less often, but it does happen.
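For reference, the date/time injection mentioned above looks roughly like this. This is a minimal sketch; `build_system_context` and its default values are illustrative, not our actual code:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def build_system_context(tz_name: str = "America/New_York",
                         location: str = "New York, NY") -> str:
    # Illustrative helper: injects the current date, time, timezone,
    # and location into the system prompt so the model can resolve
    # relative dates like "Tuesday night" on its own.
    now = datetime.now(ZoneInfo(tz_name))
    return (
        f"Current date: {now:%A, %B %d, %Y}. "
        f"Current time: {now:%H:%M} ({tz_name}). "
        f"User location: {location}. "
        "Resolve relative dates such as 'Tuesday night' against the current date."
    )
```

Even with context like this in the system prompt, GPT-5 still asks which Tuesday we mean; 4.1 never did.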
GPT-5 makes our product useless, and we never had any of these problems with 4.1. I posted about this on the Agents SDK GitHub, and several other people confirmed that GPT-5 essentially breaks tool calling.
Have you had similar issues? What do we do about this? GPT-5 is noticeably faster, which is good, but it just doesn't work.
If I had to guess, this is due to the router sending us to smaller models: 4.1-mini and 4.1-nano exhibit similar problems, which is why we don't use them. But GPT-5 just constantly fails.