What is happening with GPT-5 performance?

It is massively slow, and I mean massively, to the point of being unusable. I am developing an agent platform. As an example, one of my agents, admittedly using a very heavy process, took 5 minutes and 33 seconds to answer a question. I switched the agent to Anthropic's Claude-Sonnet-4 and got a reply in 1 minute and 10 seconds. Is anyone else experiencing these issues?

What I would suggest:

  • Capture the usage from the API response for the individual call.
  • Look at the output token counts: the reasoning tokens and the total, which includes that reasoning.
  • Derive the token generation rate from the elapsed time (see the sketch below).

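For example, here is a minimal Python sketch of that measurement using the Responses API (the prompt is just a placeholder; the usage field names assume the current openai SDK):

```python
import time

from openai import OpenAI

client = OpenAI()

start = time.monotonic()
response = client.responses.create(
    model="gpt-5",
    input="Explain the CAP theorem in two paragraphs.",  # placeholder prompt
)
elapsed = time.monotonic() - start

usage = response.usage
total_output = usage.output_tokens                        # total includes reasoning tokens
reasoning = usage.output_tokens_details.reasoning_tokens  # reasoning portion of that total

print(f"elapsed:         {elapsed:.1f} s")
print(f"output tokens:   {total_output} (reasoning: {reasoning})")
print(f"generation rate: {total_output / elapsed:.1f} tokens/s")
```

Comparing that rate across a few calls (and against the same prompt on another provider) tells you whether the latency is raw generation speed or something elsewhere in your agent pipeline.
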
If you want to pay double for the expected performance of 50+ tokens per second, you can make your API request with "service_tier": "priority". This makes a big difference; otherwise, API generation seems to be throttled or low-performance.
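
For reference, it is just one extra parameter on the request. A quick sketch against the Responses API, reusing the client and placeholder prompt from above:

```python
# Same call as above, but routed to the priority service tier (billed at a higher rate)
response = client.responses.create(
    model="gpt-5",
    input="Explain the CAP theorem in two paragraphs.",  # placeholder prompt
    service_tier="priority",
)
print(response.usage.output_tokens, "output tokens")
```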

The use of internal tools can also take a toll: if consuming OpenAI’s file search or web search is how you are defining “agent”, the platform is internally calling and re-calling the model, along with setting up strict structured outputs.

Thanks, I wasn’t aware you could add “service_tier”: “priority” to the request. I will give it a go. Still, I need to determine whether it is worth it or whether it is better to use another service like Anthropic. As for “agent”, I am not referring to using tools. I am developing agents using elizaos, which is a platform for building agents.