I’m using the Agents SDK with the Responses API. I was testing gpt-5 before switching over and noticed that it takes around one minute even for a basic query.
The trace shows the following:
| Step | Model | Input tokens | Output tokens | Duration |
| --- | --- | --- | --- | --- |
| Triage Agent, 1st call | gpt-5-nano | ~3k | ~2k | ~9 seconds |
| Sales Agent, 1st call | gpt-5 | ~5k | ~2k | ~40 seconds |
| Sales Agent, 2nd call | gpt-5 | ~9k | ~1k | ~20 seconds |
The user query is “Hello”. The same system on the gpt-4.1 family, with the same prompt/context/user query, takes only around 2–5 seconds.
I think this is related to the Responses API, because people using gpt-5 through the Completions API don’t seem to have this problem.
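If anyone wants to reproduce this, here is a minimal timing sketch, assuming the official `openai` Python SDK and access to both model families; “Hello” stands in for the real agent prompts:

```python
# Minimal latency check: same query against the Responses API and Chat Completions.
import time
from openai import OpenAI

client = OpenAI()

def time_responses(model: str) -> float:
    """Time a single Responses API call."""
    start = time.perf_counter()
    client.responses.create(model=model, input="Hello")
    return time.perf_counter() - start

def time_chat(model: str) -> float:
    """Time a single Chat Completions call."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}],
    )
    return time.perf_counter() - start

for model in ("gpt-4.1", "gpt-5"):
    print(f"{model}: responses={time_responses(model):.1f}s, chat={time_chat(model):.1f}s")
```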
—
Similar problems on other posts:
Just changing from the gpt-4.1 family to the gpt-5 family caused the response time of a simple “Hello” query with a ~5k-token system prompt to go up from ~2–5 seconds to ~1 minute.
What user would wait ~1 minute for “Hello”?
Can you post links to the complaints here as well? We can collect them all in one place.
I think there is an infrastructure problem. If I set the run to background and look at the status, I see my requests just stuck in “queued” and never switching to processing.
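For reference, this is roughly how I’m checking it; a sketch assuming the openai Python SDK’s background mode, with an arbitrary 2-second poll interval:

```python
# Queue a background request and poll its status until it leaves "queued"/"in_progress".
import time
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5",
    input="Hello",
    background=True,  # queue the request and poll instead of waiting inline
)

while resp.status in ("queued", "in_progress"):
    print("status:", resp.status)
    time.sleep(2)
    resp = client.responses.retrieve(resp.id)

print("final status:", resp.status)
```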
I’m using the Responses API. Let me try Completions.
PS: No, it’s nearly the same. I think there are serious issues; it’s not usable this way. On a positive note, the results I did receive, with a lot of patience, were really promising.
I am using the Responses API. Prompts with c. 4k tokens have gone from c. 5 seconds to 30+. I had to go through all my tests lengthening timeouts just to see what it was producing. The outputs I looked at did look nice, but niceness at that cost is not worthwhile. I was unable to complete a full eval run due to the slow response times. Gave up, went back to 4.1 for now.
Maybe you do need to tune the extra parameters, but at the moment I would not be able to run enough evaluation to assess the quality of the ‘less thinking’ version.
Not getting any errors, but every request to gpt-5 at the default medium reasoning effort reasons through a ton of tokens and then produces no final output. This is with the Responses API.
I would occasionally get a response after several minutes last night, but now it’s producing nothing.
Are you sending, or missing, a preset max_completion_tokens that limits the output budget? It now sets how much you are willing to pay, not how much you want to see. The finish_reason will also be “length” if the output was truncated by that parameter before delivery.
There are so many prompt tune-ups for bad behavior that you could send a book’s worth of instructions and the gpt-5 model would still ignore them.
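If it helps, here is a rough illustration of that point, shown on Chat Completions; the 5000-token cap and the model name are just examples, not recommendations:

```python
# Reasoning tokens count against max_completion_tokens, so a small budget can be
# spent before any visible text appears; finish_reason == "length" flags truncation.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello"}],
    max_completion_tokens=5000,  # reasoning + visible output share this budget
)

choice = completion.choices[0]
if choice.finish_reason == "length":
    print("Output truncated by the token budget; raise max_completion_tokens.")
else:
    print(choice.message.content)
```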
GPT-5 is unusable for me too (Responses API). Had to revert to 4.1. It’s super slow and I keep getting failures due to max tokens, even with verbosity set to low. Never had that happen before.
I see now it’s faster, but a few hours ago it was taking around 30 seconds at “minimal” and several minutes at “low”. Didn’t try the other two. I saw in the Playground that it was thinking very slowly, so I think it was an overload issue on their side.
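For anyone still hunting for those knobs, this is roughly where they live on the Responses API; a sketch, with the effort and verbosity values being only the ones discussed above:

```python
# Reasoning effort and output verbosity on a gpt-5 Responses API call.
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5",
    input="Hello",
    reasoning={"effort": "minimal"},  # "minimal" | "low" | "medium" | "high"
    text={"verbosity": "low"},        # "low" | "medium" | "high"
)

print(resp.output_text)
```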
It turned out my blank responses were because the reasoning tokens were hitting 2048, which appears to be the default output allowance. Once I bumped max_completion_tokens up to 5000, GPT-5 can’t stop talking.
Why was this flagged? I love ChatGPT and I really, really want this to work, but using it every day has become unbearable. It goes back and forth: sometimes it’s somewhat fast, and by fast I mean at most 10 seconds for a response, which is already not good, and other times I’m waiting 5 minutes for a response. Do you have any idea how agonizingly painful that is when you are trying to code something? Clearly something is wrong and needs to be addressed, but what worries me even more is that there is no official communication acknowledging the issues, at least none that I have seen. Everyone from OpenAI is operating as if everything is great.
Everything you’re describing is the consumer ChatGPT product, which is off-topic here.
ChatGPT is not something API-using developers can help you improve, except to suggest you pick “mini” in the new model selector over there for a faster start to visible output.