I was trying to figure out what makes GPT-5 so slow, and in the Playground I found that most of the time is spent not on reasoning but on waiting for reasoning to start.
For example:
effort: low, verbosity: low, summary: auto
input: 127t
output: 3594t (reasoning 3584t + text 10t)
"Thought for 32 seconds"
Total time 1m 24s
Most of the time is spent waiting on the three-dot indicator. What is the model doing during that time? Is it a queue caused by heavy use of the new release, or is this expected?
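A rough breakdown of the figures above, as a sketch. It assumes that "Thought for 32 seconds" covers the whole reasoning phase and that everything else in the 1m 24s total is waiting; the Playground does not confirm either assumption, and the variable names are only for illustration.

```python
# Figures copied from the Playground run described above.
total_s = 1 * 60 + 24        # "Total time 1m 24s"
thinking_s = 32              # "Thought for 32 seconds"
reasoning_tokens = 3584
text_tokens = 10

# Assumption: whatever is not reported as thinking time is waiting.
waiting_s = total_s - thinking_s
waiting_share = waiting_s / total_s

# Throughput over the whole request versus over the reported thinking time only.
overall_tps = (reasoning_tokens + text_tokens) / total_s
thinking_tps = reasoning_tokens / thinking_s

print(f"waiting: {waiting_s} s ({waiting_share:.0%} of total)")
print(f"overall: {overall_tps:.1f} tokens/s, while thinking: {thinking_tps:.1f} tokens/s")
```

Under those assumptions, well over half of the wall-clock time is unaccounted for by the reported thinking time.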
What you are seeing in the UI is the time to first chunk.
The model first has to produce some reasoning, enough to be worth summarizing.
Then the summarizer, which prevents you from seeing the model's true generation, has to abstract that reasoning away: a second AI generating new language just for a progress indication.
There can be additional delays, notably setting up a context-free grammar the first time a new strict function or structured output schema is called.
The Responses API is an interloper; it and Assistants can indeed behave like a FIFO queue during busy times.
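To check how much of a request is time to first chunk rather than generation, you can time a streamed response yourself instead of relying on the UI. A minimal sketch follows: `time_stream` and `fake_stream` are hypothetical names, and `fake_stream` is only a stand-in for whatever iterator of streamed events your client returns. No specific SDK call is assumed.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def time_stream(chunks: Iterable[object]) -> Tuple[Optional[float], float, int]:
    """Consume a stream and return (time_to_first_chunk_s, total_s, chunk_count)."""
    start = time.perf_counter()
    first: Optional[float] = None
    count = 0
    for _ in chunks:
        if first is None:
            # The first item marks the end of the silent waiting period.
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, total, count


def fake_stream(initial_delay_s: float, n_chunks: int) -> Iterator[str]:
    """Stand-in for a real streaming response: a long initial wait, then quick chunks."""
    time.sleep(initial_delay_s)
    for i in range(n_chunks):
        time.sleep(0.01)
        yield f"chunk-{i}"


ttfc, total, n = time_stream(fake_stream(0.2, 5))
print(f"time to first chunk: {ttfc:.2f} s of {total:.2f} s total, {n} chunks")
```

Passing a real streaming iterator in place of `fake_stream(...)` would show whether the delay sits before the first event or is spread across generation.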
I wasn’t using functions or structured outputs. As of now, two days after launch, I find it too weak on “minimal” and too slow on “low” for most applications. I didn’t dare try medium or high.
I find it very slow. It’s a disappointing update. I also don’t find the output that useful.