I’m not even convinced it’s smarter, period: smarter than the best-in-class model you can hand a refactor request and get a one-and-done answer from.
More time spent “vibing” than it would take to just write the code: corrections, examples it decides are needed, wandering outside the scope, chatting at you inside the code, and even adding prints just to “chat”.
Low effort: too dumb. Medium effort: too disobedient, ignoring instructions outright.
The model also has a behavior issue: it writes a lot of hedging code. As in, “this code is gonna run and fail silently, because I don’t actually know what I’m doing after all the speculating I did.”
Speed:
The billing data lets us infer the token production rate we can’t see directly, and that’s what we’ll have to fall back on to track day-by-day performance.
| Model (minimal) | Trials | Avg Latency (s) | Avg Stream Rate (tok/s) | Avg Total Rate (tok/s) |
|---|---|---|---|---|
| gpt-4.1-mini | 10 | 0.628 | 75.441 | 72.007 |
| gpt-5-mini | 10 | 1.067 | 90.294 | 83.732 |
Note:
- Latency is the time to first token, i.e. the wait the user actually feels.
- Stream rate counts tokens (via tiktoken) only over the streaming window after the first token.
- Total rate divides usage.completion_tokens by the entire call duration.
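
For context, this is roughly how those three columns can be measured; a minimal sketch, assuming the OpenAI Python SDK (v1.x) with streaming usage included and tiktoken’s o200k_base encoding. The model name, prompt, and token cap here are placeholders, not the exact benchmark harness.

```python
# Minimal sketch: latency, stream rate, and total rate for one streamed call.
# Assumes the OpenAI Python SDK (v1.x) and tiktoken; not the exact harness.
import time
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")

def time_one_call(model: str, prompt: str, cap: int = 1024):
    t_start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=cap,
        stream=True,
        stream_options={"include_usage": True},  # final chunk carries usage
    )
    t_first, pieces, usage = None, [], None
    for chunk in stream:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage                       # only set on the last chunk
        if chunk.choices and chunk.choices[0].delta.content:
            if t_first is None:
                t_first = time.perf_counter()         # first visible token
            pieces.append(chunk.choices[0].delta.content)
    t_end = time.perf_counter()

    latency = t_first - t_start                             # time to first token
    stream_tokens = len(enc.encode("".join(pieces)))        # counted with tiktoken
    stream_rate = stream_tokens / (t_end - t_first)         # streaming window only
    total_rate = usage.completion_tokens / (t_end - t_start)  # whole call duration
    return latency, stream_rate, total_rate
```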
reasoning_effort = minimal can use as few as 8 reasoning tokens; the default effort can consume all of the writing-assignment benchmark’s 1024 max_completion_tokens internally.
Typical:
model gpt-5-mini: 1024 generated (incl. 8 reasoning), 1007 delivered of 1024 max, o200k_base
model gpt-4.1-mini: 1024 generated, 1024 delivered of 1024 max, o200k_base
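
For reference, a minimal sketch of one benchmark call behind those “Typical” lines. It assumes the Chat Completions endpoint accepts reasoning_effort="minimal" for gpt-5-mini and reports reasoning tokens under usage.completion_tokens_details; the prompt is only a stand-in for the real writing assignment.

```python
# Minimal sketch of one call: total generated, reasoning, and delivered tokens.
# Assumes reasoning_effort="minimal" is accepted and usage details are returned.
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")

resp = client.chat.completions.create(
    model="gpt-5-mini",
    reasoning_effort="minimal",      # vs. the default effort
    max_completion_tokens=1024,      # the benchmark's cap
    messages=[{"role": "user",
               "content": "Write an essay on human aspirations in a post-money world."}],
)

usage = resp.usage
text = resp.choices[0].message.content or ""
print("generated:", usage.completion_tokens)                           # total, incl. reasoning
print("reasoning:", usage.completion_tokens_details.reasoning_tokens)  # as few as 8 at minimal
print("delivered:", len(enc.encode(text)), "of 1024 max, o200k_base")  # visible text tokens
```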
Unique responses for gpt-4.1-mini (by first 60 chars):
6 | # Human Aspirations in a Post-Money World ## Introduction …
4 | Human Aspirations in a Post-Money World Introduction …
Unique responses for gpt-5-mini (by first 60 chars):
4 | Title: Human Aspirations in a Post‑Money World Introduction…
3 | Human Aspirations in a Post‑Money World Introduction Money…
1 | Title: Human Aspirations in a Post-Money World Introduction…
1 | Human Aspirations in a Post‑Money World Introduction — what…
1 | Human Aspirations in a Post‑Money World Introduction For m…
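
The grouping is nothing fancy; a small sketch, assuming the raw completions are collected as a plain list of strings per model and deduplicated on their first 60 characters only:

```python
# Small sketch: count unique responses by their first 60 characters,
# with newlines flattened for display. Input list is assumed, not the harness.
from collections import Counter

def unique_by_prefix(responses: list[str], prefix_len: int = 60) -> None:
    counts = Counter(r[:prefix_len].replace("\n", " ") for r in responses)
    for prefix, n in counts.most_common():
        print(f"{n} | {prefix}…")
```

Calling unique_by_prefix(gpt5_mini_responses) would print counts in the same “n | prefix…” format as the listings above.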