We are seeing very high latency issues today in GPT-4.1 and the GPT-4.1 family (nano, mini)… is anyone else experiencing this? We are seeing delays of up to 80 seconds for a simple true or false in a 70-character system prompt.
We’re facing the same issues with gpt-4.1 in the OpenAI Assistants API. The responses are very slow and very error-prone.
We’re also running into the same high latency problems with GPT-4.1 in the OpenAI Assistants API.
Same issue here with gpt-5-mini
Using service_tier: priority on the Chat Completions endpoint (so no further layer adds problematic lag), here are the token-per-second rates by model:
| Model | N | Latency (s) | Stream (t/s) | Total (t/s) |
|---|---|---|---|---|
| gpt-4.1 | 5 | 0.879 | 105.895 | 89.790 |
| gpt-4.1-mini | 5 | 0.452 | 174.733 | 151.332 |
| gpt-4.1-nano | 5 | 1.803 | 216.499 | 129.789 |
| gpt-4o | 5 | 0.756 | 85.586 | 77.185 |
| gpt-4o-mini | 5 | 0.416 | 162.951 | 143.150 |
OpenAI isn’t going to charge a premium for priority unless it also has a way of delivering lower performance at the standard price, is it?
The averages above include one standout trial that skews them heavily (anomalies are not dropped): a gpt-4o call that took over a minute.
...
model gpt-4o-mini: 512 generated, 512 final delivered of 512 max, 4.4s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 61.1s
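For reference, a minimal sketch of the kind of timing harness that produces numbers like these. The `stream=True` and `service_tier` parameters are documented Chat Completions options; the helper names, prompt, and chunk-counting-as-tokens approximation are illustrative assumptions, not the poster's actual script:

```python
import time

def throughput(n_tokens, t_request, t_first, t_done):
    """Compute latency plus stream and total token rates from three timestamps."""
    return {
        "latency_s": t_first - t_request,              # time to first token
        "stream_tps": n_tokens / (t_done - t_first),   # rate once tokens start flowing
        "total_tps": n_tokens / (t_done - t_request),  # rate over the whole request
    }

def bench(model="gpt-4.1", prompt="Reply with a short greeting.", max_tokens=512):
    """One streamed Chat Completions trial; needs the openai package and an API key."""
    from openai import OpenAI  # lazy import so the timing math above stands alone
    client = OpenAI()
    t_request = time.monotonic()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
        service_tier="priority",  # the tier being benchmarked above
    )
    t_first, n_tokens = None, 0
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if t_first is None:
                t_first = time.monotonic()
            n_tokens += 1  # counts content chunks, a rough proxy for tokens
    return throughput(n_tokens, t_request, t_first, time.monotonic())
```

Averaging `bench()` over N trials per model gives a table in the shape of the ones in this thread.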
Now we can compare model speed at the normal list price (these aren’t models that can be demoted even lower with “flex”).
The dropoff in performance resembles the slowdown quietly imposed on low payment-tier organizations (before it was ever announced) in 2023–2024:
| Model | N | Latency (s) | Stream (t/s) | Total (t/s) |
|---|---|---|---|---|
| gpt-4.1 | 5 | 0.534 | 55.740 | 52.573 |
| gpt-4.1-mini | 5 | 0.913 | 64.594 | 58.567 |
| gpt-4.1-nano | 5 | 0.707 | 83.646 | 75.478 |
| gpt-4o | 5 | 0.453 | 48.700 | 46.477 |
| gpt-4o-mini | 5 | 0.495 | 49.427 | 47.000 |
The total column is the token generation rate over the entire request time, not just the rate after tokens start being received.
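Put as a formula (an illustrative calculation, not derived from the thread’s raw data): with first-token latency L, streaming rate S, and n generated tokens, the total rate is n / (L + n/S), so it always sits below the streaming rate and only approaches it as n grows:

```python
def total_tps(n_tokens, latency_s, stream_tps):
    # whole-request rate: tokens divided by (time to first token + streaming time)
    return n_tokens / (latency_s + n_tokens / stream_tps)

# With 1 s first-token latency at 100 t/s streaming, a 100-token reply
# averages only 50 t/s overall; a 10,000-token reply gets close to 100 t/s.
```

This is why short responses (like a one-token true/false) are dominated by latency rather than streaming speed.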
Same issues here, very frustrating.
Has anyone found any “hacks” to improve or work around this … short of moving to another provider?
I am currently seeing if fine-tuning helps.