GPT-4.1 models: very slow API response times

We are seeing very high latency today across the GPT-4.1 family (gpt-4.1, mini, nano)… is anyone else experiencing this? We are seeing delays of up to 80 seconds for a simple true/false answer to a 70-character system prompt.

We’re facing the same issues with gpt-4.1 in the OpenAI Assistants API. The responses are very slow and very error-prone.

We’re also running into the same high latency problems with GPT-4.1 in the OpenAI Assistants API.

Same issue here with gpt-5-mini

Choosing `service_tier: "priority"` and the Chat Completions endpoint (so there’s no further layer of problematic lag), here are the token-per-second rates of the models:

| Model | N | Latency (s) | Stream (tok/s) | Total (tok/s) |
|---|---|---|---|---|
| gpt-4.1 | 5 | 0.879 | 105.895 | 89.790 |
| gpt-4.1-mini | 5 | 0.452 | 174.733 | 151.332 |
| gpt-4.1-nano | 5 | 1.803 | 216.499 | 129.789 |
| gpt-4o | 5 | 0.756 | 85.586 | 77.185 |
| gpt-4o-mini | 5 | 0.416 | 162.951 | 143.150 |
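For anyone who wants to reproduce numbers like these, here’s a minimal sketch of the kind of harness that produces those columns. The `rates` helper, the prompt, and the model name in `__main__` are my own illustration; `service_tier="priority"` and `stream_options={"include_usage": True}` are the Chat Completions parameters being discussed:

```python
import time

def rates(latency_s: float, total_s: float, completion_tokens: int):
    """Stream rate counts only the time after the first token arrives;
    total rate divides by the whole request time, latency included."""
    stream_s = total_s - latency_s
    stream_tps = completion_tokens / stream_s if stream_s > 0 else float("inf")
    total_tps = completion_tokens / total_s
    return stream_tps, total_tps

def measure(client, model: str, prompt: str = "Count from 1 to 50."):
    """One trial: time-to-first-token, total time, and both token rates."""
    t0 = time.monotonic()
    first = None
    usage = None
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        stream_options={"include_usage": True},  # final chunk carries token counts
        service_tier="priority",                 # the tier benchmarked above
    )
    for chunk in stream:
        if first is None and chunk.choices and chunk.choices[0].delta.content:
            first = time.monotonic() - t0        # latency to first content token
        if getattr(chunk, "usage", None):
            usage = chunk.usage
    total = time.monotonic() - t0
    return first, total, rates(first, total, usage.completion_tokens)

if __name__ == "__main__":
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    print(measure(OpenAI(), "gpt-4.1-mini"))
```

Run each model N times and average to get a table like the one above.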

OpenAI isn’t going to charge you a premium for priority unless they also have a way of offering lower performance without the cost inflation, are they?

The run above had one standout trial that heavily skews the averages (the averages don’t drop the anomaly): a gpt-4o call that took over a minute.

...
```
model gpt-4o-mini: 512 generated, 512 final delivered of 512 max, 4.4s
model gpt-4o:      512 generated, 512 final delivered of 512 max, 61.1s
```

Now we can compare model speed at the normal list price (these aren’t models that can be demoted even lower with “flex”).

The dropoff in performance is similar to what was quietly imposed on low-payment-tier organizations in 2023–2024, before tiering was even announced:

| Model | N | Latency (s) | Stream (tok/s) | Total (tok/s) |
|---|---|---|---|---|
| gpt-4.1 | 5 | 0.534 | 55.740 | 52.573 |
| gpt-4.1-mini | 5 | 0.913 | 64.594 | 58.567 |
| gpt-4.1-nano | 5 | 0.707 | 83.646 | 75.478 |
| gpt-4o | 5 | 0.453 | 48.700 | 46.477 |
| gpt-4o-mini | 5 | 0.495 | 49.427 | 47.000 |

The “Total” column is the token generation rate over the whole request time, including the latency before the first token arrives, not just the rate once tokens start streaming.
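Concretely, the total rate follows from latency, streaming rate, and token count; a toy sketch with made-up numbers (not values from the tables above):

```python
def total_tps(latency_s: float, stream_tps: float, tokens: int) -> float:
    # Total time = time-to-first-token + generation time at the streaming rate,
    # so the whole-request rate is always below the streaming rate.
    return tokens / (latency_s + tokens / stream_tps)

# e.g. 512 tokens streamed at 100 tok/s after 1 s of latency:
print(round(total_tps(1.0, 100.0, 512), 1))  # ~83.7 tok/s over the whole request
```

This is why the gap between the two columns widens for models with higher first-token latency.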

Same issues here :frowning: very frustrating.

Has anyone found any “hacks” to improve or work around this, short of going to another provider?

I am currently seeing if fine-tuning helps.