We are seeing very high latency issues today in GPT-4.1 and the GPT-4.1 family (nano, mini)… is anyone else experiencing this? We are seeing delays of up to 80 seconds for a simple true or false in a 70-character system prompt.
We’re facing the same issues with gpt-4.1 in the OpenAI Assistants API. The responses are very slow and very error-prone.
We’re also running into the same high latency problems with GPT-4.1 in the OpenAI Assistants API.
Same issue here with gpt-5-mini
Using service_tier: priority on the Chat Completions endpoint (so no further layer adds problematic lag), here are the token-per-second rates by model:
| Model | N | Latency (s) | Stream (t/s) | Total (t/s) |
|---|---|---|---|---|
| gpt-4.1 | 5 | 0.879 | 105.895 | 89.790 |
| gpt-4.1-mini | 5 | 0.452 | 174.733 | 151.332 |
| gpt-4.1-nano | 5 | 1.803 | 216.499 | 129.789 |
| gpt-4o | 5 | 0.756 | 85.586 | 77.185 |
| gpt-4o-mini | 5 | 0.416 | 162.951 | 143.150 |
OpenAI isn’t going to charge a premium for priority unless it also has a way of delivering lower performance at the standard price, is it?
The averages above include one standout trial that skews them heavily (anomalies are not dropped): a gpt-4o call that took over a minute.
...
model gpt-4o-mini: 512 generated, 512 final delivered of 512 max, 4.4s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 61.1s
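For reference, a minimal sketch of the kind of timing harness that produces numbers like these. The `stream=True` and `service_tier` parameters are documented Chat Completions options; the helper names, prompt, and chunk-counting-as-tokens approximation are illustrative assumptions, not the poster's actual script:

```python
import time

def throughput(n_tokens, t_request, t_first, t_done):
    """Compute latency plus stream and total token rates from three timestamps."""
    return {
        "latency_s": t_first - t_request,              # time to first token
        "stream_tps": n_tokens / (t_done - t_first),   # rate once tokens start flowing
        "total_tps": n_tokens / (t_done - t_request),  # rate over the whole request
    }

def bench(model="gpt-4.1", prompt="Reply with a short greeting.", max_tokens=512):
    """One streamed Chat Completions trial; needs the openai package and an API key."""
    from openai import OpenAI  # lazy import so the timing math above stands alone
    client = OpenAI()
    t_request = time.monotonic()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
        service_tier="priority",  # the tier being benchmarked above
    )
    t_first, n_tokens = None, 0
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if t_first is None:
                t_first = time.monotonic()
            n_tokens += 1  # counts content chunks, a rough proxy for tokens
    return throughput(n_tokens, t_request, t_first, time.monotonic())
```

Averaging `bench()` over N trials per model gives a table in the shape of the ones in this thread.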
Now we can compare model speed at the normal list price (these aren’t models that can be demoted even lower with “flex”).
The dropoff in performance resembles the slowdown quietly imposed on low payment-tier organizations (before it was ever announced) in 2023–2024:
| Model | N | Latency (s) | Stream (t/s) | Total (t/s) |
|---|---|---|---|---|
| gpt-4.1 | 5 | 0.534 | 55.740 | 52.573 |
| gpt-4.1-mini | 5 | 0.913 | 64.594 | 58.567 |
| gpt-4.1-nano | 5 | 0.707 | 83.646 | 75.478 |
| gpt-4o | 5 | 0.453 | 48.700 | 46.477 |
| gpt-4o-mini | 5 | 0.495 | 49.427 | 47.000 |
The total column is the token generation rate over the entire request time, not just the rate after tokens start being received.
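Put as a formula (an illustrative calculation, not derived from the thread’s raw data): with first-token latency L, streaming rate S, and n generated tokens, the total rate is n / (L + n/S), so it always sits below the streaming rate and only approaches it as n grows:

```python
def total_tps(n_tokens, latency_s, stream_tps):
    # whole-request rate: tokens divided by (time to first token + streaming time)
    return n_tokens / (latency_s + n_tokens / stream_tps)

# With 1 s first-token latency at 100 t/s streaming, a 100-token reply
# averages only 50 t/s overall; a 10,000-token reply gets close to 100 t/s.
```

This is why short responses (like a one-token true/false) are dominated by latency rather than streaming speed.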
Same issues here, very frustrating.
Has anyone found any “hacks” to improve or work around this … short of moving to another provider?
I am currently seeing if fine-tuning helps.