All models are responding incredibly slowly, and a lot of requests abort. Is it just me? This has been happening for a few hours, yet I'm not seeing any outage posts or anything similar.
“openai stream failed: OpenAI 500:” I have been getting this for a long time now, yet they're proudly saying everything is OK on their status page.
1 of my 10 gpt-3.5-turbo-instruct calls took 10x as long as usual. The other benchmarks just ran, with only one gpt-4.1 call trailing far behind:
| Model | Trials | Avg Latency (s) | Avg Stream Rate (tok/s) | Avg Total Rate (tok/s) |
|---|---|---|---|---|
| gpt-4o-2024-08-06 | 10 | 0.889 | 47.248 | 40.570 |
| gpt-4.1-2025-04-14 | 10 | 1.448 | 58.342 | 44.327 |
| gpt-4.1-mini-2025-04-14 | 10 | 0.826 | 48.091 | 41.659 |
The generation rate is indeed below expectations.
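For anyone who wants to reproduce numbers like the table above, the three columns fall out of just three timestamps per call. A minimal sketch (the function name and argument names are my own, not from any SDK):

```python
def rates(t_start, t_first, t_last, n_tokens):
    """Compute latency and generation rates from three wall-clock
    timestamps (seconds): request sent, first token received, last
    token received, plus the completion's token count."""
    latency = t_first - t_start                  # time to first token (s)
    stream_rate = n_tokens / (t_last - t_first)  # tok/s while streaming
    total_rate = n_tokens / (t_last - t_start)   # tok/s incl. latency
    return latency, stream_rate, total_rate

# 500 tokens, first token after 1 s, stream done at 11 s:
lat, stream, total = rates(0.0, 1.0, 11.0, 500)
```

Average those three values over 10 trials per model and you get the table's columns: a slow trial shows up as a high latency with a normal stream rate, while throttled generation shows up as a low stream rate.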
gpt-5 is hella slow, but making 2200 reasoning tokens at SLA 50 t/s is gonna feel that way..
Jeez, I have no idea what’s going on.
```
[OPENAI API ERROR] !!!
Status: 500
openai stream failed: OpenAI 500:
```
4.1, 4o, gpt 5
Barely responsive, fails half the time
0 status updates on their side
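While the 500s keep coming, one stopgap is wrapping calls in a retry with exponential backoff. A minimal sketch, where `call` is a stand-in for whatever function makes your actual API request (none of these names come from the OpenAI SDK):

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=1.0):
    """Invoke `call` and retry on any exception, sleeping
    base_delay * 2**attempt (plus jitter) between tries.
    Re-raises the last exception if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

This obviously doesn't fix the slowness, but it turns "fails half the time" into "occasionally takes longer", at least when the 500s are transient.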
An hour later: I have several instances of gpt-4o and gpt-4.1 calls taking 64 seconds.
Update: I am now seeing 3 instances of 200+ seconds. Still no announcement!
All API calls are much slower than 2 months ago.
Some calls take more than 60 seconds, while others take even more than 4 minutes!
Try the API parameter for "service_tier": "priority"
Magically, the models' usual speed comes back.
I suspect “shrinkflation”: pay double to not have your generation rate limited.
Is this only available with the enterprise API? I don't see where to put this in my calls.
You can add the "service_tier": "priority" parameter at the same top level as "model".
It only delivers a higher cost and higher level of service for the prominent current models, but it doesn't seem to error out when sent in conjunction with other models.
You can see the actual support and pricing on the priority pricing page, then compare to the normal pricing (they don't show them side by side).
https://platform.openai.com/docs/pricing?latest-pricing=priority
Thank you. I did two things:
- Set max_output_tokens = 32000 [really high number just to test]
- Set service_tier = “priority”
My responses are actually completing now. I'll run some benchmarks to see if setting max_output_tokens is enough for my use case. I appreciate it! Sometimes the OpenAI docs seem all over the place.
When is this thing going to be fixed? I've been an API subscriber for a month and I've yet to see even somewhat OK response times. I don't think I'll use ChatGPT for chat anymore.