All models are responding incredibly slowly, and a lot of requests abort. Is it just me? This has been happening for a few hours, yet I'm not seeing any outage posts or anything similar.
“openai stream failed: OpenAI 500:” I have been getting this for a long time now, yet they're proudly saying everything is OK on their status page.
1 of my 10 gpt-3.5-turbo-instruct calls took 10x as long as usual. The other benchmarks just ran, with only one gpt-4.1 call trailing far behind:
| Model | Trials | Avg Latency (s) | Avg Stream Rate (tok/s) | Avg Total Rate (tok/s) |
|---|---|---|---|---|
| gpt-4o-2024-08-06 | 10 | 0.889 | 47.248 | 40.570 |
| gpt-4.1-2025-04-14 | 10 | 1.448 | 58.342 | 44.327 |
| gpt-4.1-mini-2025-04-14 | 10 | 0.826 | 48.091 | 41.659 |
The generation rate is indeed below expectations.
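For anyone who wants to reproduce numbers like the table above, the three columns fall out of just three timestamps per call. A minimal sketch (the function name and argument names are my own, not from any SDK):

```python
def rates(t_start, t_first, t_last, n_tokens):
    """Compute latency and generation rates from three wall-clock
    timestamps (seconds): request sent, first token received, last
    token received, plus the completion's token count."""
    latency = t_first - t_start                  # time to first token (s)
    stream_rate = n_tokens / (t_last - t_first)  # tok/s while streaming
    total_rate = n_tokens / (t_last - t_start)   # tok/s incl. latency
    return latency, stream_rate, total_rate

# 500 tokens, first token after 1 s, stream done at 11 s:
lat, stream, total = rates(0.0, 1.0, 11.0, 500)
```

Average those three values over 10 trials per model and you get the table's columns: a slow trial shows up as a high latency with a normal stream rate, while throttled generation shows up as a low stream rate.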
gpt-5 is hella slow, but making 2200 reasoning tokens at SLA 50 t/s is gonna feel that way..
Jeez, I have no idea what’s going on.
```
[OPENAI API ERROR] !!!
Status: 500
openai stream failed: OpenAI 500:
```
4.1, 4o, gpt 5
Barely responsive, fails half the time
0 status updates on their side
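While the 500s keep coming, one stopgap is wrapping calls in a retry with exponential backoff. A minimal sketch, where `call` is a stand-in for whatever function makes your actual API request (none of these names come from the OpenAI SDK):

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=1.0):
    """Invoke `call` and retry on any exception, sleeping
    base_delay * 2**attempt (plus jitter) between tries.
    Re-raises the last exception if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

This obviously doesn't fix the slowness, but it turns "fails half the time" into "occasionally takes longer", at least when the 500s are transient.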
An hour later: I have several instances of gpt-4o and gpt-4.1 calls taking 64 seconds.
Update: I am now seeing 3 instances of 200+ seconds. Still no announcement!
All API calls are much slower than 2 months ago.
Some calls take more than 60 seconds, while others take even more than 4 minutes!
Try the API parameter for "service_tier": "priority"
Magically, the models' usual speed comes back.
I suspect “shrinkflation”: pay double to not have your generation rate limited.
Is this only available with the enterprise API? I don't see where to put this in my calls.
You can add the "service_tier": "priority" parameter at the same top level as "model".
It only delivers a higher cost and higher level of service for the prominent current models, but it doesn't seem to error out when sent in conjunction with other models.
You can see the actual support and pricing on the priority pricing page, then compare to the normal pricing (they don't show them side by side).
https://platform.openai.com/docs/pricing?latest-pricing=priority
Thank you. I did two things:
- Set max_output_tokens = 32000 [really high number just to test]
- Set service_tier = “priority”
My responses are actually completing now. I'll run some benchmarks to see if setting max_output_tokens is enough for my use case. I appreciate it! Sometimes the OpenAI docs seem all over the place.
When is this thing going to be fixed? I've been an API subscriber for a month and I've yet to see even somewhat OK response times. I don't think I'll use ChatGPT for chat anymore.