Update: it’s back to normal now. Whatever problem the server was having yesterday seemed to be resolved.
I’m accessing gpt-5 using the Responses API, and the OpenAI status page says everything is working fine.
It has been working all this week with no problem, but today I’m getting extremely slow responses and sometimes an internal server error:
‘An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_2fda1a57a25b492ba527e2214511891a in your message.’
Is anyone having this issue? I’m in the northeast USA.
Is there an easy way to find out if I’m being throttled?
Chat Completions, with “service_tier”: “priority” as an API parameter - thus GPT-5 without ‘Responses’ arbitrarily deciding to include or drop past reasoning items that were resent, which degrades the cache:
input tokens: 13147
output tokens: 6116
uncached: 987
non-reasoning: 1188
cached: 12160
reasoning: 4928
HTTP 200 (48011 ms)
That’s 127 tokens per second, currently a weekend evening or past bedtime for much of the world. Still a long wait looking at nothing, but that’s from the reasoning about the task. Far faster than at the day-0 model release, indicating ‘efficiencies were found’.
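For reference, a minimal sketch of that kind of call - Chat Completions with service_tier set to priority via the official Python SDK. The exact usage field names depend on your SDK version, so treat those as assumptions:

```python
# Sketch: Chat Completions with the priority service tier (assumes the openai
# Python SDK and that your account has access to "priority").
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.monotonic()
response = client.chat.completions.create(
    model="gpt-5",
    service_tier="priority",  # pay-more-for-faster processing tier
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the task status."},
    ],
)
elapsed_ms = (time.monotonic() - start) * 1000

usage = response.usage
print(f"HTTP 200 ({elapsed_ms:.0f} ms)")
print("input tokens:", usage.prompt_tokens)
print("output tokens:", usage.completion_tokens)
print("cached:", usage.prompt_tokens_details.cached_tokens)
print("reasoning:", usage.completion_tokens_details.reasoning_tokens)
```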
Otherwise, I’m only making small calls to gpt-5-mini on Responses, to keep on top of and classify the Responses endpoint’s recent failures. That very topic should degrade your trust in the endpoint until OpenAI says what’s going on or what was failing with their state persistence.
For usage like this, with small server state, gpt-5-mini still leaves you staring at a blank screen for longer than one would want before anything is seen - even when streaming. I don’t have benchmarks to report vs typical on Responses, other than typing minimal chats at it just now.
What would you like help with right now?
--Usage-- in/cached: 24/0; out/reasoning:226/64
About six seconds to streaming:
--Usage-- in/cached: 201/0; out/reasoning:388/192
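If you want to put a number on that “staring at a blank screen” time, one rough way is to time the gap to the first streamed text delta. A sketch, assuming the openai Python SDK’s Responses streaming helper; the event type string may differ by SDK version:

```python
# Sketch: measure time-to-first-token on the Responses endpoint with streaming.
import time
from openai import OpenAI

client = OpenAI()

start = time.monotonic()
first_token_at = None

with client.responses.stream(
    model="gpt-5-mini",
    input="What would you like help with right now?",
) as stream:
    for event in stream:
        # "response.output_text.delta" is the text delta event in current SDKs
        if event.type == "response.output_text.delta" and first_token_at is None:
            first_token_at = time.monotonic()
    final = stream.get_final_response()

print(f"time to first text: {first_token_at - start:.1f} s")
print(f"total time: {time.monotonic() - start:.1f} s")
print("usage:", final.usage)
```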
They seem to have made sure that the new “Your Health” feature on the platform site doesn’t say anything bad except for hard errors that get reported. So if it does dip, that means there’s an issue of significance.
However, 500 errors are often triggered by bad inputs; I would try replaying the exact same ‘chat’, if it doesn’t rely on a ‘conversations’ object that has changed, and classify the success/fail ratio on re-sending that API call body.
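A quick way to do that replay is to re-POST the identical request body and tally the status codes. A sketch with raw HTTP; the body below is a placeholder for the actual JSON of the call that produced the 500:

```python
# Sketch: re-send the exact same Responses request body N times and
# classify the success/failure ratio.
import os
import time
import requests

URL = "https://api.openai.com/v1/responses"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
BODY = {"model": "gpt-5", "input": "exact same input that failed"}  # placeholder

results = {}
for _ in range(10):
    r = requests.post(URL, headers=HEADERS, json=BODY, timeout=300)
    results[r.status_code] = results.get(r.status_code, 0) + 1
    time.sleep(2)  # keep the request rate low while testing

print(results)  # e.g. {200: 8, 500: 2}
```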
Apart from using service_tier='priority' (be aware of the increased prices) or trying Chat Completions (which can’t run some tools), another thing that worked for me sometimes was to temporarily reduce the request rate.
The health screenshots don’t give me enough to conclude much more about volume, other than that this is likely a single-user scenario, with a period of no usage that would be atypical for an established user base.
But your comment does give rise to thoughts about ‘request rate’ - noting the routing by cache:
The API endpoint routes requests to the same server based on whether a hash of the initial input tokens, about the first 256, matches prior requests, per some specific formula.
If you have a deployed application that always uses the same large system prompt, it is going to employ this pinning to a server instance to increase your cache hits.
That also means parallel requests will increase the load you experience on that instance.
You have "prompt_cache_key" as another top-level parameter not to enhance this caching hit frequency, but to break up the routing by pattern. Its value is also included in the hashing.
use a user ID, or even a chat session ID, and you will be signaling a preference for ‘chat’ caching, instead of ‘system message’ caching, and get more load distribution.
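To make the routing idea concrete, here is a toy illustration of prefix-hash routing - not OpenAI’s actual formula, just the shape of the mechanism described above: an identical prefix (plus the same prompt_cache_key) lands in the same bucket, while per-user keys spread the load:

```python
# Toy illustration only - NOT OpenAI's routing formula. It shows why a shared
# system-prompt prefix pins requests to one bucket, and why a per-user
# prompt_cache_key spreads them across buckets.
import hashlib

NUM_INSTANCES = 8      # pretend server pool
PREFIX_CHARS = 1024    # stand-in for "about the first 256 tokens"

def route(prompt: str, prompt_cache_key: str = "") -> int:
    material = prompt[:PREFIX_CHARS] + prompt_cache_key
    digest = hashlib.sha256(material.encode()).hexdigest()
    return int(digest, 16) % NUM_INSTANCES

system_prompt = "You are a long, fixed system prompt. " * 40  # > PREFIX_CHARS

# Same prefix, no cache key: every request hashes to the same bucket.
print({route(system_prompt + f" user msg {i}") for i in range(5)})

# Same prefix, per-user cache key: requests spread across buckets.
print({route(system_prompt, prompt_cache_key=f"user-{i}") for i in range(5)})
```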
One even sees another opportunity here in the face of 500 server errors. If you automatically retry, would you persist against the same cache server instance, not changing your API call? Or would you want to inject some updated text into prompt_cache_key, so you move elsewhere in OpenAI server land?
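One way to act on that: on a server error, retry the same call but with a changed prompt_cache_key, so the retry may be routed to a different server/cache instance. A sketch only - the retry policy, key scheme, and backoff are illustrative choices, not a recommendation:

```python
# Sketch: retry a Responses call after a server error, varying prompt_cache_key
# on each attempt so the retry can land on a different cached server instance.
import time
import uuid
from openai import OpenAI, InternalServerError

client = OpenAI()

def create_with_rerouting_retry(base_key: str, max_attempts: int = 3):
    for attempt in range(max_attempts):
        # First attempt keeps the normal key (best cache hit); retries append
        # a random suffix to move elsewhere in the routing.
        key = base_key if attempt == 0 else f"{base_key}-retry-{uuid.uuid4().hex[:8]}"
        try:
            return client.responses.create(
                model="gpt-5",
                input="same request body as before",  # placeholder
                prompt_cache_key=key,
            )
        except InternalServerError:
            time.sleep(2 ** attempt)  # simple backoff before retrying
    raise RuntimeError("still failing after retries")

response = create_with_rerouting_retry(base_key="user-1234")
print(response.output_text)
```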