Is Anyone Getting Slow Response or Internal Server Error?

Use Chat Completions with “service_tier”: “priority” as an API parameter. That gets you GPT-5 without the Responses endpoint arbitrarily deciding to include or drop past reasoning items that were resent, which degrades the cache:
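A minimal sketch of the request shape, assuming the documented `service_tier` parameter on Chat Completions (the message content here is placeholder, not my actual prompt):

```python
# Request body: standard Chat Completions fields plus service_tier
# for priority processing; the whole conversation is resent each turn.
payload = {
    "model": "gpt-5",
    "service_tier": "priority",
    "messages": [
        {"role": "user", "content": "..."},  # placeholder conversation
    ],
}
print(payload["service_tier"])
```

Because you manage the message list yourself, nothing is silently added or dropped between turns.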

input tokens: 13147    output tokens:  6116
uncached:       987    non-reasoning:  1188
cached:       12160    reasoning:      4928

HTTP 200 (48011 ms)

That’s 127 tokens per second, currently at a weekend evening or past bedtime for much of the world. Still a long wait looking at nothing, but that comes from the reasoning about the task. Far faster than at the model’s 0-day release, indicating ‘efficiencies were found’.
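The throughput figure is just the usage above divided by the wall-clock time of the HTTP 200:

```python
# 6116 output tokens delivered over the 48011 ms request
output_tokens = 6116
elapsed_s = 48011 / 1000
tokens_per_second = output_tokens / elapsed_s
print(round(tokens_per_second))  # 127
```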

Otherwise I’m only making small gpt-5-mini calls to Responses, to keep on top of and classify the Responses endpoint’s recent failures. That very topic should degrade your trust in the endpoint until OpenAI says what’s going on, or what was failing with their state persistence.

For usage like this, with small server state, gpt-5-mini still leaves you staring at a blank screen longer than one would want before anything is seen - even when streaming. I don’t have benchmarks to report vs. typical Responses performance, other than typing minimal chats at it just now.

What would you like help with right now?
--Usage--  in/cached: 24/0;  out/reasoning:226/64

About six seconds to streaming:

--Usage--  in/cached: 201/0;  out/reasoning:388/192
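That “seconds to streaming” figure can be measured generically. A sketch of timing the first streamed chunk - it works on any iterable, so with a real call you would pass the SDK’s streaming response object instead of the toy iterator used here:

```python
import time

def time_to_first_token(stream):
    """Return (latency_seconds, first_chunk) for any iterable of chunks."""
    start = time.monotonic()
    first = next(iter(stream))
    return time.monotonic() - start, first

# Demo with a stand-in iterable; a real streaming response plugs in the same way.
latency, chunk = time_to_first_token(iter(["hello"]))
print(f"first chunk after {latency:.3f}s")
```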

They seem to have made sure that the new “Your Health” feature on the platform site doesn’t say anything bad, except for hard errors reported. So if it dips, that does mean there’s an issue of significance.

However, 500 errors are often provoked by bad inputs; I would try replaying the exact same ‘chat’, if it doesn’t rely on a ‘conversation’ that has changed, and classify the success/fail ratio when re-sending that API call body.
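A sketch of that replay-and-tally idea: re-send the identical request body some number of times, collect the status codes, and summarize. The actual sending is left out (it’s just your normal API call repeated); only the classification step is shown:

```python
import collections

def tally_replays(statuses):
    """Classify success/fail over replays of one identical request body."""
    counts = collections.Counter(statuses)
    ok = counts.get(200, 0)
    return ok, sum(counts.values()), dict(counts)

# e.g. status codes collected from ten replays of the same body:
ok, total, breakdown = tally_replays(
    [200, 200, 500, 200, 500, 200, 200, 200, 500, 200]
)
print(f"{ok}/{total} succeeded; breakdown: {breakdown}")
```

A mostly-failing tally points at the input or stored conversation state; intermittent failures point back at the endpoint.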