How to reach a human at OpenAI support regarding high latency on the API?

I’ve previously posted about issues we face with high latency on the GPT-4o API endpoints. We are once again facing these same issues; it’s been happening for 6 days now (started 2025-09-18). There’s no acknowledgement of the issue on the status page (as usual), and it’s now nearly impossible to reach a human at OpenAI to talk about the problem: half the time I get a response from an LLM, and the other half they ask me to repeat the same info I’ve already given over and over.

We process thousands of requests a day that are all very similar (in size/shape) and send them to GPT-4o. Normally ~2% of our requests take more than 30 seconds. Since 9/18, we’re seeing an average of 12% of requests exceed 30 seconds, with some days over 15% and some hours over 20%.
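(If you want to track the same number, this is roughly how we compute it; a sketch assuming you log one per-request latency value in seconds per line. The file name and threshold are placeholders.)

# Minimal sketch: share of requests slower than a threshold.
# Assumes a log file with one request latency (in seconds) per line.
THRESHOLD_S = 30.0

with open("latencies.log") as f:
    latencies = [float(line) for line in f if line.strip()]

slow = sum(1 for s in latencies if s > THRESHOLD_S)
print(f"{slow}/{len(latencies)} requests "
      f"({100 * slow / len(latencies):.1f}%) took over {THRESHOLD_S:.0f}s")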

Meanwhile we can get no help from OpenAI, no clarity on when this will be resolved, and no acknowledgment that they even know it’s happening.

  • Use service_tier: priority in your API call

  • See all the problems miraculously vanish along with the higher bill.

  • You don’t even have to report the problem, because OpenAI designed it that way.

There are two other dated gpt-4o models for fulfilling requests; you can rotate out to whichever performs better day to day - or just accept the 50% price increase for gpt-4o-2024-05-13.
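A rough sketch of that rotation, assuming the v1 Python SDK (the snapshot list, ordering, and timeout are placeholders to tune for your workload):

import openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Dated gpt-4o snapshots to rotate through, in order of preference.
# (Placeholders: reorder based on whichever is performing better that day.)
SNAPSHOTS = ["gpt-4o-2024-08-06", "gpt-4o-2024-11-20", "gpt-4o-2024-05-13"]

def complete_with_rotation(messages, per_request_timeout=30.0):
    """Try each snapshot in turn, moving on when one is too slow."""
    last_err = None
    for model in SNAPSHOTS:
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
                timeout=per_request_timeout,  # give up on a slow snapshot
            )
        except openai.APITimeoutError as err:
            last_err = err  # too slow today; rotate to the next snapshot
    raise last_err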

Unfortunately, I also cannot get anyone to reply to us with more information on the priority tier. I’ve gotten their LLM support bot to give me two different answers on the subject: first it said it was available to every tier, then it said nope, it’s only available to Enterprise. I can’t get anyone to explain how to move to Enterprise, but we’re a small startup, so I doubt we could afford it in any event.

But if you have info on how we can (perhaps temporarily) pay more for good service, I would be all ears.

That is a good point about the other versioned 4o models; I should try those tomorrow if the latency issue persists. Thank you for the suggestion. We are also working now to expand our use beyond OpenAI. It’s something we had always planned on, but now we’re doing it on a faster timetable because OpenAI has absolutely blown it here.

You can include the parameter:

"model": "gpt-4o",
"service_tier": "priority,
...
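In the Python SDK that’s roughly the following; a sketch assuming the v1 openai client and that your account has access to priority processing. Checking the service_tier field echoed on the response is how I’d verify it took effect.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o",
    service_tier="priority",  # priority processing, billed at the higher rate
    messages=[{"role": "user", "content": "ping"}],
)

# The response carries a service_tier field, so you can check what you got.
print(resp.service_tier)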

You get immediately reproducible results; I’ll write here, even before running a benchmark script, that the generation rate will be double.

gpt-4o - no “service_tier” sent

model gpt-4o: 512 generated, 512 final delivered of 512 max, 9.3s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 7.9s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 11.2s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 13.3s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 14.5s

Model    N   Lat(s)   Strm(t/s)   Tot(t/s)
gpt-4o   5   0.972    54.380      47.964

gpt-4o, with request_args["service_tier"] = "priority"

model gpt-4o: 512 generated, 512 final delivered of 512 max, 5.4s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 5.2s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 4.8s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 4.8s
model gpt-4o: 512 generated, 512 final delivered of 512 max, 8.6s

Model    N   Lat(s)   Strm(t/s)   Tot(t/s)
gpt-4o   5   0.906    113.023     92.776

Legend:

  • Model: Model name
  • N: Trials
  • Lat(s): Avg Latency to first token (seconds)
  • Strm(t/s): Avg Stream Rate (tokens/second after first token)
  • Tot(t/s): Avg Total Rate (tokens/second)
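If you want to reproduce numbers like these, here is a minimal sketch of that kind of measurement (not my exact script; it assumes the v1 Python SDK, the prompt and token cap are arbitrary, and it counts one token per content chunk as an approximation):

import time
from openai import OpenAI

client = OpenAI()

def bench(model, service_tier=None, max_tokens=512):
    """Stream one completion; measure first-token latency and output rate."""
    kwargs = {"service_tier": service_tier} if service_tier else {}
    start = time.monotonic()
    first = None
    chunks = 0  # roughly one token per content chunk
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Write an essay about latency."}],
        max_tokens=max_tokens,
        stream=True,
        **kwargs,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first is None:
                first = time.monotonic() - start
            chunks += 1
    total = time.monotonic() - start
    stream_rate = (chunks - 1) / (total - first) if total > first else 0.0
    return first, chunks, total, stream_rate

for tier in (None, "priority"):
    lat, toks, total, rate = bench("gpt-4o", tier)
    print(f"tier={tier}: {lat:.2f}s to first token, {toks} chunks, "
          f"{rate:.1f} t/s streaming, {toks / total:.1f} t/s total")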

Welp, I thank you yet again. Even though their docs indicate we cannot use this feature, I was able to integrate the priority tier, and it has dramatically lowered latency for us.

Sad that we have to resort to this, but at least it’s an option. No wonder they don’t publish high-latency issues on the dashboard: if they did, you’d know when to activate this feature and, therefore, when you could turn it back off.

You have probably seen the latency optimization guide in the documentation; the priority tier is one of several options it covers.
This is a challenge many developers have to manage, and there are several mitigation techniques you can implement depending on the specific use case.

I hope this helps already.

https://platform.openai.com/docs/guides/latency-optimization
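For example, two of the cheapest mitigations from that guide, streaming and capping output length, look roughly like this (a sketch, not a drop-in fix; the prompt and token cap are placeholders):

from openai import OpenAI

client = OpenAI()

# Stream so the first tokens arrive early, and cap output length so a
# single request can't run long; both reduce perceived and tail latency.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this ticket in 3 bullets."}],
    max_tokens=150,  # shorter completions finish sooner
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)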

Our workload has not changed since we started it up in January. The only thing that has changed is that the 4o API has routinely had periods of high latency, with no explanation from OpenAI or even acknowledgment that it’s happening or that they’re working to fix it.

They’ve now complicated things further by wiring up an LLM to reply to support email, so half the time you don’t even know whether you’re being gaslit. (For example, it just told me not to worry, it’ll forward my concerns to the most appropriate team, which I’m pretty sure it can’t even do.)

They pulled very similar stealth throttling a year and a half ago.

A bunch of organizations suddenly got inexplicably poor performance, as if every token were deliberately pushed through a “not too fast” throttling loop.

Then the tier system was revealed, and it was the low tiers that were getting hammered with worse service for the same price.

At least now it is everybody getting hammered with worse service for the same price, right?


OpenAI should be transparent about exactly what they are doing. It looks like: pay to get yourself out of mandatory “flex”-style processing, which is now applied across other models at the same price. That’s not an option for a discount with poorer service, but an option to pay more.

That is very interesting, thanks!

For me, another source of confusion is the reference to “Enterprise” API customers. On the site, the only references I see to Enterprise/Business/etc. are for the ChatGPT service. On the API side, I don’t see references to that, only usage tiers (we’re at tier 5).

API Enterprise Contracts

If you are using the OpenAI API and have a monthly spend of approximately $10,000 or more, you can also contact our Sales team through the sales contact form.

(help pages)

@t-doggy @_j Not to excuse OpenAI for the high latency issues, but I think they are short on compute, coupled with ever-increasing demand. Once the Nvidia “stacks” are installed in the new data centers, things should get better.

In the meantime, we are rolling with “service_tier”: “priority” which helps a lot.

it’s nearly impossible to reach a human at OpenAI to talk about the problem

This says a lot about the efficiency of the so-called ‘automated assistance’. It’s been a very common problem since the advent of LLM-based chatbots: nearly all agencies rely on their trained chatbots, but the quality is so poor… I myself have dropped at least 5-6 services since they automated their support.

This forum is a great place to highlight issues with the APIs, platform, and developer services. The community is actively monitored, and staff often jump in to share updates when fixes or new releases roll out.

I truly hope it serves as a helpful space for everyone—not just for learning and debugging, but beyond that too.

Hi, on my side I have quite a lot of trouble too. Basically, you have to push the bot to the point where it cannot respond anymore; it will then say, “If you prefer to speak to a support specialist (a real person), just let me know and I will connect you.” But I just got a response from someone, and it was not helpful at all; even the bot was better. This firm treats its customers as a joke, just like its investors.