Hitting Rate Limit with small group of Users?

I have a decent amount of traffic to my application, but nowhere near the rate limit of 3,500 RPM or 90,000 tokens a minute. However, I still regularly get the 429 “Too Many Requests” error. I have the means to scale quickly, but I’m afraid the real rate limits are not actually what’s posted in our organization dashboard. Does anyone else encounter this?
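One way to check whether your own traffic really is below the posted limits is to log it client-side over a trailing 60-second window. The sketch below is only an illustration: `UsageWindow` and its methods are hypothetical names, not part of any OpenAI library, and the token counts are whatever you record from your own requests. If limits happen to be enforced over intervals shorter than a minute, a short burst could still trip them even when these per-minute totals look low.

```python
import time
from collections import deque


class UsageWindow:
    """Hypothetical helper: requests and tokens seen in a trailing window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the class is easy to test
        self.events = deque()       # (timestamp, tokens) pairs, oldest first

    def record(self, tokens):
        """Call once per API request with the tokens that request used."""
        self.events.append((self.clock(), tokens))

    def snapshot(self):
        """Return (requests, tokens) observed within the window."""
        cutoff = self.clock() - self.window_seconds
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return len(self.events), sum(tokens for _, tokens in self.events)
```

Comparing `snapshot()` against 3,500 RPM and 90,000 TPM at the moment a 429 arrives shows whether your own usage was anywhere near either limit.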

@jxl38

I’m in the same boat. I think a lot of the time they’re counting “how many requests are being sent by everyone” instead of “by you”.

I guess we just have to be patient :frowning:

I think we can request an increase. I’m currently having the same issue and am cautious about inviting more people @jxl38

Addendum: except with GPT-4

"During the limited beta rollout of GPT-4, the model will have more aggressive rate limits to keep up with demand. Default rate limits for gpt-4/gpt-4-0314 are 40k TPM and 200 RPM. Default rate limits for gpt-4-32k/gpt-4-32k-0314 are 80k TPM and 400 RPM.

We are unable to accommodate requests for rate limit increases due to capacity constraints. In its current state, the model is intended for experimentation and prototyping, not high volume production use cases."


Facing the same issue. Just making 4-5 sequential requests (with 1k prompt tokens max, 200-300 response tokens max), and we’re being hit by the rate limit. [model GPT-4]

Same issue here - has only really picked up the last few days. I have been getting 429s on every third request or so but only consuming 5 calls per request and max 750 tokens across all 5 calls (have confirmed this through our usage dashboard). We have been PAYG customers for several months now, so not a function of being new / free user. Please help! OpenAI :slight_smile:

For the last 12 hours or so, I’ve been getting 429 errors nearly 100% of the time for GPT-4 chat/completions requests. The error message is: “That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 93bd.... in your message.)”

My rate of requests for GPT-4 (and in general) is far lower than my account rate limits. Based on the message, it looks like a general problem with the GPT-4 model rather than any specific thing I’m doing on my side. The status page is green, so I’m wondering if OpenAI is aware of this issue. I tried the help center, but that just let me report the issue to a bot with a 1 week response time. Does anyone know if there is another way to report API outages?

Also, to be clear, I’m using gpt-3.5-turbo. GPT-4 is completely unusable for production applications. Got a lot of one star reviews because of the aggressive rate limit on gpt-4. Removed it from the application altogether.


Yeah, I am facing that sometimes too…

You might see these errors one day as well. :slight_smile:

Error 1.
That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx in your message.)

Error 2.
The server is currently overloaded with other requests. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists.

Just for clarification:

The error “Too Many Requests” is indeed related to the rate limit. I’d suggest implementing exponential backoff. As mentioned earlier, it’s not possible to have your GPT-4 rate limit increased :hugs:

The error “That model is currently overloaded with other requests” is exactly that: OpenAI’s servers are currently overloaded and unable to process your request. It has nothing to do with your personal rate limit :laughing:

Think of this as a temporary issue while OpenAI is still trying to scale their infrastructure. The problem should go away by itself.
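For anyone who hasn’t implemented exponential backoff before, here is a minimal sketch, not an official recipe. `RetryableError` is a hypothetical stand-in for whatever exception your HTTP client raises on a 429 (both the “Too Many Requests” and the “overloaded” variants), and the retry count and delays are assumptions to tune for your own app.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for the exception your client raises on a 429."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RetryableError with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_retries:
                raise  # out of retries, surface the error to the caller
            # Double the delay each attempt: 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter so many clients don't all retry at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))
```

Wrap the API call in a zero-argument function or lambda and pass it as `fn`. The jitter matters when several workers hit the limit together; without it they tend to retry in lockstep and collide again.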

Just an update to the OpenAI team: I don’t seem to be rate limited anymore(?) and even ChatGPT-4 is faster and not rate limiting me. Bravo to the team for working hard on this stuff.

If you guys have that much traffic, I’d definitely move to something more scalable. You should check out Microsoft Azure’s version of OpenAI. You’ll have to apply, but the application is free. They will ask you about your company, what you are trying to do, etc. I got approved in a little less than a week, if I remember correctly.

Azure OpenAI Service – Advanced Language Models | Microsoft Azure

I don’t know if OpenAI intends to be an AI service provider or just a research company. But I would definitely recommend getting onto something with higher availability if you have that much usage.

How do the OpenAI direct vs azure models compare and differ in terms of latency and functionality?

@keith_knox2 has some experience migrating his project from OpenAI to Azure, so maybe he can chime in on the differences he has seen in terms of performance. In terms of functionality, the only things that will be unfamiliar are the endpoint and the way Azure lets you access “resources”. The API commands themselves and the responses are the same, so if you have a good setup, you can just make an Azure interface for your code.

Azure OpenAI Service REST API Reference
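To illustrate the “Azure interface” idea, here is a rough sketch of how the two endpoints might be selected in one place. Treat the details as assumptions to verify against the REST API reference: `build_chat_request` is a hypothetical helper, the resource and deployment names are placeholders, and the `api_version` default is just one version string that may not be the one you need.

```python
def build_chat_request(provider, api_key, *, resource=None, deployment=None,
                       api_version="2023-05-15"):
    """Return (url, headers) for a chat completions call on either service."""
    if provider == "openai":
        # OpenAI direct: one shared endpoint, bearer-token auth.
        url = "https://api.openai.com/v1/chat/completions"
        headers = {"Authorization": f"Bearer {api_key}"}
    elif provider == "azure":
        # Azure: the URL names your own resource and model deployment,
        # and the key goes in an "api-key" header instead.
        if not resource or not deployment:
            raise ValueError("Azure needs a resource name and a deployment name")
        url = (f"https://{resource}.openai.azure.com/openai/deployments/"
               f"{deployment}/chat/completions?api-version={api_version}")
        headers = {"api-key": api_key}
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    headers["Content-Type"] = "application/json"
    return url, headers
```

The rest of the code can then post the same JSON body to whichever URL comes back. One difference to expect is that on Azure the model is chosen by the deployment in the URL.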
