Rate Limits for preview models?

I just realized I am running into a rate limit of 60 requests per minute using the gpt-4-1106-preview model.

I expected the RPM to be 5,000, as I am in Tier 3.

Does anybody have more information about this?

You can find your specific rate limits on your account page:

https://platform.openai.com/account/limits

Might be that you’re sending particularly long requests, or that you’re running into a rate limit that is shared with other models.

Thank you for your answer @N2U

Yeah, that’s exactly why I’m confused. It says:
600,000 TPM
5,000 RPM

and I get this error:
Error: 429 You’ve exceeded the 60 request/min rate limit, please slow down and try again

I sent about 200k tokens within 137 requests that I batched into groups of 40. I now think this shouldn’t even be necessary, but I had figured earlier that it works somewhat reliably… (probably because the 61st request usually lands in the next minute :smile:)

I have actually switched to gpt-4-turbo-preview now.

I appreciate every further help! :raised_hands:

Hmmm, yeah, that is weird. Have you recently upgraded your account?

Sometimes it helps to pass your org id in the request, but you could also try to create a new API key. It might be that the rate limits associated with your old key are cached somewhere, and that’s what’s causing the error? :thinking:
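Something like this, assuming Node 18+ with its built-in `fetch`. The `OpenAI-Organization` header is how the org is passed at the HTTP level; `buildHeaders` is just a hypothetical helper for illustration:

```javascript
// Build request headers that pin the organization explicitly.
// OPENAI_ORG_ID is assumed to be set in your environment (.env file).
function buildHeaders(apiKey, orgId) {
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
    // Makes sure the request is attributed to this org rather than a
    // possibly cached default associated with the old key.
    "OpenAI-Organization": orgId,
  };
}

// Usage sketch (Node 18+):
// const res = await fetch("https://api.openai.com/v1/chat/completions", {
//   method: "POST",
//   headers: buildHeaders(process.env.OPENAI_API_KEY, process.env.OPENAI_ORG_ID),
//   body: JSON.stringify({ model: "gpt-4-1106-preview", messages: [/* ... */] }),
// });
```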

Hmm, the upgrade was already a few months back. I changed the API key, added OPENAI_ORG_ID to my env file, and added this line to the request: “organization: process.env.OPENAI_ORG_ID”.

But I still get the error that I’m exceeding the 60 requests per minute.

Then there might be something wrong with your account on OpenAI’s end. I suggest you reach out to them at help.openai.com as soon as possible. We can’t really help with account and billing-related issues here on the forum. :confused:

Ok thank you so much for all your help @N2U


I got this response:

Hi there,

Thank you for reaching out to us regarding the rate limit error you’re encountering with the GPT-4 Turbo model. I understand that seeing an error message like this can be confusing, especially when you’re mindful of the rate limits as stated in your account’s limits section.

The error message you’re seeing, “Error handling GPT request: RateLimitError: 429 You’ve exceeded the 60 request/min rate limit,” indicates that your requests have exceeded the rate limit of 60 requests per minute. This is different from the 800,000 tokens per minute (TPM) and 10,000 requests per minute (RPM) limits you mentioned, which are likely your token and request limits, respectively.

It’s important to note that rate limits can be quantized, meaning they are enforced over shorter periods of time. For example, a limit of 60,000 requests/minute may be enforced as 1,000 requests/second. Sending short bursts of requests can lead to rate limit errors, even when you are technically below the rate limit per minute. This might be why you’re encountering the 60 request/min rate limit error (What are the best practices for managing my rate limits in the API?).

To address this issue, I recommend implementing a few best practices:

  • Pace your requests: Avoid making unnecessary or redundant calls and pace your requests to stay within the rate limit.
  • Exponential backoff: Implement a backoff mechanism or retry logic in your code that respects the rate limit. This involves performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated.

If you’ve already implemented these strategies and are still facing issues, it might be helpful to review your application’s request patterns to ensure they are evenly distributed and not causing short bursts that exceed the shorter period rate limits. For more detailed guidance on handling rate limits, including examples of exponential backoff, please refer to our help article on How can I solve 429: ‘Too Many Requests’ errors? If you continue to experience difficulties or have any further questions, please don’t hesitate to reach out.

Best,
OpenAI Team
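The exponential backoff they describe could be sketched roughly like this (a hypothetical helper; the retry parameters are illustrative defaults, not values OpenAI prescribes):

```javascript
// Retry an async call with exponential backoff when a 429 error is hit.
// baseDelayMs and maxRetries are illustrative, tune them for your workload.
async function withBackoff(fn, { baseDelayMs = 500, maxRetries = 5 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only retry rate-limit errors, and give up after maxRetries attempts.
      if (err.status !== 429 || attempt >= maxRetries) throw err;
      const delay = baseDelayMs * 2 ** attempt; // 500, 1000, 2000, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

You would then wrap each API call, e.g. `withBackoff(() => sendChatRequest(batch))`, where `sendChatRequest` is whatever function issues your request.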

So do I understand it correctly? Since I’m sending about 200k tokens within around 100 requests in one second, I hit the limit because my 10k RPM allowance is quantized, i.e. enforced over intervals shorter than a minute?


First, let’s run that message through an AI proofreader and see what suggestions it has.

Not a single edit. This is AI bot text. Except for being framed in illiterate “Hi there,” and then starting again with a capital letter, there is nothing human. Do that here and you get flagged. The contradiction in the text is exactly what you’d expect from bot hallucination.

Are you submitting to the Assistants endpoint, or to Chat Completions? Assistants seems to have a low preset limit that is separate from the AI model, but very similar to what you report.

If you are Tier 3 or up, you should have a rate limit increase request box at the bottom of the “limits” page. You can pick GPT-4 so that the form lets you submit, and then explain that the limit you’re hitting on the Chat Completions API is far below your tier’s expected limit, and that it affects the turbo models, not the selected gpt-4.


:smile: Thank you for checking that. I thought so too and was a little disappointed at even having to wait for an AI-generated response… Also thank you for the suggestion on how to reach a real person! So you would also agree that it’s an error and I should be able to send my 100 requests with 200k tokens simultaneously?


13,000 tokens per second (of the 800k per minute) should be your max target. Tokens, rather than requests, seem to be the much more stringent API constraint.

The ratio of 10,000 RPM to 800,000 TPM means that once a request averages more than 80 tokens of input + output, your concern shifts from the request limit to the token limit.

You can set up token counting and queuing in your parallel job, enforced per second, to avoid sending bursty input to the API. Also confirm you’re using the Chat Completions endpoint, which doesn’t impose that extra limit and doesn’t carry massive token baggage with it.

And if you get denied while running at the max, just set an adaptive rate learned from that. If you’re being denied even though your continuous rate of tokens and requests is far below the tier limit, then it would seem that something’s gone wrong.
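The per-second token counting and queuing suggested above could be sketched as a simple token bucket. This is an illustrative helper, not part of any OpenAI SDK; the 13,333/sec figure is just 800k TPM divided by 60:

```javascript
// Token bucket: allow roughly ratePerSec tokens per second, smoothing out bursts.
// 800,000 TPM / 60 ≈ 13,333 tokens per second as the sustained target.
class TokenBucket {
  constructor(ratePerSec, capacity = ratePerSec) {
    this.ratePerSec = ratePerSec;
    this.capacity = capacity;     // max burst size
    this.available = capacity;    // tokens currently available
    this.lastRefill = Date.now();
  }

  // Top up the bucket based on elapsed time, capped at capacity.
  refill(now = Date.now()) {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.available = Math.min(
      this.capacity,
      this.available + elapsedSec * this.ratePerSec
    );
    this.lastRefill = now;
  }

  // Returns true if `cost` tokens could be consumed right now; the caller
  // should queue the request and retry later when this returns false.
  tryConsume(cost, now = Date.now()) {
    this.refill(now);
    if (this.available >= cost) {
      this.available -= cost;
      return true;
    }
    return false;
  }
}
```

Before each request, estimate its token cost (prompt plus expected completion) and call `tryConsume`; hold the request in a queue until it succeeds.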