I just realized I am running into rate limits of 60 requests per minute using the gpt-4-1106-preview model.
I expected the RPM to be 5,000, as I am in Tier 3.
Does anybody have more information about this?
You can find your specific rate limits on your account page:
https://platform.openai.com/account/limits
Might be that you're sending particularly long requests or are running into a rate limit that is shared with other models.
Thank you for your answer @N2U
Yeah, that's exactly why I'm confused. It says:
600,000 TPM
5,000 RPM
and I get this error:
Error: 429 You've exceeded the 60 request/min rate limit, please slow down and try again
I sent about 200k tokens across 137 requests, which I batched into groups of 40. I now think this batching shouldn't even be necessary, but it seemed to work fairly stably before… (probably because the 61st request usually lands in the next minute)
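For context, the batching logic looks roughly like this (a simplified sketch; `prompts` and `callGpt` are placeholders for my actual data and request function):

```typescript
// Placeholder for the real request function that calls the OpenAI API.
declare function callGpt(prompt: string): Promise<string>;

// Simplified sketch of the batching described above: fire `batchSize`
// requests in parallel, wait for all of them, then start the next batch.
async function sendInBatches(prompts: string[], batchSize = 40): Promise<string[]> {
  const results: string[] = [];
  for (let i = 0; i < prompts.length; i += batchSize) {
    const batch = prompts.slice(i, i + batchSize);
    const responses = await Promise.all(batch.map((p) => callGpt(p)));
    results.push(...responses);
  }
  return results;
}
```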
I actually switched to gpt-4-turbo-preview now
I appreciate any further help!
Hmmm, yeah, that is weird. Have you recently upgraded your account?
Sometimes it helps to pass your org id in the request, but you could also try to create a new API key. It might be that the rate limits associated with your old key are cached somewhere, and that's what's causing the error?
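Something along these lines, assuming you're on the v4 Node SDK (just a sketch):

```typescript
import OpenAI from "openai";

// Pass the organization explicitly so requests are attributed to the
// right org's rate limits (openai v4 Node SDK).
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  organization: process.env.OPENAI_ORG_ID,
});
```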
Hmm, the upgrade was already a few months back. I changed the API key, added OPENAI_ORG_ID to my env file, and added this line to the request: "organization: process.env.OPENAI_ORG_ID".
But I still get the error that I'm exceeding the 60 requests per minute.
Then there might be something wrong with your account on OpenAI's end. I suggest you reach out to them at help.openai.com as soon as possible. We can't really help with account and billing-related issues here on the forum.
OK, thank you so much for all your help @N2U
I got this response:
Hi there,

Thank you for reaching out to us regarding the rate limit error you're encountering with the GPT-4 Turbo model. I understand that seeing an error message like this can be confusing, especially when you're mindful of the rate limits as stated in your account's limits section.

The error message you're seeing, "Error handling GPT request: RateLimitError: 429 You've exceeded the 60 request/min rate limit," indicates that your requests have exceeded the rate limit of 60 requests per minute. This is different from the 800,000 tokens per minute (TPM) and 10,000 requests per minute (RPM) limits you mentioned, which are likely your token and request limits, respectively.

It's important to note that rate limits can be quantized, meaning they are enforced over shorter periods of time. For example, a limit of 60,000 requests/minute may be enforced as 1,000 requests/second. Sending short bursts of requests can lead to rate limit errors, even when you are technically below the rate limit per minute. This might be why you're encountering the 60 request/min rate limit error (What are the best practices for managing my rate limits in the API?).

To address this issue, I recommend implementing a few best practices:
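(The list of practices is cut off above, but the one OpenAI's docs usually lead with is retrying with exponential backoff. A minimal sketch of that pattern, with `sendRequest` as a placeholder for the actual API call:)

```typescript
// Retry on 429 with exponential backoff plus a little jitter.
// `sendRequest` is a placeholder for the actual API call.
async function withBackoff<T>(sendRequest: () => Promise<T>, maxRetries = 5): Promise<T> {
  let delayMs = 1000;
  for (let attempt = 0; ; attempt++) {
    try {
      return await sendRequest();
    } catch (err: any) {
      if (err?.status !== 429 || attempt >= maxRetries) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs + Math.random() * 250));
      delayMs *= 2; // double the wait before each retry
    }
  }
}
```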
So do I understand it correctly? Since I'm sending around 200k tokens across roughly 100 requests within one second, I'm hitting the limit because my 10k requests per minute are effectively enforced in shorter windows spread across the minute?
First, let's run that message through an AI proofreader and see what suggestions it has.
Not a single edit. This is AI bot text. Except for being framed in illiterate "Hi there," and then starting again with a capital letter, there is nothing human. Do that here and you get flagged. The contradiction in the text is exactly what you'd expect from bot hallucination.
Are you submitting to the Assistants endpoint, or to Chat Completions? Assistants seems to have a low preset limit that is separate from the AI model, but very similar to what you report.
If you are Tier 3 or up, you should have a rate limit increase request box at the bottom of "limits". You can pick GPT-4 so that you can submit, then explain the error and that the limit you're hitting on the Chat Completions API is far below your tier's expected limit, and that it affects the turbo models, not the selected gpt-4.
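For reference, a plain Chat Completions call with the Node SDK looks like this (a sketch; the model name is just an example):

```typescript
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// A plain Chat Completions request, with no Assistants layer involved.
const completion = await client.chat.completions.create({
  model: "gpt-4-turbo-preview",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(completion.choices[0].message.content);
```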
Thank you for checking that. I thought so too, and was a little disappointed to even have to wait for an AI-generated response… Also, thank you for the suggestion on how to reach a real person! So you would also agree that it's an error, and that I should be able to send my 100 requests with 200k tokens simultaneously?
Roughly 13,000 tokens per second (800k TPM divided by 60) should be your max target. Tokens, rather than requests, tend to be the more stringent API constraint.
The ratio of 10,000 RPM to 800,000 TPM means that once a request exceeds about 80 tokens of input plus output (800,000 / 10,000), the token limit, not the request limit, is what you need to watch.
You can set up token counting and queuing in a parallel job that enforces a per-second limit, to avoid sending bursty input to the API. Also confirm you're using the Chat Completions endpoint, which doesn't impose that separate limit and doesn't carry massive token baggage with it.
And if you get denied while running at that max, just set an adaptive rate learned from the denials. If tokens or requests even at a continuous rate are far below the tier limit, then it would seem that something's gone wrong.
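A sketch of that kind of token-paced queue (the 4-chars-per-token estimate is only a rough heuristic, and the 13,000 tokens/s budget assumes the 800k TPM tier discussed above):

```typescript
// Token-paced sender: spend from a per-second token budget instead of
// bursting. Assumes ~800k TPM; numbers here are illustrative only.
const TOKENS_PER_SECOND = 13_000;
let budget = TOKENS_PER_SECOND;
setInterval(() => { budget = TOKENS_PER_SECOND; }, 1_000);

// Rough heuristic: ~4 characters per token.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

async function paced<T>(prompt: string, send: (p: string) => Promise<T>): Promise<T> {
  // Include an allowance for expected output tokens, not just the prompt.
  const cost = estimateTokens(prompt) + 500;
  while (budget < cost) {
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
  budget -= cost;
  return send(prompt);
}
```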