I’m dealing with a bit of a rate limit puzzle on my Tier 5 OpenAI account and could use some insights.
Here’s the gist: my account is supposed to have 10k RPM and 2M TPM on gpt-3.5-turbo-1106. However, when I try to run more than 10 assistant requests simultaneously – whether I use Promise.all or not – I keep getting this error:
Error: 429 You’ve exceeded the 60 request/min rate limit, please slow down and try again.
Error invoking assistant: RateLimitError: 429 You’ve exceeded the 60 request/min rate limit, please slow down and try again.
This is confusing because my supposed RPM limit is way higher. Interestingly, when I switch to function calls instead of assistants, I don’t encounter any such limits, and I can run numerous requests just fine.
Did I overlook something about a different rate limit for assistants in the docs? Would love to get some clarity or advice on this.
Curious how you are timing your checks for run status = completed – do you have any delays on those? I’m not sure how those polls count against the limit. But it seems that, at least in my case, run completions take several seconds, so I check them on a 2-second interval.
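For reference, my status check is roughly this shape (a minimal sketch, not my exact code – `pollUntilDone` and `check` are hypothetical names; in practice `check` would wrap something like the SDK's run-retrieval call):

```javascript
// Poll a status function on a fixed interval until it reports a
// terminal state or we give up. Each call to `check()` is itself an
// API request, so it counts against your RPM budget.
async function pollUntilDone(check, intervalMs = 2000, maxAttempts = 30) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await check(); // e.g. retrieve the run and return run.status
    if (status === "completed" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // wait before next poll
  }
  throw new Error("run did not finish within the polling budget");
}
```

The point of the fixed interval is that every poll is a request too, so 10 concurrent runs polled every 2 seconds already adds 300 requests per minute on top of the creates.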
So I think your RPM is 10k, which means that in 60,000 ms you can make 10,000 requests – one every 6 ms. If you ever manage to fire two requests UNDER 6 ms apart, you can trigger that threshold EVEN if you never actually get anywhere near 10,000 requests in a minute.
So you are talking about 50 – let’s say 60 – concurrent requests. They all have the same profile, and without functions they will all be checked and retried in the same way. How fast do you launch those 50 without functions? If they launch very close together, you could also get retries that land too close together.
I have a list of documents where I batch 25 documents at a time. Each document is processed by multiple function calls (5) using Promise.all – so 5 concurrently – and then one more call after that checks the formatting. Using the standard function calls I never see a rate limit error, only formatting and Unicode errors, but that is expected.
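Roughly, the pipeline looks like this (a sketch with hypothetical names – `chunk`, `processDocument`, `extractors`, and `checkFormatting` are just illustrations of the shape, not my real code):

```javascript
// Split the document list into batches of a given size.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run all extraction calls for one document concurrently, then a
// single formatting check over the combined results.
async function processDocument(doc, extractors, checkFormatting) {
  const results = await Promise.all(extractors.map((fn) => fn(doc)));
  await checkFormatting(results); // the "one after that" formatting pass
  return results;
}
```

With batches of 25 documents and 5 calls each, one batch is 125 concurrent requests plus 25 formatting checks, which is why a hidden 60 RPM ceiling gets hit immediately.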
When I switch to assistants I can only do a fraction of this, since I only get 60 requests per minute. The overall process is the same; the function calls have just been replaced with assistant invokes. So here I have backoff and have also reduced the number of documents to 5.
If I run the assistants sequentially, I still run into the rate limit error. Running them concurrently, I hit it much more often.
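For reference, the backoff I mean is roughly this shape (a minimal sketch, assuming the thrown error carries a `status` field as the SDK's errors do; `withBackoff` is a hypothetical helper, not our exact code):

```javascript
// Retry a request when it fails with HTTP 429, doubling the wait
// between attempts. Any other error, or exhausting the retries,
// is rethrown to the caller.
async function withBackoff(request, maxRetries = 5, baseDelayMs = 500) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      if (err.status !== 429 || attempt >= maxRetries) throw err;
      const delayMs = baseDelayMs * 2 ** attempt; // 500, 1000, 2000, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Even with this in place, a 60 RPM ceiling means the retries just queue up behind each other rather than going away.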
You might think your document would be embedded in a retrieval database, but no: there’s a browse function where the AI must traverse documents across multiple calls – so you pay the traversal costs instead of OpenAI paying for the embedding.
However, like you say, the rate limit error is reporting 60 RPM, not 10,000 RPM. It may be an unseen limit lingering on gpt-3.5-turbo-1106 because of its “preview” status and its “will never be used as the main production AI” status. You can go to the very bottom of the account’s “Limits” page and make a request specific to the model and the rate, noting that the rate schedule shown there is not what you see in practice.
That was what I thought, but I can use gpt-3.5-turbo-1106 in all other aspects of the API and get 10k RPM. It’s only when using it in combination with the new Assistants that I see this rate limit error.
Hi there,
Sorry you're running into issues regarding rate limits!
Rate limits, which are restrictions we place on the number of API calls you can make, exist so we can make sure everyone has fair access to the API. If you're bumping up against these limits, here are some strategies you might try:
Reduce max_tokens: Reduce max_tokens to match the expected size of your completions. Since max_tokens factors into your rate limit calculation, this adjustment might resolve the issue if your current tokens used exceed your token limit.
Optimize Your Requests: Batch requests and employ strategies like exponential backoff along with other error mitigation tactics.
Wait for 48 Hours: If you're a new pay-as-you-go user, be aware that we place daily rate limits during the first 48 hours. More details on your specific rate limits can be found here.
Check Your Quota: Ensure you're not exceeding your monthly spending quota. If you need adjustments, you can do so through the quota increase form.
Ensure you're on our Pay-As-You-Go plan: Update your billing with credit card details for the API Platform (not ChatGPT) here. Explore (free trial) users are heavily restricted, regardless of whether you already have credits or grants in your account.
Still encountering issues? You can request a rate limit increase by filling out our Rate Limit Increase form. Please note that this applies only to certain models, as gpt-4 and gpt-3.5-turbo-16k are currently capacity constrained and we can't offer increases today.
If these steps don't resolve your issue, please provide more details, and I'll be glad to assist you further.
- OpenAI Team
Pretty sure this is just ChatGPT replying
But this is nice: I pay about 5k dollars each month and this is the response xD
Did the original poster or anyone else ever find a solution to this? We have this challenge too. It seems the 60 RPM limit is specific to the Assistants API.
The solution was to just abandon assistants for now.
We have decided to switch from using assistants to using functions for the time being. Functions seem to offer the same capabilities, and since assistants are currently quite buggy, we believe this is the best approach.