Hey team, is there any formal documentation for the rate limits on the Assistants API? I understand it isn't the same as the standard GPT limits, but I'm not able to build my own rate limiter without this data.
I’m currently resorting to profiling manually, which kinda sucks.
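To give a sense of what I mean, here's roughly the shape of the client-side throttle I'm hand-tuning right now. It's just a sketch: the 100 requests/min ceiling is my own guess, which is exactly the number I'd like to see documented.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: allow at most `max_requests` calls per `window` seconds.
    The ceiling here is a guess, since the actual Assistants API limit isn't documented.
    """

    def __init__(self, max_requests: int = 100, window: float = 60.0):
        self.max_requests = max_requests
        self.window = window
        self._timestamps = deque()  # monotonic timestamps of recent calls
        self._lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a request slot is free, then record the call."""
        while True:
            with self._lock:
                now = time.monotonic()
                # Drop timestamps that have aged out of the window.
                while self._timestamps and now - self._timestamps[0] >= self.window:
                    self._timestamps.popleft()
                if len(self._timestamps) < self.max_requests:
                    self._timestamps.append(now)
                    return
                # Otherwise wait until the oldest call leaves the window.
                wait = self.window - (now - self._timestamps[0])
            time.sleep(wait)


limiter = SlidingWindowLimiter(max_requests=100, window=60.0)


def guarded_call(fn, *args, **kwargs):
    """Wrap any Assistants API call so it respects the guessed ceiling."""
    limiter.acquire()
    return fn(*args, **kwargs)
```

Without a documented limit I'm literally tuning `max_requests` by trial and error.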
Unfortunately, there isn't any clear-cut documentation of the rate limits for the Assistants API specifically.
The API is rate-limited, but the details are vague, unlike the well-defined per-model limits published for the Chat Completions API. The Assistants endpoints also don't return rate-limit headers, which makes it hard to track usage programmatically.
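To illustrate the difference, here's a minimal sketch assuming the v1 `openai` Python SDK and its `with_raw_response` accessor: the Chat Completions endpoint exposes `x-ratelimit-*` response headers you can read, whereas I haven't seen comparable headers on Assistants calls.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Chat Completions: the raw response exposes x-ratelimit-* headers,
# so remaining quota can be tracked programmatically.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",  # model name is just an example
    messages=[{"role": "user", "content": "ping"}],
)
for name in (
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-reset-requests",
):
    print(name, raw.headers.get(name))

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)

# The Assistants endpoints don't return comparable rate-limit headers
# (at least none I've seen), so there's nothing equivalent to key a limiter off.
```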
I feel your pain
To make matters worse, the behavior seems inconsistent.
For example, sometimes the error I get back says the rate limit is 200 requests/min; other times it says 1000 requests/min.
On top of that, I've implemented rate limiters that throttle my code to well under 100 requests/min, and I still get the 1000 requests/min error. I'm utterly lost.
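For now the only workaround I've found is to treat any 429 as authoritative and back off with jitter, regardless of which limit the error text claims. A rough sketch, assuming the v1 `openai` Python SDK (where the Assistants endpoints live under `client.beta`) and its `RateLimitError`; the thread id is a placeholder:

```python
import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI()


def call_with_backoff(fn, *args, max_retries: int = 6, base_delay: float = 1.0, **kwargs):
    """Retry an API call on 429s with exponential backoff plus jitter.

    Used because the limit quoted in the error text doesn't match observed behavior,
    so the only reliable signal is the 429 itself.
    """
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except RateLimitError:
            # Back off 1s, 2s, 4s, ... plus up to 1s of jitter.
            delay = base_delay * (2 ** attempt) + random.random()
            time.sleep(delay)
    # Final attempt; let any error propagate to the caller.
    return fn(*args, **kwargs)


# Example: list runs on a thread without hammering the endpoint after a 429.
# "thread_abc123" is a placeholder id, not a real thread.
runs = call_with_backoff(client.beta.threads.runs.list, thread_id="thread_abc123")
```

It works, but it's a blunt instrument compared to knowing the actual limits.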
Honestly, OpenAI team, we'd just love some documentation here, any kind at all.
100% correct. There has to be some explanation of how usage is counted, even if it isn't meant for public release. If there isn't one, then I suspect the token usage the models report is hallucinated too.