Rate Limits with Assistants API

oiookura · December 3, 2023, 5:05pm

OpenAI responses don’t include any x-rate… header for me (why?).
There’s no API for checking current limits.
I am ready to calculate it on my end, but it’s unclear how to count tokens and requests for the Assistants API. Does every request count? Is counting tokens just for messages added with addMessage sufficient?

Any clarification or advice is appreciated!

Foxalabs · December 3, 2023, 5:29pm

Hi and welcome to the Developer Forum!

This is all being looked at by the team building the assistants functionality. Making a thread in API Feedback is the best way to show these are features you would like.

oiookura · December 3, 2023, 8:31pm

@Foxalabs thanks! No action required from my side, right?

Also, it is not clear to me… are the usage limits for each model calculated separately, or does reaching the limit for one affect the other? For instance, if I use 50 requests per minute on GPT-3 and have a limit of 50 for GPT-3 and 100 for GPT-4, does this leave me with 50 or 100 remaining requests for GPT-4?

Foxalabs · December 3, 2023, 8:40pm

Typically, different models have separate rate limits, but there are also rate limit tiers (1 through 5), with higher limits for those who spend more and have been reliably making payments for longer periods. Details here:

https://platform.openai.com/docs/guides/rate-limits?context=tier-free
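For anyone who wants to track this on their own side, here is a minimal sketch of a per-model request counter over a sliding 60-second window. It assumes each model has an independent request limit, which is only described as typical above. The class name `PerModelRequestWindow`, the model keys, and the 50/100 limits are hypothetical and taken from the example in the question, not from any OpenAI API.

```python
import time
from collections import defaultdict, deque


class PerModelRequestWindow:
    """Client-side estimate of remaining requests per model.

    Assumption: every model has its own independent per-minute limit.
    The limits passed in are whatever you believe your account has;
    nothing here is read from the API.
    """

    def __init__(self, limits_per_minute, window_seconds=60.0):
        self.limits = dict(limits_per_minute)
        self.window = window_seconds
        self.sent = defaultdict(deque)  # model -> timestamps of recent requests

    def _prune(self, model, now):
        # Drop timestamps that have aged out of the window.
        q = self.sent[model]
        while q and now - q[0] >= self.window:
            q.popleft()

    def record(self, model, now=None):
        # Call this once for every request sent to `model`.
        now = time.monotonic() if now is None else now
        self._prune(model, now)
        self.sent[model].append(now)

    def remaining(self, model, now=None):
        now = time.monotonic() if now is None else now
        self._prune(model, now)
        return self.limits[model] - len(self.sent[model])


# Hypothetical limits from the question: 50 RPM and 100 RPM.
tracker = PerModelRequestWindow({"gpt-3": 50, "gpt-4": 100})
for i in range(50):
    tracker.record("gpt-3", now=i * 0.1)
```

Under the separate-limits assumption, using up the 50 requests on one model leaves the other model's 100 untouched in this counter. It cannot see calls an assistant makes internally.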

_j · December 3, 2023, 10:38pm

Assistants can make multiple model calls autonomously and iteratively before giving back a single response. Each of these internal model calls counts against a rate limit in the API, and one run may use an indeterminate number of tokens. The number of calls can be determined indirectly by reading the number of run “steps” through the API.
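As a rough illustration of reading the steps, the sketch below tallies run steps by their `type` field. It works on plain dictionaries, so the sample data is made up. In practice the list would come from the run steps listing endpoint, and treating one step as roughly one model call is an assumption, not something the API guarantees. The helper name `summarize_run_steps` is hypothetical.

```python
from collections import Counter


def summarize_run_steps(steps):
    """Count run steps by their `type` field.

    Assumption: each step corresponds to roughly one internal model
    call, so the total is only a proxy for requests consumed.
    """
    counts = Counter(step["type"] for step in steps)
    return {"total_steps": sum(counts.values()), "by_type": dict(counts)}


# Made-up sample shaped like a listing of run steps.
sample_steps = [
    {"id": "step_1", "type": "tool_calls"},
    {"id": "step_2", "type": "tool_calls"},
    {"id": "step_3", "type": "message_creation"},
]
summary = summarize_run_steps(sample_steps)
```

For this sample the summary reports three steps in total, two of type `tool_calls` and one of type `message_creation`.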

Rate limits are pooled by model type, although there are now many types, and the preview models have a different rate limit from gpt-4. If you reach a “no more” point, this can affect multiple distinct model names. Someone really curious could rapidly call two models one after the other and see whether the remaining requests per minute reported for the second include the call to the first.

One can imagine that if assistants returned useful rate headers, these could be used as another way to work out the tokens and calls an assistant consumes, something OpenAI has shown it doesn’t want revealed at its immediately objectionable face value.

Provided no async parallel calls are muddying the statistics, a clever person could make small calls to the same model the assistant uses, before and after the assistant run, to deduce the run’s rate impact and token consumption.
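A minimal sketch of that before/after comparison follows. It assumes the small probe calls go to a regular endpoint that returns the `x-ratelimit-remaining-requests` and `x-ratelimit-remaining-tokens` headers, and that you have captured those headers as dictionaries. The helper name `rate_limit_delta` and the sample values are hypothetical. The result is only an estimate: the second probe consumes quota itself, and the limits refill over time between the two probes.

```python
def rate_limit_delta(before, after):
    """Return how far each remaining-quota header dropped between probes.

    `before` and `after` are header dictionaries captured from two
    small calls made around the assistant run. Headers missing from
    either probe are skipped.
    """
    keys = ("x-ratelimit-remaining-requests", "x-ratelimit-remaining-tokens")
    delta = {}
    for key in keys:
        if key in before and key in after:
            delta[key] = int(before[key]) - int(after[key])
    return delta


# Made-up header values for illustration only.
before = {
    "x-ratelimit-remaining-requests": "100",
    "x-ratelimit-remaining-tokens": "40000",
}
after = {
    "x-ratelimit-remaining-requests": "96",
    "x-ratelimit-remaining-tokens": "37500",
}
delta = rate_limit_delta(before, after)
```

With these sample values the drop is 4 requests and 2500 tokens, which would include the second probe itself.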
