Token Rate limit estimation clarification

mike.mcd · December 14, 2023, 3:05pm

I was looking for some clarification on how the rate limits for tokens are calculated.

For usage, when I make a request with an input prompt and a completion is generated, the token counts of both the input prompt and the completion are calculated.

For rate limiting, the documentation states that:

Your rate limit is calculated as the maximum of max_tokens and the estimated number of tokens based on the character count of your request

If I understand this correctly, this means that before a request is processed, the API checks whether max(input_prompt_token_length, max_tokens) would take you over the rate limit and, if so, rejects the request.

Is this understanding correct?
Once the request has completed, is the total token count (input prompt plus output completion) used in the rate limit calculation for the next request, or does it still use the estimated number of tokens calculated beforehand?

For example, if I request a completion where my input prompt is 100 tokens and my max_tokens parameter is 200, then for rate limiting purposes it will check whether 200 tokens would go over the rate limit and reject the request if so.
If the request is valid and is serviced, will the next request assume I have used 200 tokens or 300 tokens from my rate limit?
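
To make the two possibilities concrete, here is a minimal Python sketch of my reading of the documentation. The function names and the tokens_remaining budget are made up for illustration, and I am assuming the character-based estimate has already been converted to tokens. None of this is confirmed API behavior, it is just my interpretation written out.

```python
def estimated_request_tokens(prompt_token_estimate: int, max_tokens: int) -> int:
    """Pre-request estimate as I read the docs: the larger of the two values."""
    return max(prompt_token_estimate, max_tokens)


def would_be_rejected(estimate: int, tokens_remaining: int) -> bool:
    """Hypothetical check: reject when the estimate exceeds the remaining budget."""
    return estimate > tokens_remaining


prompt_tokens = 100  # input prompt length from my example
max_tokens = 200     # max_tokens parameter from my example

# Option A: the rate limiter keeps counting the pre-request estimate.
estimate = estimated_request_tokens(prompt_tokens, max_tokens)

# Option B: the rate limiter counts what was actually used; the upper
# bound here is the prompt plus a completion that hits max_tokens.
worst_case_usage = prompt_tokens + max_tokens

print(f"estimate={estimate}, worst_case_usage={worst_case_usage}")
```

So the question is whether the next request sees the value of estimate or something closer to worst_case_usage deducted from my limit.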
