The rate-limit documentation adds the following note on TPM for various models:
For our older models, the **TPM** (tokens per minute) unit is different depending on the model version:
|TYPE|1 TPM EQUALS|
| --- | --- |
|davinci|1 token per minute|
|curie|25 tokens per minute|
|babbage|100 tokens per minute|
|ada|200 tokens per minute|
In practical terms, this means you can send approximately 200x more tokens per minute to an `ada` model versus a `davinci` model.
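If the documented multipliers held, converting a normalized TPM quota into model-specific tokens per minute would be a simple lookup. A minimal sketch based on the table above (the function and dict names are illustrative, not part of any official SDK):

```python
# Multipliers from the table above: how many model tokens one
# normalized TPM unit is worth for each legacy model family.
TPM_MULTIPLIER = {
    "davinci": 1,
    "curie": 25,
    "babbage": 100,
    "ada": 200,
}

def effective_tpm(normalized_tpm: int, model: str) -> int:
    """Convert a normalized TPM quota into model-specific tokens/minute."""
    return normalized_tpm * TPM_MULTIPLIER[model]

print(effective_tpm(250_000, "babbage"))  # → 25000000
```

This is also where the "200x" figure in the quoted docs comes from: `TPM_MULTIPLIER["ada"] / TPM_MULTIPLIER["davinci"] == 200`.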
Two questions, as this is confusing:
(1) My account page shows rate limits; specifically, it shows 3,000 (RPM) and 250,000 (TPM) for the Babbage models. Is the documentation out of date?
(2) Would my Babbage TPM be 250,000 x 100 = 25,000,000 per minute? If so, I assume the only way to saturate this would be by batching prompts into a single request (given that 2,048 x 3,000 ≈ 6.1 million doesn't even get close to 25 million)?
The documentation isn’t always exactly correct. I would assume whatever is shown on your account rate-limit page is accurate for your API account.
Yes, your `babbage` limit would effectively be 25M tokens per minute. If you are limited to 3,000 RPM at 2,048 tokens per request, then without batching you are correct that you would not come close to hitting the TPM limit. That said, remember that TPM counts tokens in and out, so depending on how verbose your responses are, you could effectively double your usage rate.
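The ceiling described above is easy to sanity-check with back-of-envelope arithmetic (the RPM and context-size values are taken from this thread, not from any API call):

```python
# Saturation point without batching, per the numbers in this thread.
RPM = 3_000        # requests per minute shown on the account page
MAX_CTX = 2_048    # max tokens per legacy completion request

tpm_ceiling = RPM * MAX_CTX
print(tpm_ceiling)      # → 6144000 tokens/minute, unbatched, input only

# Even counting output tokens at the same volume as input,
# usage stays far below a 25M TPM quota:
print(2 * tpm_ceiling)  # → 12288000
```

So if the 100x multiplier were real, batching multiple prompts into one request would indeed be the only way to approach the quota.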
Regarding #2, I agree that this is what one would expect based on the documentation. However, I just ran a bunch of tests and can confirm that I am running into a 250,000 TPM limit when submitting batched requests. So the multipliers do not appear to be in effect, unfortunately.
Can anyone from OpenAI confirm that this is intentional?
Brief update: after some more experimentation, the limit does seem to be slightly higher than 250K TPM, but nowhere near a 100x multiplier. It's more like ~2x; anything beyond that I can't sustain without hitting rate limits.
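Given that the real ceiling appears closer to ~2x the documented figure than 100x, one pragmatic workaround is to throttle client-side against the empirically observed limit rather than the documented one. A minimal sliding-window sketch (the 500K limit is an assumption based on the ~2x observation above; the class name and structure are illustrative):

```python
import time

class TokenBudget:
    """Client-side TPM throttle: refuse a send that would push
    token usage over the observed per-minute limit."""

    def __init__(self, tpm_limit: int, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.clock = clock          # injectable for testing
        self.events = []            # (timestamp, tokens) pairs

    def try_spend(self, tokens: int) -> bool:
        """Return True and record the spend if it fits in the
        trailing 60-second window, else False (caller should wait)."""
        now = self.clock()
        # Drop events older than one minute.
        self.events = [(t, n) for t, n in self.events if now - t < 60]
        used = sum(n for _, n in self.events)
        if used + tokens > self.tpm_limit:
            return False
        self.events.append((now, tokens))
        return True

# Assumed ~2x the documented 250K, per the experiments above.
budget = TokenBudget(tpm_limit=500_000)
```

Before each batched request, call `budget.try_spend(estimated_tokens)` (counting both prompt and expected completion tokens, since TPM counts both) and back off when it returns `False`.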