Org-Level Rate Limiting Implementation Info?

lewis2 · May 25, 2023, 11:54pm

Has anyone gotten any more detail on how rate limiting works for the Chat Completions API?

Using gpt-3.5-turbo, I’m regularly hitting 429 errors even though I never exceed 30% of my 90k tokens-per-minute (TPM) limit, measured by request start times. The error messages include my current usage, which appears close to the limit, so it’s definitely org-level rate limiting, not global.

I’ve tried a bunch of strategies around scheduling, backoffs and retries but the closest I’ve managed to get to the 90k TPM limit is about 30k TPM before I start seeing cascading failures.
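
For concreteness, here is a minimal sketch of the general kind of backoff-and-retry wrapper meant here. It is not the exact code in use: `RateLimitError` and `call_with_backoff` are hypothetical names, and `fn` stands for whatever function issues the request and raises on a 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the client raises on HTTP 429."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error to the caller
            # "Full jitter": wait a random time up to the capped exponential delay,
            # so parallel workers do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is only there so the wait can be swapped out when testing. Jitter spreads retries out, but on its own it does nothing about the underlying question of when the quota is actually charged.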

From what I can work out, one of three things might be happening:

1. Maybe the usage is calculated at the end of each request. That would make logical sense, since the response length isn’t known upfront, but it seems pretty janky when a lot of API calls take 20-40 seconds. When I fire off a bunch of calls over a minute, sometimes they all return at nearly the same time, causing a spike in quota usage, and I have to pause for about a minute before continuing. This is unpredictable, and it would mean the optimal throughput strategy is one where I severely limit the risk of hitting the rate limit, capping throughput at around 30k TPM. Anecdotally, this is my gut feeling: errors seem to correlate with a flood of responses arriving close together, even though my request frequency is very consistent, as is my request/response size.
2. Could I be using more TPM than I think I am? I am calculating from usage.total_tokens in the responses. gpt-3.5-turbo doesn’t have an explicit “TPM unit” conversion factor in the docs, so I’ve been assuming it’s just 1.
3. Am I wrong about where/when the usage quota is incremented? If the rate limiting is calculated based on max_tokens in the request, that would explain a good chunk of the difference between the 90k limit and the 30k I’m achieving, but not all of it.
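
To make the difference between these possibilities concrete, here is a rough sketch that tallies the same set of calls under three hypothetical accounting models. None of these is confirmed to be how the API's limiter works. The `Call` record, the model names, and the use of fixed one-minute buckets (rather than a sliding window) are all assumptions for illustration, and the numbers are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    start: float            # seconds since the run began, when the request was sent
    end: float              # seconds since the run began, when the response arrived
    prompt_tokens: int
    completion_tokens: int
    max_tokens: int         # completion cap sent with the request


def peak_tpm(calls, model):
    """Return the token count of the busiest one-minute bucket under one model."""
    buckets = defaultdict(int)
    for c in calls:
        used = c.prompt_tokens + c.completion_tokens
        if model == "at_start":        # actual usage, charged when the request starts
            buckets[int(c.start // 60)] += used
        elif model == "at_end":        # actual usage, charged when the response returns
            buckets[int(c.end // 60)] += used
        elif model == "reserve_max":   # prompt + max_tokens, charged at start
            buckets[int(c.start // 60)] += c.prompt_tokens + c.max_tokens
        else:
            raise ValueError(f"unknown model: {model}")
    return max(buckets.values())


# Twelve evenly spaced requests, one every 10 seconds. The first six are slow
# and all come back just after the one-minute mark; the next six take 5 seconds.
slow = [Call(10 * i, 62 + i, 500, 500, 2000) for i in range(6)]
fast = [Call(60 + 10 * i, 65 + 10 * i, 500, 500, 2000) for i in range(6)]
calls = slow + fast

for model in ("at_start", "at_end", "reserve_max"):
    print(model, peak_tpm(calls, model))
```

With this made-up data, the request rate is perfectly steady by start time, yet charging at response time doubles the peak minute, and reserving max_tokens up front inflates it further. Comparing logged start times, end times and max_tokens against the times the 429s occur is one way to see which model fits best.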
