Let's suppose I am on Tier 1, which looks like this:
Tier 1: $5 paid, $100 usage limit, 500 RPM, 10K RPD, 40K TPM (GPT-3.5), 10K TPM (GPT-4)
If GPT-4 hits its limit, what happens to GPT-3.5?
When GPT-4 reaches its rate limit, does GPT-3.5 automatically get rate-limited too? Or do the rate limits for the two models operate independently, so that even if one is rate-limited, the other continues to work normally?
That’s actually not the case: the limits aren’t fully independent. If you go well over your rate limit (for example, with a set of parallel GPT-4 calls where you gave no max_tokens for the limiter to base its estimate on), you’ll have locked yourself out of the chat models until the “percentage over” resets, which can carry over multiple minutes.
You can blast off a few dollars’ worth of long-completion GPT-4 calls all at once and verify it for yourself. Or you can just believe me without me needing to dig the corroborating evidence out of the forum.
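If you'd rather not find out the hard way, here is a minimal sketch of a client-side guard that keeps each model under its own tokens-per-minute figure, so the overage never happens in the first place. It does not reproduce how the server-side limiter works. Everything in it is an assumption for illustration: the `TokenBudget` class and `estimate_request_tokens` helper are hypothetical names, the 4-characters-per-token figure is only a rough rule of thumb, and the limits are the Tier 1 numbers quoted in the question. The one idea taken from above is that a request's cost is estimated up front as prompt tokens plus max_tokens, which is why setting max_tokens matters.

```python
import time
from collections import deque


class TokenBudget:
    """Hypothetical client-side sliding-window guard for one model's TPM limit."""

    def __init__(self, tpm_limit, window_seconds=60.0, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self.events = deque()  # (timestamp, tokens) pairs still inside the window

    def _prune(self, now):
        # Drop reservations that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            self.events.popleft()

    def used(self):
        """Tokens reserved within the current window."""
        self._prune(self.clock())
        return sum(tokens for _, tokens in self.events)

    def try_reserve(self, tokens):
        """Reserve tokens if they fit in the window; return True on success."""
        if self.used() + tokens > self.tpm_limit:
            return False
        self.events.append((self.clock(), tokens))
        return True


def estimate_request_tokens(prompt, max_tokens):
    # Assumption: roughly 4 characters per token for English text.
    prompt_tokens = max(1, len(prompt) // 4)
    return prompt_tokens + max_tokens


# Tier 1 TPM figures from the question; one independent budget per model.
budgets = {
    "gpt-4": TokenBudget(10_000),
    "gpt-3.5-turbo": TokenBudget(40_000),
}
```

Before each call you would compute `cost = estimate_request_tokens(prompt, max_tokens=500)` and only send the request if `budgets["gpt-4"].try_reserve(cost)` returns True; otherwise wait and retry. Capping max_tokens on every request keeps that estimate bounded, which is the point of the warning above.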
My software uses the LLM API methods, which don’t necessarily require a maximum token count. So, in this scenario, the token usage largely depends on the input we provide.