Rate Limit even though Limit not breached

Rate limits can be quantized, meaning they are enforced over shorter periods of time (e.g. 60,000 requests/minute may be enforced as 1,000 requests/second). Sending short bursts of requests or contexts (prompts+max_tokens) that are too long can lead to rate limit errors, even when you are technically below the rate limit per minute.