Client-side rate limiting

This is about the rate limiter OpenAI has deployed within Cloudflare. It doesn't run a tokenizer (e.g. cl100k_base) when calculating rate limits; tokenization happens after the rate limits are enforced. If you use actual token counts to estimate your rate-limit usage, your numbers will be noticeably off.

PS: We tried using a tokenizer (tiktoken) and our limit estimates were quite off. With character_count/4, the estimate is pretty much exact.
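For anyone who wants to try this, here is a rough Python sketch of how the character_count/4 estimate could drive a client-side limiter. The names `estimate_tokens` and `TokenBudget` are made up for illustration, rounding up is my assumption (I don't know how the server rounds), and the 60-second sliding window is also an assumption rather than a confirmed description of how the limits are actually enforced.

```python
import math
import time
from collections import deque


def estimate_tokens(text: str) -> int:
    # Heuristic from the post: character count divided by 4.
    # Rounding up is an assumption; the server-side rounding is unknown.
    return math.ceil(len(text) / 4)


class TokenBudget:
    """Hypothetical sliding-window budget over estimated tokens."""

    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock          # injectable so the window can be tested
        self.events = deque()       # (timestamp, estimated_tokens)
        self.used = 0

    def _expire(self, now: float) -> None:
        # Drop requests that are at least 60 seconds old (assumed window).
        while self.events and now - self.events[0][0] >= 60:
            _, cost = self.events.popleft()
            self.used -= cost

    def try_acquire(self, text: str) -> bool:
        # Returns True and records the cost if the request fits the budget.
        now = self.clock()
        self._expire(now)
        cost = estimate_tokens(text)
        if self.used + cost > self.limit:
            return False
        self.events.append((now, cost))
        self.used += cost
        return True
```

Call `try_acquire(prompt)` before each request and back off when it returns `False`. The sketch only counts the text you send; if completion tokens also count against your limit, you would need to add an allowance for them.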
