Discrepancy Between tiktoken Token Count and OpenAI Embeddings API Token Count Exceeding TPM Limit in Tier 2 Account

Full Issue Details:

When sending embedding requests to the text-embedding-3-large endpoint, I am encountering a RateLimitError for exceeding the 1,000,000 TPM limit for a Tier 2 account. Despite calculating token usage with tiktoken (cl100k_base tokenizer) and keeping the total tokens exactly at 1,000,000, the API returns an error indicating a higher token count (1,095,015).

Error Message via Python API:

openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for text-embedding-3-large in organization REDACTED on tokens per min (TPM): Limit 1000000, Requested 1095015. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

Error Message via Curl:

{
    "error": {
        "message": "Request too large for text-embedding-3-large in organization REDACTED on tokens per min (TPM): Limit 1000000, Requested 1095015. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.",
        "type": "tokens",
        "param": null,
        "code": "rate_limit_exceeded"
    }
}

Steps to Reproduce:

  • Repo: I can’t add a link, but it’s on GitHub at jcourson8/openai_token_count_error_replication.git (requires Tier 2 with a 1,000,000 TPM limit).
  • Code:
    • Embedding request: client.embeddings.create(model="text-embedding-3-large", input=documents)
    • Token count function:
    import tiktoken

    def openai_token_count(string: str) -> int:
        # Count tokens with the cl100k_base encoding used by the OpenAI embedding models
        encoding = tiktoken.get_encoding("cl100k_base")
        num_tokens = len(encoding.encode(string, disallowed_special=()))
        return num_tokens
    
    • Token count for documents: sum(openai_token_count(doc) for doc in documents)
  • Document Info:
    • documents is a List[str] with length 15,758.
    • Maximum individual document token count: 340 (well below the 8k limit).

Issue: Despite a calculated total of exactly 1,000,000 tokens, the API reports a request of 1,095,015 tokens, roughly 9.5% higher.

Environment:

  • OS: macOS
  • Python Version: 3.10.11
  • OpenAI Library Version: 1.34.0

Welcome to the community!

IIRC it was figured out a while back that this rate limiting is happening somewhere on a small Cloudflare worker that guesstimates the tokens as opposed to actually properly tokenizing your input. It’s quite possible that this hasn’t changed.

The guides (https://platform.openai.com/docs/guides/rate-limits/error-mitigation) (if you follow the links in the error message) actually advocate trusting the rate limit error message as opposed to pre-computing usage, and using exponential backoff to get all your requests through.
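A minimal sketch of that backoff approach (the helper name, retry count, and delays here are my own choices, not from the docs):

```python
import random
import time


def with_exponential_backoff(fn, retry_on, max_retries=6, base_delay=1.0):
    """Call fn(); on the given exception, sleep and retry with a doubled delay."""
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries; let the caller see the error
            time.sleep(delay * (1 + random.random()))  # jitter spreads out retries
            delay *= 2
```

Wrapping the call from the original post would then look something like `with_exponential_backoff(lambda: client.embeddings.create(model="text-embedding-3-large", input=batch), openai.RateLimitError)`.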

Soo… is it a bug? Technically a feature? Hard to say :confused: But it’s definitely a phenomenon that might need to be worked around.

While I record all tokens as computed, I also multiply everything by 1.2 (or 0.8, depending) to keep everything running smoothly.
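For example, that kind of padding factor can be baked into the batching logic. This is just a sketch of the idea; the function and the 1.2 factor are illustrative, and `count_tokens` would be something like the tiktoken-based counter from the original post:

```python
def batch_documents(documents, count_tokens, tpm_limit=1_000_000, factor=1.2):
    """Greedily pack documents into batches whose padded token total stays under the limit."""
    batches, current, current_tokens = [], [], 0.0
    for doc in documents:
        padded = count_tokens(doc) * factor  # over-estimate to absorb the gateway's slack
        if current and current_tokens + padded > tpm_limit:
            batches.append(current)
            current, current_tokens = [], 0.0
        current.append(doc)
        current_tokens += padded
    if current:
        batches.append(current)
    return batches
```

Each batch can then be sent as one embeddings request, with backoff still in place as a safety net for the cases the padding doesn’t cover.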


Thanks for the quick response!

Definitely a bug… this undermines confidence in the system, and it prevents us from implementing a rate limiter that can actually use our limits efficiently.

What’s interesting is that the rate token “encoder” is not oblivious to the content, just poor.

6400 characters of English:

message len (char): 6400
gpt-3.5-turbo(00 @512,low): prompt usage: 1207, rate usage: 1607

6400 characters of Chinese:

message len (char): 6400
gpt-3.5-turbo(00 @512,low): prompt usage: 6407, rate usage: 4802

…Except in the case of images, where all “high” get the same rate impact of a 4-tile image regardless of dimensions.