Discrepancy Between tiktoken Token Count and OpenAI Embeddings API Token Count Exceeding TPM Limit in Tier 2 Account

Full Issue Details:

When sending embedding requests to the text-embedding-3-large endpoint, I am encountering a RateLimitError for exceeding the 1,000,000 TPM limit for a Tier 2 account. Despite calculating token usage with tiktoken (cl100k_base tokenizer) and keeping the total tokens exactly at 1,000,000, the API returns an error indicating a higher token count (1,095,015).

Error Message via Python API:

openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for text-embedding-3-large in organization REDACTED on tokens per min (TPM): Limit 1000000, Requested 1095015. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

Error Message via Curl:

{
    "error": {
        "message": "Request too large for text-embedding-3-large in organization REDACTED on tokens per min (TPM): Limit 1000000, Requested 1095015. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.",
        "type": "tokens",
        "param": null,
        "code": "rate_limit_exceeded"
    }
}

Steps to Reproduce:

  • Repo: I can’t add a link, but it’s on GitHub at jcourson8/openai_token_count_error_replication.git (requires Tier 2 with a 1,000,000 TPM limit).
  • Code:
    • Embedding request: client.embeddings.create(model="text-embedding-3-large", input=documents)
    • Token count function:
    import tiktoken

    def openai_token_count(string: str) -> int:
        # Count tokens with the cl100k_base encoding used by the OpenAI embedding models
        encoding = tiktoken.get_encoding("cl100k_base")
        num_tokens = len(encoding.encode(string, disallowed_special=()))
        return num_tokens
    
    • Token count for documents: sum(openai_token_count(doc) for doc in documents)
  • Document Info:
    • documents is a List[str] with length 15,758.
    • Maximum individual document token count: 340 (well below the 8k limit).

Issue: Despite a calculated total of exactly 1,000,000 tokens, the API reports a request of 1,095,015 tokens, roughly 9.5% higher.

Environment:

  • OS: macOS
  • Python Version: 3.10.11
  • OpenAI Library Version: 1.34.0

Welcome to the community!

IIRC it was figured out a while back that this rate limiting is happening somewhere on a small Cloudflare worker that guesstimates the tokens as opposed to actually properly tokenizing your input. It’s quite possible that this hasn’t changed.

The guides (https://platform.openai.com/docs/guides/rate-limits/error-mitigation) (if you follow the links in the error message) actually advocate trusting the rate limit error message as opposed to pre-computing usage, and using exponential backoff to get all your requests through.
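A minimal sketch of that backoff approach (the helper name, retry count, and delays here are my own choices, not from the docs):

```python
import random
import time


def with_exponential_backoff(fn, retry_on, max_retries=6, base_delay=1.0):
    """Call fn(); on the given exception, sleep and retry with a doubled delay."""
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries; let the caller see the error
            time.sleep(delay * (1 + random.random()))  # jitter spreads out retries
            delay *= 2
```

Wrapping the call from the original post would then look something like `with_exponential_backoff(lambda: client.embeddings.create(model="text-embedding-3-large", input=batch), openai.RateLimitError)`.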

Soo… is it a bug? Technically a feature? Hard to say :confused: But it’s definitely a phenomenon that might need to be worked around.

While I record all tokens as computed, I also multiply everything by 1.2 (or 0.8, depending) to keep everything running smoothly.
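For example, that kind of padding factor can be baked into the batching logic. This is just a sketch of the idea; the function and the 1.2 factor are illustrative, and `count_tokens` would be something like the tiktoken-based counter from the original post:

```python
def batch_documents(documents, count_tokens, tpm_limit=1_000_000, factor=1.2):
    """Greedily pack documents into batches whose padded token total stays under the limit."""
    batches, current, current_tokens = [], [], 0.0
    for doc in documents:
        padded = count_tokens(doc) * factor  # over-estimate to absorb the gateway's slack
        if current and current_tokens + padded > tpm_limit:
            batches.append(current)
            current, current_tokens = [], 0.0
        current.append(doc)
        current_tokens += padded
    if current:
        batches.append(current)
    return batches
```

Each batch can then be sent as one embeddings request, with backoff still in place as a safety net for the cases the padding doesn’t cover.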


Thanks for the quick response!

Definitely a bug… this undermines confidence in the system, and it prevents us from implementing a rate limiter that can actually use our limits efficiently.

What’s interesting is that the rate token “encoder” is not oblivious to the content, just poor.

6400 characters of English:

message len (char): 6400
gpt-3.5-turbo(00 @512,low): prompt usage: 1207, rate usage: 1607

6400 characters of Chinese:

message len (char): 6400
gpt-3.5-turbo(00 @512,low): prompt usage: 6407, rate usage: 4802

…Except in the case of images, where all “high” get the same rate impact of a 4-tile image regardless of dimensions.