I am calling text-embedding-3-large with a list of strings L.
response = client.embeddings.create(input=L, model="text-embedding-3-large")
I get the following error.
tokens per min (TPM): Limit 1000000, Requested 1249461.
However, when I check the token count with tiktoken:
import tiktoken
encoding = tiktoken.encoding_for_model("text-embedding-3-large")
sum(len(encoding.encode(s)) for s in L)
I get 1759946.
What accounts for this discrepancy?
_j
November 24, 2024, 3:51am
The quality of the results is not affected by the rate-limit mechanism.
The rate limiter only makes a simple estimate. It acts like a firewall: its sole function is to block API requests from reaching the AI models when the limits set for an organization are exceeded.
The count of language tokens is an estimate that is close to, but not, the actual amount.
Images have a fixed rate-limit consumption regardless of any settings: 771 tokens per image.
Because it must block excessive requests, neither deep inspectio…
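For intuition, here is a minimal sketch of how a cheap character-based estimate can diverge from an exact tokenizer count. The chars-per-token heuristic below is an assumption for illustration only; the actual formula the rate limiter uses is not documented:

import tiktoken

L = ["hello world", "días soleados", "def f(x): return x * 2"]

# exact BPE count, the same way tiktoken measures it
encoding = tiktoken.encoding_for_model("text-embedding-3-large")
exact = sum(len(encoding.encode(s)) for s in L)

# hypothetical cheap estimate: ~4 characters per token (an assumption,
# not the limiter's documented behavior)
estimate = sum(len(s) for s in L) // 4

print(exact, estimate)  # the two counts will generally disagree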
Actual token counting takes computation. The rate-limiting endpoint must be able to deny thousands of requests per second, just as it can accept thousands per second. It is said to be a Cloudflare worker.
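Practically, since you cannot rely on the limiter's estimate matching your own count, a defensive client batches its input and backs off on 429 responses. A minimal sketch, assuming the openai Python client v1+; TOKEN_BUDGET and embed_all/embed_batch are illustrative names and the 900,000 budget is just a safety margin, not a documented threshold:

import time
import tiktoken
from openai import OpenAI, RateLimitError

client = OpenAI()
encoding = tiktoken.encoding_for_model("text-embedding-3-large")
TOKEN_BUDGET = 900_000  # illustrative margin under the 1,000,000 TPM limit

def embed_batch(batch, retries=5):
    # retry with exponential backoff when the rate limiter denies the request
    for attempt in range(retries):
        try:
            resp = client.embeddings.create(input=batch, model="text-embedding-3-large")
            return [d.embedding for d in resp.data]
        except RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("still rate limited after retries")

def embed_all(strings):
    embeddings, batch, batch_tokens = [], [], 0
    for s in strings:
        n = len(encoding.encode(s))
        # flush before the running total nears the budget, or at the
        # API's cap of 2048 inputs per embeddings request
        if batch and (batch_tokens + n > TOKEN_BUDGET or len(batch) >= 2048):
            embeddings += embed_batch(batch)
            batch, batch_tokens = [], 0
        batch.append(s)
        batch_tokens += n
    if batch:
        embeddings += embed_batch(batch)
    return embeddings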