Discrepancy in token counts on prompting

I am sending a list of strings L to text-embedding-3-large:

response = client.embeddings.create(input=L, model="text-embedding-3-large")

I get the following error.

tokens per min (TPM): Limit 1000000, Requested 1249461.

However, when I check the token count with tiktoken:

import tiktoken

encoding = tiktoken.encoding_for_model("text-embedding-3-large")
print(sum(len(encoding.encode(s)) for s in L))

I get 1759946.

The limiter says I requested 1,249,461 tokens, but tiktoken counts 1,759,946. What accounts for this discrepancy?


Exact token counting takes real computation. The rate limiter has to be able to deny thousands of requests per second just as fast as it can accept them, so it works from a cheap estimate of the token count rather than running the actual tokenizer. It is said to be a Cloudflare Worker.
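
To make the point concrete, here is a minimal sketch of why an estimate and an exact count diverge. OpenAI has not published how the limiter estimates tokens; the `estimate_tokens` function and the ~4-characters-per-token ratio below are a common rule of thumb, not their actual method:

```python
# Hypothetical sketch: a rate limiter can't afford to run the full BPE
# tokenizer on every incoming request, so it might use an O(n) length-based
# estimate instead. The chars_per_token ratio is an assumption for
# illustration, not OpenAI's published logic.

def estimate_tokens(texts, chars_per_token=4):
    """Cheap estimate: total characters divided by an assumed ratio."""
    return sum(len(t) for t in texts) // chars_per_token

batch = ["hello world", "a longer string with more characters in it"]
print(estimate_tokens(batch))
# An exact count via tiktoken's BPE on the same batch will generally differ,
# sometimes by a large margin, which matches the gap you observed.
```

Since the estimate depends only on string lengths, any batch whose real tokens-per-character ratio deviates from the assumed one (e.g. non-English text, code, or lots of short words) will show exactly this kind of mismatch.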