I am calling text-embedding-3-large with a list of strings L.
response = client.embeddings.create(input=L, model="text-embedding-3-large")
I get the following error.
tokens per min (TPM): Limit 1000000, Requested 1249461.
However, when I check the token count with tiktoken:
import tiktoken
encoding = tiktoken.encoding_for_model("text-embedding-3-large")
sum(len(encoding.encode(s)) for s in L)
I get 1759946.
What accounts for this discrepancy?
_j
November 24, 2024, 3:51am
The quality of the results is not affected by the rate-limit mechanism.
The rate limiter only makes a simple estimate. It acts like a firewall: its sole function is to block API requests from reaching the AI models when the limits set for an organization are exceeded.
The count of language tokens is an estimate that is close to, but not, the actual amount.
Images have a fixed rate-limit consumption regardless of any settings: 771 tokens per image.
Because it must block excessive requests, neither deep inspectio…
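For intuition, here is a minimal sketch of how a cheap character-based estimate can diverge from an exact tokenizer count. The chars-per-token heuristic below is an assumption for illustration only; the actual formula the rate limiter uses is not documented:

import tiktoken

L = ["hello world", "días soleados", "def f(x): return x * 2"]

# exact BPE count, the same way tiktoken measures it
encoding = tiktoken.encoding_for_model("text-embedding-3-large")
exact = sum(len(encoding.encode(s)) for s in L)

# hypothetical cheap estimate: ~4 characters per token (an assumption,
# not the limiter's documented behavior)
estimate = sum(len(s) for s in L) // 4

print(exact, estimate)  # the two counts will generally disagree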
Actual token counting takes computation. The rate-limiting endpoint must be able to deny thousands of requests per second, just as it can accept thousands per second. It is said to be a Cloudflare worker.
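Practically, since you cannot rely on the limiter's estimate matching your own count, a defensive client batches its input and backs off on 429 responses. A minimal sketch, assuming the openai Python client v1+; TOKEN_BUDGET and embed_all/embed_batch are illustrative names and the 900,000 budget is just a safety margin, not a documented threshold:

import time
import tiktoken
from openai import OpenAI, RateLimitError

client = OpenAI()
encoding = tiktoken.encoding_for_model("text-embedding-3-large")
TOKEN_BUDGET = 900_000  # illustrative margin under the 1,000,000 TPM limit

def embed_batch(batch, retries=5):
    # retry with exponential backoff when the rate limiter denies the request
    for attempt in range(retries):
        try:
            resp = client.embeddings.create(input=batch, model="text-embedding-3-large")
            return [d.embedding for d in resp.data]
        except RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("still rate limited after retries")

def embed_all(strings):
    embeddings, batch, batch_tokens = [], [], 0
    for s in strings:
        n = len(encoding.encode(s))
        # flush before the running total nears the budget, or at the
        # API's cap of 2048 inputs per embeddings request
        if batch and (batch_tokens + n > TOKEN_BUDGET or len(batch) >= 2048):
            embeddings += embed_batch(batch)
            batch, batch_tokens = [], 0
        batch.append(s)
        batch_tokens += n
    if batch:
        embeddings += embed_batch(batch)
    return embeddings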