Is there documentation for the maximum number of tokens for all samples sent in a single embeddings API request?
We are seeing a limit of about 300K tokens (36 inputs of 8K each) but are unable to find any official documentation of this.
It would be nice to know whether this varies by model, or whether it might change in the future.
Edit to add more information:
The error looks like this when we send a batch of 244 inputs of 8191 tokens each (just under 2M) to text-embedding-3-small on tier 5:
Error code: 400 - {'error': {'message': 'Requested 499651 tokens, max 300000 tokens per request', 'type': 'max_tokens_per_request', 'param': None, 'code': 'max_tokens_per_request'}}
Similar error with 850K tokens:
Error code: 400 - {'error': {'message': 'Requested 319449 tokens, max 300000 tokens per request', 'type': 'max_tokens_per_request', 'param': None, 'code': 'max_tokens_per_request'}}
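For reference, a minimal sketch of the kind of call that reproduces this; the openai Python client and the placeholder chunks below are illustrative assumptions, not our exact code:

```python
from openai import OpenAI
import tiktoken

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
enc = tiktoken.get_encoding("cl100k_base")  # encoding used by text-embedding-3-small

# Hypothetical inputs: 244 chunks, each trimmed to 8191 tokens.
chunk = enc.decode(enc.encode("lorem ipsum dolor sit amet " * 4000)[:8191])
batch = [chunk] * 244

# Sending the whole batch in a single request fails with the 400 error shown above.
resp = client.embeddings.create(model="text-embedding-3-small", input=batch)
```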
I ran into this as well, and it looks like OpenAI's API does have a limit on how many tokens you can send in each request. Not only are you limited to 2048 chunks to be embedded at the same time, the sum of those chunks also can't be larger than 300k "tokens".
However, that 300k limit is not calculated with the actual tiktoken tokenizer; instead it is an estimate of 0.25 tokens per UTF-8 byte. So you can actually embed 1 million tokens in one request, as long as they use few enough bytes.
Seems like a very odd limit to me, considering the request itself already has a limit of 2048 chunks. But hey, now we know.
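A quick way to see the gap between the two counts; the bytes/4 heuristic below is an assumption inferred from the observed behaviour, not anything documented:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by the text-embedding-3 models

def byte_estimate(texts):
    # What the endpoint appears to check against the 300k limit:
    # roughly 0.25 "tokens" per UTF-8 byte, i.e. total bytes divided by 4.
    return sum(len(t.encode("utf-8")) for t in texts) / 4

def tiktoken_count(texts):
    # What the limit sounds like it should mean: real tokenizer counts.
    return sum(len(enc.encode(t)) for t in texts)

texts = ["An example chunk of text that will be embedded."] * 1000
print("byte-based estimate:", byte_estimate(texts))
print("tiktoken count:     ", tiktoken_count(texts))
```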
edit: I think this is actually a bug in the embedding endpoints. According to their API spec, the limit is 300k tokens per request:
"All embedding models enforce a maximum of 300,000 tokens summed across all inputs in a single request."
https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-input
However, you can send either pre-tokenized input (arrays of token ids) or raw strings.
If the payload is interpreted as a plain int array (an array of tokens, one 4-byte integer per token), it makes sense to simply divide its length in bytes by 4 and report that as the number of tokens.
However, when the user sends UTF-8 encoded text, this leads to the weird behaviour where you can send way more (or way fewer) actual tokens than 300k, since the text is only tokenized after the size check.
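Until that's fixed, a practical workaround is to batch by the same metric the endpoint seems to apply. A minimal sketch, assuming the bytes/4 estimate above and the official openai Python client:

```python
from openai import OpenAI

client = OpenAI()

MAX_INPUTS_PER_REQUEST = 2048      # documented cap on the number of inputs
MAX_TOKENS_PER_REQUEST = 300_000   # documented cap on tokens summed across inputs

def batched(texts):
    """Yield batches that stay under both per-request limits, using the same
    bytes/4 estimate the endpoint appears to apply to string inputs."""
    batch, batch_budget = [], 0.0
    for text in texts:
        estimate = len(text.encode("utf-8")) / 4
        if batch and (len(batch) >= MAX_INPUTS_PER_REQUEST
                      or batch_budget + estimate > MAX_TOKENS_PER_REQUEST):
            yield batch
            batch, batch_budget = [], 0.0
        batch.append(text)
        batch_budget += estimate
    if batch:
        yield batch

texts = ["An example chunk of text that will be embedded."] * 10_000
embeddings = []
for batch in batched(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=batch)
    embeddings.extend(d.embedding for d in resp.data)
```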