Embeddings API Max Batch Size

Is there any documentation on the maximum batch size for the embeddings API? I’m trying to pass a batch of texts as input to the API and would like to maximize throughput while respecting the API rate limits.

This is an interesting question. Let’s make documentation.

[Total tokens for 3-small: 2183 embeddings] Counted: 128026; sending now…

- Embeddings failure Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

So we know what fails, and a failed request is one we don’t have to pay for. After finding a success, it’s a simple thing to knock one embedding off the list at a time and see the exact value…

2042:“[2039] It was just dark now. I never went near the”
match score: -0.1131
2043:“[2040] “Good lan’! is dat you, honey? Doan’ make n”
match score: -0.0350
2044:“[2041] It was Jim’s voice—nothing ever sounded so "
match score: -0.0612
2045:“[2042] “Laws bless you, chile, I ’uz right down sh”
match score: -0.0043
2046:“[2043] I says:”
match score: 0.0742
2047:“[2044] “All right—that’s mighty good; they won’t f”
match score: -0.0253

maximum embeddings batch list length: 2048
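
The search above (find a size that fails, find one that succeeds, then narrow the gap) could also be done by bisection rather than dropping one item at a time. This is only a sketch of that swapped-in technique, not the script that produced the output above: `largest_passing` and `probe` are hypothetical names, and it assumes that if a batch size fails, every larger size fails too.

```python
def largest_passing(probe, low, high):
    """Return the largest n in [low, high] for which probe(n) is True.

    Assumes probe(low) is True and that probe is monotonic:
    once a size fails, all larger sizes fail as well.
    """
    if not probe(low):
        raise ValueError("probe(low) must succeed")
    while low < high:
        # Round up so the loop always makes progress when probe(mid) is True.
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid works; the answer is mid or larger
        else:
            high = mid - 1  # mid fails; the answer is smaller
    return low


# Stand-in probe that mimics the limit observed in this thread.
print(largest_passing(lambda n: n <= 2048, 1, 10240))
```

In real use, `probe` would wrap an embeddings request for a list of `n` items and return False when the API rejects it. Successful probes are billed, so short input strings keep the cost down.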

I’ll leave it up to you to max the thing out on 8k list items and see…


Looks like max batch size is 2048

Testing code -

from openai import OpenAI

def test_max_batch_size(client, batch_sizes):
    test_text = "This is a test."  # Sample text to duplicate for the batch.
    max_supported_batch_size = None

    for batch_size in batch_sizes:
        try:
            texts = [test_text] * batch_size  # Create a batch of duplicated texts.
            response = client.embeddings.create(
                model="text-embedding-3-small",  # Assumed model; the arguments of the original call were lost.
                input=texts,
            )
            # If the request is successful, record this batch size as currently the largest successful one.
            print(f"Batch size of {batch_size} succeeded.")
            max_supported_batch_size = batch_size
        except Exception as e:
            # Handle specific exceptions or failures based on the API's error responses.
            print(f"Batch size of {batch_size} failed with error: {e}")
            break  # Exit the loop on the first failure.

    if max_supported_batch_size:
        print(f"Maximum supported batch size is {max_supported_batch_size}")
    else:
        print("Unable to determine the maximum supported batch size, all tested sizes failed.")

# Example usage
client = OpenAI()
batch_sizes = [2048, 2049, 3072, 4096, 5120, 6144, 7168, 8192, 9216, 10240]  # list of batch sizes.
test_max_batch_size(client, batch_sizes)


Batch size of 2048 succeeded.
Batch size of 2049 failed with error: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
Maximum supported batch size is 2048
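
Given the 2048-item limit observed above, one way to embed a longer list is to split it into chunks of at most 2048 items and send one request per chunk. A minimal sketch, assuming an already-constructed `OpenAI()` client is passed in: `chunked` and `embed_all` are hypothetical helper names, the model name is an assumption, and rate limits and per-input token limits are not handled here.

```python
def chunked(items, max_items=2048):
    """Yield consecutive slices of `items`, each at most `max_items` long."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    for start in range(0, len(items), max_items):
        yield items[start:start + max_items]


def embed_all(client, texts, model="text-embedding-3-small", max_items=2048):
    """Embed `texts` in batches and return the vectors in input order.

    `client` is expected to expose `embeddings.create(model=..., input=...)`
    as the OpenAI Python client does; the default model is an assumption.
    """
    vectors = []
    for batch in chunked(texts, max_items):
        response = client.embeddings.create(model=model, input=batch)
        # Each item in response.data carries one embedding vector.
        vectors.extend(item.embedding for item in response.data)
    return vectors
```

Usage would look like `vectors = embed_all(OpenAI(), my_texts)`, where `my_texts` is your own list of strings.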