Is there a way to call the embeddings batch API synchronously?

I saw in LlamaIndex that they use a method called `get_text_embedding_batch`, which takes a list of strings and returns the embeddings in the response. But according to the docs, the Batch API is asynchronous in nature, and its response returns an ID with which you have to track the status of the job.

I just want to confirm: is there a batch API for embeddings that can be called synchronously?

You can send multiple strings to an embedding model in a single request, which isn't the same thing as using the Batch API, which lets you submit many requests asynchronously.

With just the embeddings endpoint you can send up to 2,048 strings in an array.

See:

https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-input

> **input** (string or array), Required
>
> Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays. The input must not exceed the max input tokens for the model (8192 tokens for `text-embedding-ada-002`), cannot be an empty string, and any array must be 2048 dimensions or less. Example Python code for counting tokens.
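
For illustration, here's a minimal sketch of that single synchronous request using the official `openai` Python SDK. The model name and the example texts are just placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

texts = ["first document", "second document", "third document"]

# One synchronous request with many inputs: the response contains
# one embedding per input string, in the same order as the input list.
response = client.embeddings.create(
    model="text-embedding-3-small",  # example model name
    input=texts,                     # up to 2,048 strings per request
)

embeddings = [item.embedding for item in response.data]
print(len(embeddings), len(embeddings[0]))
```

The call blocks until all embeddings are computed, so there is no job ID to poll.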

The Batch API just lets you queue up to 50,000 such requests (up to a file size of 100 MB, anyway), as sketched below.
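
For contrast, here's a rough sketch of the asynchronous Batch API flow, where each line of an uploaded JSONL file is one full `/v1/embeddings` request (file name and model are placeholders; the JSONL shape follows the Batch API docs). Note that the response is only a job ID, which is the behavior the original question describes:

```python
import json
from openai import OpenAI

client = OpenAI()

# Each JSONL line is a complete request against the embeddings endpoint.
with open("requests.jsonl", "w") as f:
    for i, text in enumerate(["first document", "second document"]):
        f.write(json.dumps({
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": "text-embedding-3-small", "input": text},
        }) + "\n")

# Upload the file, then create the batch job.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/embeddings",
    completion_window="24h",
)

# You get back a job ID and must poll for completion later,
# e.g. with client.batches.retrieve(batch.id).
print(batch.id, batch.status)  # "validating" -> "in_progress" -> "completed"
```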


Thanks @anon22939549 for the reply.

Is there any difference in pricing between the two? For example, one API call with a list of 100 strings vs. 100 API calls, each with a different string?

Same price either way. But sending one request is faster and easier, and depending on the exact nature of your requests and your usage tier, it will help you manage your rate limits better.