Sending a list of strings to get embeddings

The Embeddings documentation uses the following call:

openai.Embedding.create(input = [text], model=model)['data'][0]['embedding']

Is it equivalent (in the sense of getting the same embeddings) to send:

  1. many lists, each containing a single string (invoking Embedding.create() multiple times), or
  2. a single list containing many strings?

Yes, it’s supposed to be equivalent: you send many strings in and get many embeddings out.
I do this in batches of 500 at a time.
However, I have noticed two things:

  1. sometimes I get NaN back for an embedding and have to retry those inputs
  2. the same string, embedded more than once, may return a slightly different embedding each time
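
For the NaN case, a retry loop along these lines might help. This is only a sketch: `embed_with_nan_retry` and `has_nan` are hypothetical helper names, `embed_batch` stands for whatever function you use to turn a list of strings into a list of vectors, and the retry count of 3 is an arbitrary assumption.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def has_nan(vector: Sequence[float]) -> bool:
    """Return True if any component of the vector is NaN."""
    return any(math.isnan(x) for x in vector)


def embed_with_nan_retry(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    max_retries: int = 3,  # arbitrary assumption, tune as needed
) -> List[Vector]:
    """Embed all texts, then re-embed only the inputs whose vectors contain NaN."""
    results = list(embed_batch(list(texts)))
    for _ in range(max_retries):
        bad = [i for i, vec in enumerate(results) if has_nan(vec)]
        if not bad:
            break
        # Re-send only the failed inputs and put the new vectors back in place.
        retried = embed_batch([texts[i] for i in bad])
        for i, vec in zip(bad, retried):
            results[i] = vec
    still_bad = [i for i, vec in enumerate(results) if has_nan(vec)]
    if still_bad:
        raise RuntimeError(f"NaN embeddings remain for input indices {still_bad}")
    return results
```

Only the failed inputs are re-sent, so a single bad vector does not force the whole batch to be repeated.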

How do you send batches of size 500?
I am using the snippet above, and it fails with:

Too many inputs. The max number of inputs is 16.
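
One possible workaround is to split the list into chunks no larger than the limit reported in the error and concatenate the results. A minimal sketch, with assumptions: `embed_in_batches` and `openai_embed_batch` are hypothetical helper names, the wrapper uses the same legacy `openai.Embedding.create` call as the snippet above, and the model name is only a placeholder.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 16,  # the limit quoted in the error message
) -> List[Vector]:
    """Embed texts in chunks of at most batch_size, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        vectors = embed_batch(chunk)
        if len(vectors) != len(chunk):
            raise RuntimeError("embedding count does not match input count")
        embeddings.extend(vectors)
    return embeddings


def openai_embed_batch(
    chunk: List[str],
    model: str = "text-embedding-ada-002",  # placeholder assumption
) -> List[Vector]:
    """Hypothetical wrapper around the legacy call quoted in this thread."""
    import openai  # imported lazily so the chunking helper works without it

    response = openai.Embedding.create(input=chunk, model=model)
    # Sort by the per-item index so the output order matches the input order.
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Usage would then look like `vectors = embed_in_batches(my_strings, openai_embed_batch)`, where `my_strings` is your own list.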

Related GitHub issue: https://github.com/langchain-ai/langchain/issues/4575

I usually run the call multiple times when going through a bunch of strings. While it may be a bit more time-consuming, I have found the results to be far more consistent: I don't encounter NaNs, and the similarity scores against my test string are the same or off by an insignificant amount.
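
To check whether two embeddings of the same string differ by an insignificant amount, something like the following could be used. It is a sketch only: `cosine_similarity` and `nearly_same` are hypothetical helper names, and the tolerance of 1e-4 is an arbitrary assumption, not a documented threshold.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearly_same(a: Sequence[float], b: Sequence[float], tolerance: float = 1e-4) -> bool:
    """True if the two vectors point in almost the same direction."""
    # tolerance is an arbitrary assumption; pick one that suits your data.
    return 1.0 - cosine_similarity(a, b) <= tolerance
```

Embedding the same string once on its own and once inside a batch, then passing both vectors to `nearly_same`, gives a quick sanity check on the two approaches.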
