The Embeddings documentation (here) uses the following:
openai.Embedding.create(input = [text], model=model)['data'][0]['embedding']
Is it equivalent (in the sense of getting the same embeddings) to send
- many lists, each containing a single string, invoking Embedding.create() multiple times
- a single list with many strings?
Yes, sending many strings in one request and getting many embeddings back is supposed to be equivalent.
I do this with batches of 500 at a time.
However, I have noticed two things:
- sometimes, I get NaN back for the embeddings; I have to re-try these
- the same string, embedded more than once, may return slightly different embeddings for each iteration
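For the NaN case, a retry loop along these lines might help. This is only a minimal sketch: `embed_batch` is a hypothetical callable (not part of the openai library) that takes a list of strings and returns one embedding per string, in order, and `max_retries` is an arbitrary choice.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def has_nan(vec: Sequence[float]) -> bool:
    """Return True if any component of the embedding is NaN."""
    return any(math.isnan(x) for x in vec)


def embed_with_retry(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],  # hypothetical wrapper around the API call
    max_retries: int = 3,  # assumption: a few retries are enough
) -> List[Vector]:
    """Embed texts, re-sending only the inputs whose embeddings contain NaN."""
    results = list(embed_batch(list(texts)))
    for _ in range(max_retries):
        bad = [i for i, vec in enumerate(results) if has_nan(vec)]
        if not bad:
            break
        retried = embed_batch([texts[i] for i in bad])
        for i, vec in zip(bad, retried):
            results[i] = vec
    return results
```

If NaNs are still present after `max_retries` rounds, they are returned as-is, so the caller may want to check the result once more with `has_nan`.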
How do you send batches of size 500?
I am using the above snippet, and it gives this error:
Too many inputs. The max number of inputs is 16.
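One way around a per-request input limit like this is to split the list into smaller chunks and concatenate the results. A rough sketch, assuming the limit of 16 from the error message and a hypothetical `embed_batch` callable that wraps the actual API call:

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_chunks(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],  # hypothetical wrapper around the API call
    max_inputs: int = 16,  # taken from the error message; may differ per model or endpoint
) -> List[Vector]:
    """Embed any number of texts by sending at most `max_inputs` per request."""
    embeddings: List[Vector] = []
    for chunk in chunked(texts, max_inputs):
        embeddings.extend(embed_batch(chunk))
    return embeddings
```

With the call from the snippet above, `embed_batch` could presumably be something like `lambda chunk: [d['embedding'] for d in openai.Embedding.create(input=chunk, model=model)['data']]`, assuming the response items come back in input order.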
Related GitHub issue
I usually run the call multiple times when going through a bunch of strings. While it may feel a bit more time-consuming, I have found the results to be far more consistent: I don't encounter NaNs, and the similarity scores with my test string are the same or off by an insignificant amount.
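To check this kind of consistency yourself, you could embed the same strings both ways and compare them with cosine similarity. A minimal sketch, where `embed_batch` is again a hypothetical callable wrapping the API call rather than anything from the library:

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def compare_single_vs_batched(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],  # hypothetical wrapper around the API call
) -> List[float]:
    """Return, per text, the similarity between its one-at-a-time and batched embedding."""
    batched = embed_batch(list(texts))
    single = [embed_batch([text])[0] for text in texts]
    return [cosine_similarity(s, b) for s, b in zip(single, batched)]
```

Values very close to 1.0 would mean the two ways of calling agree for practical purposes.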