What are the valid embedding input values?

I am attempting to create embeddings for each sentence in a book. After removing all sentences shorter than 3 characters, I have a list of 2305 strings. When I pass this list as the embedding input, I get "(LIST) is not valid under any of the given schemas - ‘input’". So it seems some values are not being accepted. What are they? Are there certain characters that the embedding model does not accept?

It seems like you might be passing the entire array to the Embeddings API, instead of each individual sentence.

Yes I am. It accepts either a string or an array. It works fine if I pass it a list of around 250 strings. Is there a size limit?

I haven’t seen any limits specific to batch size, so maybe you’re hitting the 350,000 TPM rate limit?

If not, maybe there is a batch limit that just isn’t well documented.

If you’re still thinking this might be the problem, I would send one sentence per request so that you can see which is causing the error, and investigate it from there.
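
Something like this rough sketch would do the one-at-a-time check. It is not tied to any particular client library: `embed_one` is a hypothetical callable that you would write yourself to send a single string to the Embeddings API and raise an exception on failure, and `find_failing_inputs` is just an illustrative name.

```python
from typing import Callable, List, Sequence, Tuple


def find_failing_inputs(
    sentences: Sequence[str],
    embed_one: Callable[[str], object],
) -> List[Tuple[int, str, str]]:
    """Send each sentence on its own and record the ones that raise.

    `embed_one` is a hypothetical wrapper around a single-input
    embeddings request; it is assumed to raise on a rejected input.
    Returns (index, sentence, error message) for every failure.
    """
    failures: List[Tuple[int, str, str]] = []
    for index, sentence in enumerate(sentences):
        try:
            embed_one(sentence)
        except Exception as exc:  # broad on purpose: we only want to locate bad inputs
            failures.append((index, sentence, str(exc)))
    return failures
```

If this returns an empty list, no single sentence is being rejected, which points at the size of the request rather than its contents.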

Yeah good suggestions. I was hoping for some documentation to clear up the guesswork.

Edit: @wfhbrian I was able to embed it by breaking the calls up. The problem was I was exceeding the 8192 token limit. Duh…
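
For anyone landing here later, breaking the calls up can look roughly like the sketch below. It assumes the 8192 figure applies to the total tokens in one request, which is what the fix above suggests but the thread does not confirm, so `max_tokens` is left as a parameter. `batch_by_token_budget` and `rough_token_count` are illustrative names, and the word-count estimate is only a stand-in for the model's real tokenizer.

```python
from typing import Callable, Iterator, List, Sequence


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words.

    Real token counts differ, so swap in the model's tokenizer
    or leave generous headroom in the budget.
    """
    return max(1, len(text.split()))


def batch_by_token_budget(
    texts: Sequence[str],
    count_tokens: Callable[[str], int] = rough_token_count,
    max_tokens: int = 8192,  # assumption: the limit is per request
) -> Iterator[List[str]]:
    """Yield consecutive batches whose estimated token total fits the budget."""
    batch: List[str] = []
    used = 0
    for text in texts:
        needed = count_tokens(text)
        if needed > max_tokens:
            raise ValueError(f"single input needs {needed} tokens, over the budget")
        # Start a new batch when adding this text would exceed the budget.
        if batch and used + needed > max_tokens:
            yield batch
            batch, used = [], 0
        batch.append(text)
        used += needed
    if batch:
        yield batch
```

Each yielded batch can then be sent as the input of its own embeddings request, and the results concatenated in order.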
