I’m having a very odd problem using the embeddings API via the Python client.
I have a few thousand documents to process, and I send them in batches of 30.
Sometimes it works fine: all 30 documents are processed and I get a nice array of 1536 values for each document. But sometimes the API returns a 12288-length array instead, which is not only useless to me (because I can’t combine it with the previous 1536-length arrays) but also causes a huge spike in my consumption.
On my usage report I’ve noticed that when it works, it logs requests to text-embedding-ada-002-v2, which is exactly what I expected. When it returns the bigger array (and bigger cost), it logs calls to text-similarity-davinci, and I have no idea why.
Given the inconsistent and unpredictable behavior I tend to assume it’s a bug (a very expensive one), but maybe you guys could give me a hint.
the code is pretty straightforward:
```python
import openai

def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    result = openai.Embedding.create(input=[text], model=model)
    embeddings = result['data'][0]['embedding']
    return embeddings

# makeText(row) is my helper that builds the document text from a dataframe row
textsDF['embedding'] = textsDF.progress_apply(lambda row: get_embedding(makeText(row)), axis=1)
```
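In the meantime I’m considering wrapping the call in a sanity check so a mis-routed request fails fast instead of silently polluting the dataframe. This is just a sketch, assuming ada-002 always returns a 1536-dimensional vector and that the response echoes the model name in a `model` field (which the usage report suggests):

```python
import openai

EXPECTED_DIM = 1536  # expected length of a text-embedding-ada-002 vector

def get_embedding_checked(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    result = openai.Embedding.create(input=[text], model=model)
    embedding = result['data'][0]['embedding']
    returned_model = result.get('model', '')
    # Fail loudly if the request was routed to a different model
    # or the vector has an unexpected length
    if len(embedding) != EXPECTED_DIM or not returned_model.startswith('text-embedding-ada-002'):
        raise ValueError(f"Unexpected response: model={returned_model}, dim={len(embedding)}")
    return embedding
```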