Problems using Embedding API

I’m having a very odd problem with the Embedding API when using the Python client.

I have a few thousand documents that I want processed, and I send them in batches of 30.

Sometimes it works fine: all 30 documents are processed and I get a nice array of 1536 values for each document. But sometimes the API returns a 12288-length array instead, which is not only useless to me (because I can’t combine it with the previous 1536-length arrays) but also causes a huge spike in my consumption.

In my usage report I’ve noticed that when it works, it logs requests to text-embedding-ada-002-v2, which is exactly what I expected. When it returns the bigger array (at the bigger cost), it logs calls to text-similarity-davinci, and I have no idea why.

Given the inconsistent and unpredictable behavior, I tend to assume it’s a bug (a very expensive one), but maybe you guys could give me a hint.

The code is pretty straightforward:

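```python
import openai
from tqdm import tqdm

tqdm.pandas()  # enables DataFrame.progress_apply

def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")  # flatten newlines before embedding
    # One text per request (pre-1.0 openai client interface)
    result = openai.Embedding.create(input=[text], model=model)
    return result["data"][0]["embedding"]

# textsDF and makeText (builds the text for a row) are defined elsewhere
textsDF['embedding'] = textsDF.progress_apply(lambda row: get_embedding(makeText(row)), axis=1)
```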
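While the cause is unknown, one way to at least stop the wrong-sized vectors from ending up in the DataFrame is to check the length before returning it. This is only a sketch, not a fix for the underlying problem: `extract_checked_embedding` is a name I made up, it assumes the response can be read like a dict, and it assumes the response carries a `model` field (it falls back to "unknown" if not).

```python
EXPECTED_DIM = 1536  # length reported above for text-embedding-ada-002


def extract_checked_embedding(result, expected_dim=EXPECTED_DIM):
    """Return the first embedding in `result`, or raise if its length is off.

    `result` is assumed to behave like a dict shaped like the response
    used in get_embedding: result["data"][0]["embedding"].
    """
    embedding = result["data"][0]["embedding"]
    if len(embedding) != expected_dim:
        # Assumption: the response reports which model served the request.
        model_used = result.get("model", "unknown")
        raise ValueError(
            f"expected {expected_dim} values, got {len(embedding)} "
            f"(model reported: {model_used})"
        )
    return embedding
```

Inside `get_embedding` this would replace the `result['data'][0]['embedding']` line, so a 12288-length result raises immediately and the error message shows which model the response claims to come from. It does not prevent the charge for that request.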

Have you gained any insights into this? I send individual requests and was considering sending batches, but I would want to know that I won’t have the same issue.
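If you do move to batches, a rough sketch of how the chunking could look is below. It is not tied to any particular client: `embed_batch` is a hypothetical callable you would supply yourself (it takes a list of strings and returns one vector per string, in the same order), and the batch size of 30 and expected length of 1536 are just the numbers from the original post. It cannot tell you whether the problem above will happen, it only makes a wrong-sized result fail loudly instead of being stored.

```python
def batched(items, size=30):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(texts, embed_batch, size=30, expected_dim=1536):
    """Embed `texts` batch by batch using the caller-supplied `embed_batch`.

    `embed_batch` is a placeholder for whatever function sends one batch
    to the API and returns a list of vectors in input order.
    """
    vectors = []
    for batch in batched(texts, size):
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise ValueError(
                f"sent {len(batch)} texts, got {len(result)} vectors back"
            )
        for vector in result:
            if len(vector) != expected_dim:
                raise ValueError(
                    f"expected {expected_dim} values, got {len(vector)}"
                )
        vectors.extend(result)
    return vectors
```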