Semantic embedding: super slow 'text-embedding-ada-002'

I’ve tried running “text-embedding-ada-002” to embed a text column of a dataframe with 45K rows. Each row is no more than 200 characters, but it’s been 6 hours and the process still hasn’t finished.

Have any of you had similar problems? Is there a solution? An alternative?

Could it be that my code is not efficient for this task?

# Use OpenAI embeddings to embed the claimReviewed column, one request per row
from openai.embeddings_utils import get_embedding

df["embedding"] = df["claimReviewed"].apply(
    lambda x: get_embedding(x, engine="text-embedding-ada-002")
)

That sounds about right when sending one at a time. If you do batch processing, it should be faster:

To get embeddings for multiple inputs in a single request, pass an array of strings or array of token arrays. Each input must not exceed 8192 tokens in length.
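A minimal sketch of that batching approach. The helper names, the batch size, and the commented-out `openai` call (which assumes the pre-1.0 Python library) are my own illustration, not code from this thread:

```python
def batched(items, batch_size):
    """Yield successive slices of `items` of length at most `batch_size`."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]

def embed_in_batches(texts, embed_fn, batch_size=100):
    """Embed texts batch_size at a time.

    embed_fn maps a list of strings to a list of embedding vectors,
    one per input, in the same order.
    """
    embeddings = []
    for batch in batched(list(texts), batch_size):
        embeddings.extend(embed_fn(batch))
    return embeddings

# With the (pre-1.0) openai library, embed_fn could be something like:
# def openai_embed(batch):
#     resp = openai.Embedding.create(input=batch, engine="text-embedding-ada-002")
#     return [item["embedding"] for item in resp["data"]]
#
# df["embedding"] = embed_in_batches(df["claimReviewed"], openai_embed)
```

One request per 100 rows instead of one per row cuts the per-request network overhead by roughly that factor.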


Hi Curt, if you have tried this, do you have a practical quantity to send in one go? I was going to do batch processing but forgot to go back and revisit it.

Well, it looks like my new 3-hour trial didn’t even get through half of it.

That multiple-inputs-per-request approach seems neat. How would you do that with a pandas dataframe column with lots of rows?

Hi Raymond, no, I haven’t used this feature myself.

But I would just count the characters and limit it that way. I forget the characters-to-tokens ratio, but for Davinci I make sure I stay under 10k characters. I think the ada-002 embedding limit is 4x larger, so maybe cap it at 40k characters per call? You could probably go larger, too.


Personally, I would just let it run at the slower rate instead of rewriting your code. But you would have a nested loop: loop over the whole thing, gather up enough rows that you don’t exceed the 8k-token limit, send that off to the API, get it back, stuff the results into the frame, and keep going with the next chunk. Not trivial.
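That gathering step could look something like this greedy packer. It's a rough sketch: the 30k-character default is my own stand-in for staying safely under the 8k-token request limit, not a number from this thread:

```python
def pack_into_chunks(texts, max_chars=30000):
    """Greedily group consecutive texts into chunks whose combined
    character count is at most max_chars (a crude proxy for the
    token limit on one embeddings request)."""
    chunks, current, size = [], [], 0
    for text in texts:
        if current and size + len(text) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        chunks.append(current)
    return chunks

# Each chunk would then go out as one embeddings request, and the
# returned vectors get written back into the frame in order.
```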

Normally I just work with a database, and so I can query if I’ve already embedded something before sending it off to the API. Keeping the code simpler but waiting longer is acceptable in my situation.


I have a background process that chugs away whenever it sees a new series of text to encode. I trigger it to look every minute, and then if it finds records it keeps processing them for 50 seconds, and then restarts on the minute. (Was simple to code this way)
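A hypothetical sketch of that “process for ~50 seconds, then restart on the minute” loop. The function names are mine, and the clock is injectable so the budget logic can be tested without sleeping:

```python
import time

def process_for(records, handle, budget_seconds=50, clock=time.monotonic):
    """Process records one at a time until the time budget expires.

    Returns the records that were not processed; a scheduler can call
    this again on the next minute tick.
    """
    deadline = clock() + budget_seconds
    remaining = list(records)
    while remaining and clock() < deadline:
        handle(remaining.pop(0))
    return remaining
```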

I did 7,000 rows in about 15 to 20 minutes, so it’s not a problem at this stage. My text blocks are 500 tokens or less.

My embeddings are stored in a proprietary vector database and linked to a supporting MS-SQL database for the original text and related data.


I remember the first time I ran ada-002 embedding, I fed it 65k small chunks of text and it took at least 12 hours. I think it just finished after I woke up the next day.

Oh, and one more thing: the database I use is more of a repository. To run queries I use an in-memory version of the data, not the database itself, because that would be too slow.

But a dedicated vector database is definitely the way to go for larger datasets.


Not sure it is useful, but I ran two single-line fine-tunes (only one prompt/completion pair each) yesterday, and each fine-tune took over 10 1/2 hours to complete (generate a new model).

That was nearly twice as fast as my recent “trivial” fine tunings.

Sounds “fast” to me :slight_smile:

Recently, I got the impression that the data size is less influential on the overall API processing time than the length of the OpenAI / Azure processing queue.


Sounds like your code is making requests sequentially. Here’s an example script that makes calls in parallel. It should be able to handle more than 100,000 tokens per minute.
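I haven’t seen that exact script, but the idea can be sketched with a thread pool. Here `embed_fn` is a stand-in for whatever single-text embedding call you use; real code would also need rate-limit handling and retries:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_parallel(texts, embed_fn, max_workers=8):
    """Run embed_fn on each text concurrently.

    pool.map preserves input order, so results line up with texts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed_fn, texts))

# df["embedding"] = embed_parallel(df["claimReviewed"].tolist(), my_embed_call)
```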


I was trying to do the same with GPT-3.5 Turbo, but I’m getting this error:

InvalidRequestError: The embeddings operation does not work with the specified model, gpt-3.5-turbo