Semantic embedding: super slow 'text-embedding-ada-002'

I’ve tried running “text-embedding-ada-002” to embed a text column of a dataframe with 45K rows. Each row is no more than 200 characters, but it’s been 6 hours and the process still hasn’t finished.

Have any of you had similar problems? Is there a solution? An alternative?

Could it be that my code is inefficient for this task? Is there a better approach?

from openai.embeddings_utils import get_embedding

# use OpenAI embeddings to get the embeddings for the claimReviewed column
df["embedding"] = df["claimReviewed"].apply(lambda x: get_embedding(x, engine="text-embedding-ada-002"))

That sounds about right when sending one at a time. If you do batch processing, it should be faster:

To get embeddings for multiple inputs in a single request, pass an array of strings or array of token arrays. Each input must not exceed 8192 tokens in length.
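For illustration, a rough sketch of what a batched call could look like. This assumes the pre-1.0 openai Python SDK and the claimReviewed column from the question; the batch size is arbitrary.

import openai

# Sketch: batch the rows instead of sending one per request.
# Each individual input still has to stay under the 8192-token limit.
texts = df["claimReviewed"].tolist()
batch_size = 500
embeddings = []

for i in range(0, len(texts), batch_size):
    batch = texts[i:i + batch_size]
    response = openai.Embedding.create(input=batch, model="text-embedding-ada-002")
    # each result carries an "index" field, so sort to keep results aligned with inputs
    ordered = sorted(response["data"], key=lambda item: item["index"])
    embeddings.extend(item["embedding"] for item in ordered)

df["embedding"] = embeddings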


Hi Curt, if you have tried this, do you have a practical quantity to send in one go? I was going to do batch processing but forgot to go back and revisit it.


Well, it looks like my new 3-hour run didn’t even get through half of it.

That multiple-inputs-per-request approach seems neat. How would you do that with a pandas dataframe column with lots of rows?

Hi Raymond, no, I haven’t used this feature myself.

But I would just count the characters and limit it that way. I forget the exact characters-to-tokens conversion, but for Davinci I make sure I stay under 10k characters. I think the ada-002 embedding limit is about 4x larger, though, so maybe cap it at 40k characters per call? You could probably go larger, too.
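For the characters-to-tokens conversion, the rule of thumb I’ve seen quoted is roughly 4 characters per token for English text, so a rough estimate looks like:

# Rule of thumb (not exact): roughly 4 characters per token for English text,
# so a 40,000-character batch works out to around 10,000 tokens.
def estimate_tokens(text: str) -> int:
    return len(text) // 4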


Personally, I would just let it run at the slower rate instead of rewriting your code. Otherwise you would need a nested loop: go over the whole thing, gather up rows until you would exceed the 8k token limit, send that batch off to the API, get it back, stuff the results into the frame, and keep going with the next chunk. Not trivial.
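Roughly, that nested loop could look something like this. A sketch only, assuming the pre-1.0 openai SDK, tiktoken for the token counting, and the claimReviewed column from the question.

import openai
import tiktoken

# Sketch: gather rows until the next one would push the batch past the token budget,
# send the batch off, collect the vectors, and continue with the next chunk.
enc = tiktoken.encoding_for_model("text-embedding-ada-002")
token_budget = 8_000

def flush(batch, out):
    response = openai.Embedding.create(input=batch, model="text-embedding-ada-002")
    out.extend(item["embedding"] for item in response["data"])

all_embeddings, batch, batch_tokens = [], [], 0
for text in df["claimReviewed"]:
    n_tokens = len(enc.encode(text))
    if batch and batch_tokens + n_tokens > token_budget:
        flush(batch, all_embeddings)
        batch, batch_tokens = [], 0
    batch.append(text)
    batch_tokens += n_tokens
if batch:
    flush(batch, all_embeddings)

df["embedding"] = all_embeddings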

Normally I just work with a database, and so I can query if I’ve already embedded something before sending it off to the API. Keeping the code simpler but waiting longer is acceptable in my situation.
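As an illustration of that “only embed what’s new” check, with SQLite standing in for the real database (the table and column names here are made up):

import sqlite3

# Sketch: skip anything that already has an embedding stored.
conn = sqlite3.connect("embeddings.db")
conn.execute("CREATE TABLE IF NOT EXISTS embeddings (text_id TEXT PRIMARY KEY, vector TEXT)")

rows = [("id-1", "some text"), ("id-2", "some other text")]   # whatever is waiting to be embedded
already_done = {r[0] for r in conn.execute("SELECT text_id FROM embeddings")}
to_embed = [(text_id, text) for text_id, text in rows if text_id not in already_done]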


I have a background process that chugs away whenever it sees a new series of text to encode. I trigger it to look every minute, and if it finds records it keeps processing them for 50 seconds, then restarts on the minute. (It was simple to code this way.)
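In case the timing part is useful, the loop is roughly this shape. A sketch only; embed_pending() is a hypothetical helper that embeds whatever new rows it finds and returns how many it handled.

import time

# Rough shape of the minute-by-minute background loop.
def embed_pending() -> int:
    ...  # look up new rows, call the embeddings API, store the vectors
    return 0

while True:
    start = time.time()
    while time.time() - start < 50:     # keep processing for ~50 seconds
        if embed_pending() == 0:        # nothing new, stop early
            break
    time.sleep(max(0, 60 - (time.time() - start)))   # wake up again on the minute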

I did 7,000 rows in about 15 to 20 minutes - so it’s not a problem at this stage. My text blocks are 500 tokens or less.

My embeddings are stored in a proprietary vector database and linked to a supporting MS-SQL database for the original text and related data.


I remember the first time I ran the ada-002 embeddings: I fed it 65k small chunks of text and it took at least 12 hours. I think it had just finished when I woke up the next day.

Oh, and one more thing: the database I use is more of a repository. To run queries I use an in-memory version of the data, not the database, because that would be too slow.

But a dedicated vector database is definitely the way to go for larger datasets.
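For what it’s worth, the in-memory querying is nothing fancy, roughly along these lines (a sketch assuming the embeddings are already loaded into the dataframe):

import numpy as np

# Sketch: load the stored vectors once, then rank rows by cosine similarity to a query vector.
vectors = np.array(df["embedding"].tolist())
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def top_k(query_vec, k=5):
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    scores = vectors @ q                      # cosine similarity against every row
    return np.argsort(scores)[::-1][:k]       # indices of the closest rows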


Not sure if it’s useful, but I ran two single-line (only one prompt/completion pair each) fine-tunes yesterday, and each one took over 10 and a half hours to complete (generate a new model).

That was nearly twice as fast as my recent “trivial” fine-tunings.

Sounds “fast” to me 🙂

Recently, I’ve gotten the impression that data size has less influence on the overall API processing time than the length of the OpenAI / Azure processing queue does.


Sounds like your code is making requests sequentially. Here’s an example script to make calls in parallel. Should be able to handle more than 100,000 tokens per minute.
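For illustration, a stripped-down sketch of the parallel idea using concurrent.futures. It assumes the pre-1.0 openai SDK and the claimReviewed column from the question; a real script should also handle rate limits and retries.

import concurrent.futures
import openai

# Sketch: split the rows into batches and embed several batches concurrently.
def embed_batch(batch):
    response = openai.Embedding.create(input=batch, model="text-embedding-ada-002")
    return [item["embedding"] for item in response["data"]]

texts = df["claimReviewed"].tolist()
batches = [texts[i:i + 500] for i in range(0, len(texts), 500)]

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(embed_batch, batches))   # preserves batch order

df["embedding"] = [emb for batch in results for emb in batch]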


I was trying to do the same with GPT-3.5 Turbo, but I’m getting this error:

InvalidRequestError: The embeddings operation does not work with the specified model, gpt-3.5-turbo