I recently noticed that retrieving query embeddings via the API has become extremely slow compared to what it used to be.
I’m used to millisecond responses; now a call can take 10, 20, or 40 seconds, or even more than a minute.
I ran a quick bit of code, sending a list with a single item (a short phrase) for ten trials against each model; a sketch of the loop follows the table:
| Model | Min (ms) | Max (ms) | Avg (ms) |
|---|---|---|---|
| text-embedding-3-large | 560.55 | 1099.22 | 733.91 |
| text-embedding-3-small | 565.86 | 1332.11 | 832.81 |
| text-embedding-ada-002 | 514.77 | 843.24 | 624.51 |
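For replication, a minimal sketch of such a timing loop, assuming the openai Python SDK (v1+) with OPENAI_API_KEY set in the environment:

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = [
    "text-embedding-3-large",
    "text-embedding-3-small",
    "text-embedding-ada-002",
]

for model in MODELS:
    timings = []
    for _ in range(10):  # ten trials per model
        start = time.perf_counter()
        client.embeddings.create(model=model, input=["a short phrase"])
        timings.append((time.perf_counter() - start) * 1000)  # elapsed ms
    print(f"{model}: min {min(timings):.2f}  max {max(timings):.2f}  "
          f"avg {sum(timings) / len(timings):.2f} (ms)")
```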
There seem to be no cases of a big delay.
I imagine that sending more text adds computational expense for token encoding and attention, but I didn’t test huge texts or long lists of inputs in a single call.
Perhaps you can characterize what you are sending as input in a single API call, both for better replication and to avoid the “bad” case. A timeout with retries is also your friend.
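For the timeout-and-retry approach, here is a minimal sketch with a short client-side timeout and manual exponential backoff; the 10-second timeout, three attempts, and model name are illustrative values, not recommendations:

```python
import time
from openai import OpenAI, APIConnectionError, APITimeoutError

# Fail fast on a stalled call instead of hanging for minutes;
# max_retries=0 so the loop below controls retries, not the SDK.
client = OpenAI(timeout=10.0, max_retries=0)

def embed_with_retry(texts, model="text-embedding-3-small", attempts=3):
    for attempt in range(attempts):
        try:
            return client.embeddings.create(model=model, input=texts)
        except (APITimeoutError, APIConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(2 ** attempt)  # back off: 1 s, then 2 s
```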
Our solution is affected in the same way. Since yesterday, the response times of embedding models have increased significantly.
I don’t have systematic benchmarks from before yesterday, but based on observation, embeddings that used to take under one second can now take up to 30 seconds!
I tested using different VPNs, via cURL, and through the Python API, and observed the same effect. The embedding API has become unusable.
We are also facing the same issue. The embedding API is taking an inconsistent amount of time: it used to respond within 2-3 seconds, but now the response time varies from 3 seconds to 1-2 minutes.
We can flag the issue to be passed along through OpenAI’s channel here if it seems to be something that would take manual resolution and “fixing”, either for the platform or for a large section of developers.
Being able to replicate the issue and the conditions that cause it makes a successful investigation and fix more likely. The tier of the account’s organization may again be relevant.
It’s important to note in your report whether you are using the OpenAI SDK’s default retry mechanism, which internally times out a non-responsive API call and silently makes several retries of its own; it won’t report an error unless you set max_retries=0 on the client.
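For example, a sketch of turning the built-in retries off so a slow or dropped request shows up as an error in your logs rather than as one mysteriously long call (in the current Python SDK the client parameter is max_retries, defaulting to 2):

```python
from openai import OpenAI, APITimeoutError

client = OpenAI(max_retries=0)  # default is 2 silent retries

try:
    client.embeddings.create(
        model="text-embedding-3-small",
        input=["latency probe"],
    )
except APITimeoutError as e:
    # With retries disabled, a stalled call is reported instead of
    # being retried behind the scenes and counted as one slow call.
    print(f"request timed out: {e}")
```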
I upped the amount sent to embeddings, using the complete works of William Shakespeare as an input source. To mirror typical usage, I sent a list of 5 strings per API call, each string 4000 characters, similar to the chunk size used for search retrieval by Assistants or for RAG injection, with unique chunks per model trial. This went direct from me to OpenAI on non-shared resources.
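Roughly, the batching looked like this (a sketch; the shakespeare.txt filename is an assumption, any large plain-text source works):

```python
import time
from openai import OpenAI

client = OpenAI()

# Slice a large text into 4000-character strings to mirror
# retrieval-sized chunks, then send them five at a time.
text = open("shakespeare.txt", encoding="utf-8").read()  # hypothetical filename
chunks = [text[i:i + 4000] for i in range(0, len(text), 4000)]

for trial in range(10):
    batch = chunks[trial * 5:trial * 5 + 5]  # unique chunks per trial
    start = time.perf_counter()
    client.embeddings.create(model="text-embedding-3-large", input=batch)
    print(f"trial {trial}: {(time.perf_counter() - start) * 1000:.1f} ms")
```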
Performance improved briefly for one day, but unfortunately the latency issues have returned and worsened significantly.
We are currently using the Python SDK (openai==1.61.1) with the embedding model text-embedding-3-large. However, even with the smaller embedding models, latency remains a major concern. I’ve also tested with direct cURL requests, but the latency problem persists.
For identical texts, latency was under 1 second just a week ago. Currently, latency varies significantly, averaging over 15 seconds, which is unacceptable for chatbot applications. Given that this situation isn’t reflected on the status page (https://status.openai.com/), using the embeddings API with such instability is practically impossible in a production environment.
I’m connecting to the API from Poland.
Please, help.
I’m having a similar latency issue in Germany. I think it’s hard to generalise from latency tests run at a single point in time: if you’re lucky you’ll get results quickly; if not, you’ll wait longer. The real question is: how stable and robust is the API over long periods of time?