Embeddings service seems to be very unreliable atm. Anyone else seeing this?

Here are just a few of the errors I’m seeing:

OpenAI HTTP Error (spotted in ruby-openai 6.3.1): {"error"=>{"code"=>503, "message"=>"Service Unavailable.", "param"=>nil, "type"=>"cf_service_unavailable"}}
OpenAI HTTP Error (spotted in ruby-openai 6.3.1): {"error"=>{"message"=>"The server had an error while processing your request. Sorry about that!", "type"=>"server_error", "param"=>nil, "code"=>nil}}
OpenAI HTTP Error (spotted in ruby-openai 6.3.1): {"error"=>{"message"=>"The server had an error while processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID blah in your message.)", "type"=>"server_error", "param"=>nil, "code"=>nil}}

Despite a :green_circle: status on status.openai.com.

Have you tried on the playground just to sanity check that it’s not a code issue?

Well, those are 5xx server errors on embeddings…

Hence the sanity check. I’m not saying it isn’t a server issue; I’m just looking for a second data point to confirm it.

It’s just a very standard API call that has been working well for months.

I’m using alexrudall/ruby-openai (https://github.com/alexrudall/ruby-openai), a nice Faraday-based wrapper for the OpenAI API.

I’m hitting the server repeatedly and most calls go through, but it only takes one failure to stop the train. I suppose I need to build a retry into my rake task, but I’m trying to avoid spawning Sidekiq jobs for this.

Completed a random API call (just for one embedding though lol). No issues on my end.

Yeah, try 100 at minimum and you’ll hit a bad one very quickly.

Thanks for the clarification. Yes, retry with exponential backoff and timeouts are requirements for a production-ready application; there will be transient infra errors from time to time due to the sheer load on the system and other out-of-building failover cases. I usually also implement some kind of user-facing “progress bar” if there is an end user expecting results in real time.
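For illustration, a minimal sketch of that pattern, callable from a rake task. It assumes ruby-openai 6.x (which raises Faraday::Error subclasses on HTTP failures); the model name and retry cap are placeholders:

```ruby
require "openai"

def embed_with_retry(client, text, max_retries: 5)
  attempts = 0
  begin
    client.embeddings(
      parameters: { model: "text-embedding-ada-002", input: text }
    )
  rescue Faraday::Error => e
    attempts += 1
    raise if attempts > max_retries
    warn "Attempt #{attempts} failed (#{e.class}), backing off..."
    sleep((2**attempts) + rand) # exponential backoff with jitter
    retry
  end
end

client = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY"))
response = embed_with_retry(client, "some document text")
vector = response.dig("data", 0, "embedding")
```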


Sure.

I guess I was just “getting away with it” whilst times were good.

But this seems like an awfully poor service level.

Roughly one failure in every 100 calls.


Retry is indeed necessary, but I’d also recommend collecting your requests into batches.

We have everything in queues with retry at this point :roll_eyes: :laughing:

Yeah, to be clear, this is not exactly my “production” process.

All my fully-production calls run as Sidekiq jobs with exponential retry (sketched below).

It’s a prep process to seed the embeddings DB, so it’s “semi-production” as it were.

It’s in the production environment but only run once at system initialisation.
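For example, a minimal sketch of one of those jobs, leaning on Sidekiq’s built-in exponential-backoff retry; the EmbeddingRecord model and its columns are hypothetical:

```ruby
require "sidekiq"
require "openai"

class EmbedDocumentJob
  include Sidekiq::Job
  sidekiq_options retry: 5, queue: "embeddings"

  def perform(record_id)
    record = EmbeddingRecord.find(record_id)
    client = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY"))
    response = client.embeddings(
      parameters: { model: "text-embedding-ada-002", input: record.content }
    )
    record.update!(embedding: response.dig("data", 0, "embedding"))
    # Any Faraday::Error raised above bubbles up, and Sidekiq reschedules
    # the job with its default exponential backoff.
  end
end
```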


I feel your pain and I will report it back to OpenAI, but the level of demand for a resource-limited service is staggering. It needs to get better, and that is a primary focus. The bottom line: this is a very in-demand product running on massive amounts of hardware and network bandwidth, and it is very sensitive to small variations.


Yeah, I’m going to have to focus on building a seed batch job that just runs conservatively until it’s done.

Thanks, everyone!


I didn’t mean to dumb down the issue. I was just trying to check whether there is perhaps a systemic server issue that would prevent any requests from getting through.

Good luck!


Oh no problem.

I think it is a systemic “reliability” issue, but one-offs are usually OK.

All good, I will just need to adopt best practice for this part of the architecture.

I will probably add two jobs:

  1. one that finds the next fifty missing embeddings
  2. one that spawns fifty sub-jobs to fill them

or some such. I’ll need some trick to cap the number of in-flight jobs (see the sketch below).
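A minimal sketch of that fan-out, reusing the hypothetical EmbedDocumentJob and EmbeddingRecord from above: the parent enqueues at most fifty children per cycle and then reschedules itself, which caps the in-flight work:

```ruby
require "sidekiq"

class SeedEmbeddingsJob
  include Sidekiq::Job
  sidekiq_options retry: 3, queue: "embeddings"

  BATCH_SIZE = 50

  def perform
    ids = EmbeddingRecord.where(embedding: nil).limit(BATCH_SIZE).pluck(:id)
    return if ids.empty? # nothing left to seed: stop rescheduling

    ids.each { |id| EmbedDocumentJob.perform_async(id) }

    # Check again in a minute; at most BATCH_SIZE children are enqueued
    # per cycle, which keeps the load on the API conservative.
    self.class.perform_in(60)
  end
end
```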


I don’t know if I got this across, but I meant that you can send an array of inputs to the embeddings endpoint.

But I could be misunderstanding you :thinking:
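For example (a minimal sketch; the model name is illustrative):

```ruby
require "openai"

client = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY"))

texts = ["first document", "second document", "third document"]
response = client.embeddings(
  parameters: { model: "text-embedding-ada-002", input: texts }
)

# Each item in "data" carries an "index" matching its input position.
vectors = response["data"]
  .sort_by { |d| d["index"] }
  .map { |d| d["embedding"] }
```

One request per batch also cuts down the number of chances to hit a transient 5xx.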


I was getting about 10% “500 server error” failures on chat completions just running a batch of 10 earlier today. Not a blip on the “downtime” report, and nothing about the type of request that would explain a parse error.

…as may come about in large scale neural network installations from tectonic activity or human physical activity in proximity to racks of computation devices…



"To embed multiple inputs in a single request, pass an array of strings or array of token arrays. "

Well I never!!

Thanks for that, @Diet!


Just an update: I’m seeing much improved reliability today.

I may still migrate to a more robust arrangement, though…