Embeddings service seems to be very unreliable atm. Anyone else seeing this?

Here are just a few of the errors I’m seeing:

OpenAI HTTP Error (spotted in ruby-openai 6.3.1): {"error"=>{"code"=>503, "message"=>"Service Unavailable.", "param"=>nil, "type"=>"cf_service_unavailable"}}
OpenAI HTTP Error (spotted in ruby-openai 6.3.1): {"error"=>{"message"=>"The server had an error while processing your request. Sorry about that!", "type"=>"server_error", "param"=>nil, "code"=>nil}}
OpenAI HTTP Error (spotted in ruby-openai 6.3.1): {"error"=>{"message"=>"The server had an error while processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID blah in your message.)", "type"=>"server_error", "param"=>nil, "code"=>nil}}

Despite a :green_circle: status on status.openai.com.

Have you tried on the playground just to sanity check that it’s not a code issue?

Well, those are 5xx server errors on embeddings…

Hence the sanity check. I’m not saying it isn’t a server issue; I’m just looking for a second data point to confirm it.

It’s just a very standard API call that has been working well for months.

I’m using alexrudall/ruby-openai (https://github.com/alexrudall/ruby-openai), a nice Faraday-based wrapper for the OpenAI API.

I’m hitting the server repeatedly and most calls go through, but it only takes one failure to stop the train. I suppose I need to build a retry into my rake task, but I’m trying to avoid spawning Sidekiq jobs for this.

Completed a random API call (just for one embedding though lol). No issues on my end.

Yeah, try 100 at minimum and you’ll hit a bad one very quickly.

Thanks for the clarification. Yes, retry with exponential backoff and timeouts are requirements for a production-ready application; there will be transient infra errors from time to time due to the sheer load on the system and other out-of-building failover cases. I usually also implement some kind of user-facing “progress bar” if there is an end user expecting results in real time.
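For illustration, a minimal sketch of that pattern, callable from a rake task. It assumes ruby-openai 6.x (which raises Faraday::Error subclasses on HTTP failures); the model name and retry cap are placeholders:

```ruby
require "openai"

def embed_with_retry(client, text, max_retries: 5)
  attempts = 0
  begin
    client.embeddings(
      parameters: { model: "text-embedding-ada-002", input: text }
    )
  rescue Faraday::Error => e
    attempts += 1
    raise if attempts > max_retries
    warn "Attempt #{attempts} failed (#{e.class}), backing off..."
    sleep((2**attempts) + rand) # exponential backoff with jitter
    retry
  end
end

client = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY"))
response = embed_with_retry(client, "some document text")
vector = response.dig("data", 0, "embedding")
```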


Sure.

I guess I was just “getting away with it” whilst times were good.

But this seems like an awfully poor service level.

Roughly one failure in every 100 calls.


Retry is indeed necessary, but I’d also recommend collecting your requests into batches.

We have everything in queues with retry at this point :roll_eyes: :laughing:

Yeah, to be clear, this is not exactly my “production” process.

All my fully-production calls run as Sidekiq jobs with exponential retry (sketched below).

It’s a prep process to seed the embeddings DB, so it’s “semi-production” as it were.

It’s in the production environment but only run once at system initialisation.
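For example, a minimal sketch of one of those jobs, leaning on Sidekiq’s built-in exponential-backoff retry; the EmbeddingRecord model and its columns are hypothetical:

```ruby
require "sidekiq"
require "openai"

class EmbedDocumentJob
  include Sidekiq::Job
  sidekiq_options retry: 5, queue: "embeddings"

  def perform(record_id)
    record = EmbeddingRecord.find(record_id)
    client = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY"))
    response = client.embeddings(
      parameters: { model: "text-embedding-ada-002", input: record.content }
    )
    record.update!(embedding: response.dig("data", 0, "embedding"))
    # Any Faraday::Error raised above bubbles up, and Sidekiq reschedules
    # the job with its default exponential backoff.
  end
end
```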


I feel your pain and I will report it back to OpenAI, but the level of demand for a resource-limited service is staggering. It needs to get better, and that is a primary focus. The bottom line: this is a very in-demand product running on massive amounts of hardware and network bandwidth, and it is very sensitive to small variations.


Yeah, I’m going to have to focus on building a seed batch job that just runs conservatively until it’s done.

Thanks, everyone!


I didn’t mean to dumb down the issue. I was just trying to check whether there is perhaps a systemic server issue that would prevent any requests from getting through.

Good luck!


Oh no problem.

I think it is a systemic “reliability” issue, but one-offs are usually OK.

All good, I will just need to adopt best practice for this part of the architecture.

I will probably add two jobs:

  1. one that finds the next fifty missing embeddings
  2. one that spawns fifty sub-jobs to fill them

or some such. I’ll need some trick to cap the number of in-flight jobs (see the sketch below).
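A minimal sketch of that fan-out, reusing the hypothetical EmbedDocumentJob and EmbeddingRecord from above: the parent enqueues at most fifty children per cycle and then reschedules itself, which caps the in-flight work:

```ruby
require "sidekiq"

class SeedEmbeddingsJob
  include Sidekiq::Job
  sidekiq_options retry: 3, queue: "embeddings"

  BATCH_SIZE = 50

  def perform
    ids = EmbeddingRecord.where(embedding: nil).limit(BATCH_SIZE).pluck(:id)
    return if ids.empty? # nothing left to seed: stop rescheduling

    ids.each { |id| EmbedDocumentJob.perform_async(id) }

    # Check again in a minute; at most BATCH_SIZE children are enqueued
    # per cycle, which keeps the load on the API conservative.
    self.class.perform_in(60)
  end
end
```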


I don’t know if I got this across, but I meant that you can send an array of inputs to the embeddings endpoint.

But I could be misunderstanding you :thinking:
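For example (a minimal sketch; the model name is illustrative):

```ruby
require "openai"

client = OpenAI::Client.new(access_token: ENV.fetch("OPENAI_API_KEY"))

texts = ["first document", "second document", "third document"]
response = client.embeddings(
  parameters: { model: "text-embedding-ada-002", input: texts }
)

# Each item in "data" carries an "index" matching its input position.
vectors = response["data"]
  .sort_by { |d| d["index"] }
  .map { |d| d["embedding"] }
```

One request per batch also cuts down the number of chances to hit a transient 5xx.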


I was getting about 10% “500 server error” failures on chat completions just running a batch of 10 earlier today. Not a blip on the “downtime” report, and nothing about the type of request that would explain a parse error.

…as may come about in large scale neural network installations from tectonic activity or human physical activity in proximity to racks of computation devices…



"To embed multiple inputs in a single request, pass an array of strings or array of token arrays. "

Well I never!!

Thanks for that, @Diet!


Just an update: I’m seeing much improved reliability today.

I may still migrate to a more robust arrangement, though…