Frequent server_error on thread run

ivan11 · February 13, 2025, 10:23pm

Hi all,

We use gpt-4o in production with assistants, threads, vector stores, and so on, and for the past 48 hours or more, many thread runs have ended with a server_error status and the error message “Sorry, something went wrong.”

Also, from time to time a random API request holds the connection open for exactly 10 minutes before returning a response.

All of this is super annoying and makes it really hard to maintain good product quality for customers in production.

Anyone else experiencing the same?

ivan11 · February 13, 2025, 10:42pm

Thanks.

I spent hours yesterday figuring out why my queue tasks were taking so long to complete, just to realise that 600-second API responses are happening on a regular basis.
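For anyone hit by the same 600-second hangs, one way to stop a slow call from blocking a queue worker is to cap how long the worker waits. This is only a minimal sketch: `call_with_deadline` and `DeadlineExceeded` are hypothetical names, and it wraps any blocking function rather than a specific OpenAI call. If your HTTP client has its own timeout setting, that is usually the cleaner fix.

```python
import concurrent.futures


class DeadlineExceeded(Exception):
    """Raised when the wrapped call does not finish in time (hypothetical name)."""


def call_with_deadline(fn, deadline_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread and wait at most deadline_s seconds.

    The abandoned call keeps running in the background; this only frees the caller.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        raise DeadlineExceeded(f"no result after {deadline_s}s") from None
    finally:
        # Do not block on the hung worker thread while shutting down.
        executor.shutdown(wait=False)
```

A queue task could call `call_with_deadline(send_request, 60)` (with `send_request` being whatever function performs the API call) and reschedule itself on `DeadlineExceeded` instead of sitting idle for 10 minutes.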

What is happening with OpenAI server infrastructure right now?

ivan11 · February 13, 2025, 10:50pm

I wasn’t using the ChatGPT web interface, only the API.
And about 2+ hours ago it went wild: server errors on most AI assistant thread run replies.

Maybe it’s connected.

ivan11 · February 13, 2025, 11:16pm

A couple of problems I’ve encountered using the OpenAI API:

API calls often have very long response times (any endpoint, not only LLM responses)
frequent server_error statuses on thread runs
thread runs get stuck in cancelling status and eventually expire
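For the runs that get stuck, polling with a hard deadline at least turns an endless wait into an error the caller can handle. A rough sketch, with assumptions: `wait_for_run` and `RunStuck` are hypothetical names, `get_status` is any function you supply that returns the run's current status string, and the terminal status names are my understanding of the Assistants run lifecycle, so check them against the API reference.

```python
import time

# Assumed terminal run statuses; verify against the Assistants API reference.
TERMINAL_STATUSES = frozenset(
    {"completed", "failed", "cancelled", "expired", "incomplete"}
)


class RunStuck(Exception):
    """Raised when a run has not reached a terminal status by the deadline."""


def wait_for_run(get_status, deadline_s=120.0, poll_interval_s=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the deadline passes.

    Non-terminal statuses such as "queued", "in_progress", "cancelling" or
    "requires_action" keep the loop going until deadline_s is reached.
    """
    start = clock()
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() - start >= deadline_s:
            raise RunStuck(f"run still {status!r} after {deadline_s}s")
        sleep(poll_interval_s)
```

The `clock` and `sleep` parameters exist only so the loop can be tested without real waiting.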

This got me implementing a number of mechanisms to minimise the damage, like retries, scheduled retries, split queue tasks for one job, parallel queues, sequential queues, synchronous, asynchronous, you name it…
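The simplest of those mechanisms, a plain retry, can be sketched as exponential backoff with jitter. This is not my production code, just an illustration: `retry_with_backoff` and its parameters are hypothetical, and `fn` stands for whatever function creates the run and raises on failure.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds or max_attempts is reached.

    Waits a random time between 0 and min(max_delay, base_delay * 2**(attempt-1))
    seconds after each failure ("full jitter"), then re-raises the last error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

In practice it is worth narrowing `retry_on` to the errors that are actually transient, so that a bad request is not retried five times.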

This is the most unstable API I’ve ever used, and I’ve used a lot of them.

congxing · February 14, 2025, 1:27am

I have encountered the same issue in our production pipeline, which had worked well for almost 5 months. Now I have to switch to a different implementation that does not depend on the Assistants API.

_j · February 14, 2025, 2:12am

It seems to be specific to gpt-4o and strongly related to using file_ids. It is also ongoing.

For a similar (and meta) vision question, specifically using gpt-4o-2024-11-20 instead of the general alias succeeds, as does using gpt-4-turbo as the vision model:

(That the AI wrongly takes the screenshot of our Assistants error to be about itself, and reports that it cannot do exactly what it is doing while producing the response, is just added amusement.)

endfish · February 14, 2025, 3:51am

I think OpenAI must be deploying some services or using large-scale hardware resources to test a model today, because whether through API calls or the web UI, the dialogue quality of 4o has dropped significantly.

But I’m not sure whether it’s just a problem with my personal account.

congxing · February 14, 2025, 5:06am

Today, we officially switched our default model to Gemini after gpt-4o responded “sorry I can’t help” for hours.

ericxgao · February 14, 2025, 6:00am

Not sure if this is related, but I’m getting a ton of stuck requests on o3-mini and 4o-mini. Essentially, when trying to create a message or create a run, I never get a response back from the API.

ivan11 · February 14, 2025, 9:38am

Looks like you’re right!
We have most of our assistants on the gpt-4o model and some of them on the gpt-4o-2024-08-11 model, and the latter didn’t produce failed runs at all.

Thank you for this @_j !
Switching all to 4o-2024-11-20
