We use gpt-4o in production with assistants, threads, vector stores, and all that, and for the past 48 hours or more a lot of thread runs have ended up with a server_error status and a "Sorry, something went wrong." error message.
Also, from time to time a random API request holds the connection for exactly 10 minutes before giving a response back.
This is all super annoying, and it makes it really hard to keep any level of good product quality for customers in production.
I spent hours yesterday figuring out why my queue tasks are taking so long to complete, just to realise that 600 second API responses are happening on a regular basis.
What is happening with OpenAI server infrastructure right now?
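For anyone chasing the same slow queue tasks: a minimal sketch of how the 600-second responses could be made visible in logs. This is not the poster's code. The decorator name log_slow_calls, the logger name and the 30-second threshold are all made up for illustration, and the wrapped function stands in for whatever API call you make.

```python
import functools
import logging
import time

log = logging.getLogger("api_latency")  # hypothetical logger name


def log_slow_calls(threshold_s=30.0, clock=time.monotonic):
    """Decorator that logs a warning when the wrapped call takes too long.

    threshold_s is an assumed cut-off; clock is injectable so the
    behaviour can be checked without really waiting.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = clock()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs on success and on exceptions alike.
                elapsed = clock() - start
                if elapsed >= threshold_s:
                    log.warning("%s took %.1fs", fn.__name__, elapsed)
        return wrapper
    return decorator
```

Wrapping each outbound API call this way would have pointed at the 10-minute requests directly instead of at the queue.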
A couple of problems I encountered while using the OpenAI API:
API calls tend to have huge response times (not only LLM responses, but any endpoint)
frequent server error statuses on thread runs
thread runs get stuck on cancelling status and eventually expire
This got me implementing a number of mechanisms to minimise the damage, like retries, scheduled retries, split queue tasks for one job, parallel queues, sequential queues, synchronous, asynchronous, you name it…
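As an illustration of the retry part only, here is a minimal sketch of retrying with exponential backoff and jitter. It is not the code described above. The function name retry_with_backoff and all the default values are assumptions, and fn stands in for any API call.

```python
import random
import time


def retry_with_backoff(fn, attempts=4, base_delay=0.5, max_delay=8.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn() up to `attempts` times, sleeping between failures.

    The wait before retry k is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter").
    All defaults here are illustrative assumptions.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

In practice retry_on should probably be narrowed to the error types that are actually transient, so that a bad request is not retried four times.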
The most unstable API I’ve ever used, and I’ve used a lot of them.
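For the runs that sit in cancelling until they expire, one option is to poll with your own deadline instead of waiting on the API. A minimal sketch, assuming a hypothetical fetch_status callable that returns the run's current status string; the set of terminal statuses is my reading of the Assistants run lifecycle, so check it against the docs.

```python
import time

# Assumed terminal run statuses; anything else counts as still pending.
TERMINAL = frozenset({"completed", "failed", "cancelled", "expired", "incomplete"})


class RunTimedOut(Exception):
    """The run did not reach a terminal status before our own deadline."""


def wait_for_run(fetch_status, timeout_s=120.0, poll_s=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until a terminal status or until timeout_s passes.

    fetch_status is a hypothetical zero-argument callable, for example a
    small wrapper around whatever "retrieve run" call your client offers.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise RunTimedOut(f"run still {status!r} after {timeout_s}s")
        sleep(poll_s)
```

On RunTimedOut the job can be failed or re-queued on your side right away. Note that a run in requires_action would also time out here, so handle that status separately if you use tools.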
I have encountered the same issue in our production pipeline, which had worked well for almost 5 months. Now I have to switch to a different implementation that does not depend on the Assistants API at all.
On a similar (and meta) vision question, specifically using gpt-4o-2024-11-20 instead of the general alias gives us success, as does gpt-4-turbo as the vision AI model:
(That the AI wrongly thinks the screenshot of our Assistants error is of itself, and reports that it cannot do exactly what it is doing when it produces the response, is just more amusement.)
I think OpenAI must be deploying some services today, or using large-scale hardware resources to test a model, because whether through API calls or the web UI, the dialogue quality of 4o has decreased significantly today.
But I’m not sure if it’s just a problem with my personal account.
Not sure if this is also related, but I’m getting a ton of stuck requests on o3-mini and 4o-mini. Essentially, when trying to create a message or create a run, I never get a response back from the API.
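For calls that never come back, a hard deadline on the caller's side at least frees the worker. A minimal stdlib-only sketch; call_with_deadline and DeadlineExceeded are hypothetical names, and fn stands in for the "create message" or "create run" call.

```python
import concurrent.futures


class DeadlineExceeded(Exception):
    """The wrapped call did not return within the allowed time."""


def call_with_deadline(fn, deadline_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread, giving up after deadline_s.

    Caveat: the worker thread is abandoned, not killed, so the underlying
    request may still finish (or hang) in the background.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        raise DeadlineExceeded(f"no response after {deadline_s}s") from None
    finally:
        # Do not wait for the worker here, or a hung call blocks us again.
        executor.shutdown(wait=False)
```

If you use the official Python SDK, setting its client-level timeout option is likely the cleaner fix, since that closes the connection instead of abandoning a thread. The wrapper above is only a fallback for clients without such an option.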
Looks like you’re right!
We have most of our assistants on the gpt-4o alias and some of them on the gpt-4o-2024-08-11 model, and the latter didn’t produce failed runs at all.
Thank you for this @_j !
Switching all to 4o-2024-11-20
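A minimal sketch of what "switching all" could look like as a dry-run plan, assuming you already have a mapping of assistant id to current model. The ids, the PINNED_SNAPSHOTS table and both function names are made up for illustration; only the snapshot name comes from this thread.

```python
# Hypothetical alias-to-snapshot table; extend it as needed.
PINNED_SNAPSHOTS = {"gpt-4o": "gpt-4o-2024-11-20"}


def pin_model(model):
    """Return the pinned snapshot for a floating alias, else the name unchanged."""
    return PINNED_SNAPSHOTS.get(model, model)


def plan_model_updates(assistants):
    """Given {assistant_id: current_model}, return only the entries to change."""
    return {
        assistant_id: pin_model(model)
        for assistant_id, model in assistants.items()
        if pin_model(model) != model
    }
```

Each entry in the returned plan would then be applied with whatever assistant-update call your client provides.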