I am making calls to the API from Python via the openai library.
I have included a timer in my request loop so that I never exceed the rate limits.
After a few hours of running the code without issues, I got the following error message:
openai.error.APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
which comes from line 528 of api_requestor.py.
My question is: why is this happening? Is it due to a temporary drop in my internet connection? Is there anything within my control to prevent it, or is this entirely down to instability or overload on the API side? It is very frustrating when it happens, because I lose all my output data: it had not yet been saved to disk when the error was raised.
Based on the GPT-4 scaling I imagine it’ll be a bit of a bumpy ride until it’s sorted.
Buckle up folks, technology is changing fast and these interruptions are expected.
I was getting bad gateway issues during yesterday’s outage.
Today it’s mostly the Remote end closed connection without response error above, (104, 'Connection reset by peer'), or a timeout error like here.
Are all of these errors due to some systemic issue on the API side?
I really encourage you to include extra layers such as retries with backoff, fallback strategies and so on (if you’re not doing it yet). This technology is new, and OpenAI’s engineering team is doing an AMAZING job in scaling their servers up to cover all the increasing demand that they’re getting. But this is still to be expected: things can fail, and we (developers) are the ones responsible for coming up with sound strategies for when they do.
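To make that concrete, here’s a stripped-down sketch of what I mean by a backoff-plus-fallback layer (the model names, retry counts and delays are placeholders I picked for illustration, not anything official):

import time
import openai

def complete_with_fallback(prompt, models=("gpt-4", "gpt-3.5-turbo"), tries_per_model=3):
    # Try each model in order; back off between attempts, then fall back to the next one.
    for model in models:
        delay = 2
        for attempt in range(tries_per_model):
            try:
                return openai.ChatCompletion.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                )
            except (openai.error.APIConnectionError,
                    openai.error.RateLimitError,
                    openai.error.Timeout,
                    openai.error.ServiceUnavailableError) as e:
                print(f"{model} attempt {attempt + 1} failed: {e}")
                time.sleep(delay)
                delay *= 2  # exponential backoff
    raise RuntimeError("All models and retries exhausted")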
Still happening a bit on 3/22, especially during US work hours. For sure build some safeguards into your code while the poor IT and DevOps folks at OpenAI experience the biggest and fastest scaling challenge anyone has ever faced.
I’m using @backoff.on_exception(backoff.expo, openai.error.RateLimitError) from the backoff library, but today I still see APIConnectionError and timeouts. Can you suggest how to account for these errors so that my loop of requests does not break?
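In case it clarifies what I’m after, this is roughly what I’m thinking of trying: listing the connection and timeout exceptions in the same decorator (the extra exception classes and the max_tries/max_time values are my own guesses, not an official recommendation):

import backoff
import openai

@backoff.on_exception(
    backoff.expo,
    (
        openai.error.RateLimitError,
        openai.error.APIConnectionError,
        openai.error.Timeout,
        openai.error.ServiceUnavailableError,
    ),
    max_tries=8,
    max_time=300,
)
def completion_with_backoff(**kwargs):
    # Thin wrapper so every call in the loop gets the same retry policy.
    return openai.Completion.create(**kwargs)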
I’m based in the EU and have run into these aborted connections a lot over the last week. It’s definitely correlated with US working hours…
I’d happily pay more (2-4x the token rate) for a more reliable endpoint at this stage, but I don’t see that option. I hope the team can figure this out soon, and in the meantime I’ll be implementing various retry mechanisms like @AgusPG recommended above.
Following up on this topic, in case it helps: there’s an API that lets you do a quick health check of every OpenAI model, so you can make your request strategy depend on it. It’s also pretty easy to implement a health-check service like this yourself, by doing dumb API calls from time to time. But in case you want to try it out, folks, you can check it here.
Yep, that’s pretty much it. Say that you have a client timeout of 30s per model. Models X and Y are down. It takes you 1 minute to get to model Z and get a completion out of it. This is a killer for conversational interfaces, where the user will just run away if they don’t have their answer quickly.
Pinging the models in advance and keeping a logbook of the health of each model prevents you from continuously trying to get completions from models that are having an outage. So you go straight for model Z (and only retry on it) if you suspect that models X and Y are having an outage.
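For what it’s worth, a minimal sketch of that logbook idea could look something like this (the model names are placeholders, and a real service would refresh the logbook on a schedule rather than on demand):

import openai

# Last known health per model, refreshed by cheap "ping" completions.
health = {"gpt-4": True, "gpt-3.5-turbo": True}

def ping(model):
    try:
        openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1,
        )
        return True
    except Exception:
        return False

def refresh_health():
    for model in health:
        health[model] = ping(model)

def pick_model():
    # Go straight for the first model that looked healthy on the last check.
    for model, ok in health.items():
        if ok:
            return model
    raise RuntimeError("No healthy model in the logbook")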
Just adding a “solution” I’ve found. I tried to catch different specific errors, but there are so many different errors the platform can throw (such as timeout, remote disconnection and bad gateway, just to mention a few) that it’s best to just catch Exception broadly for now (although not ideal). I’ve found this to work quite well for me:
import time
import openai

for sample in samples:
    inference_not_done = True          # reset the flag for every sample
    while inference_not_done:
        try:
            response = openai.Completion.create(...)   # same call as before
            inference_not_done = False
        except Exception as e:
            print("Waiting 10 minutes")
            print(f"Error was: {e}")
            time.sleep(600)
I do not agree with catching generic exceptions; it’s a bad practice. Also, you do not want to handle all your exceptions in the same way. There are some errors where it’s worth retrying, some others where it’s worth falling back, and some others that you should never retry.
You can customize your app to handle exceptions based on status codes instead (such as “retry with this specific payload for all 5xx errors”). For instance, in Python, aiohttp_retry does a pretty decent job here. It’s the one that I’m currently using.
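As an illustration (not my exact production code), a status-code-based retry with aiohttp_retry could look roughly like this; the endpoint, model and retry parameters are just placeholders:

import asyncio
import os

from aiohttp_retry import ExponentialRetry, RetryClient

async def complete(prompt):
    # Retry up to 5 times with exponential backoff, but only on 5xx responses.
    retry_options = ExponentialRetry(attempts=5, statuses={500, 502, 503, 504})
    async with RetryClient(retry_options=retry_options) as client:
        async with client.post(
            "https://api.openai.com/v1/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": "text-davinci-003", "prompt": prompt, "max_tokens": 64},
        ) as resp:
            resp.raise_for_status()
            return await resp.json()

result = asyncio.run(complete("Say this is a test"))
print(result["choices"][0]["text"])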