jazzg
15
Thanks, so I tried this:
for i in rlist:
    try:
        ...  # mycode
    except TimeoutError:
        print("error")
        continue
But the loop still breaks. Is TimeoutError correct here?
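One likely reason, offered as a sketch rather than a diagnosis: `except TimeoutError` only catches Python's built-in TimeoutError and its subclasses. The error reported elsewhere in this thread is openai.error.APIConnectionError, which is a different class, so it would pass straight through that handler and stop the loop. In the example below, APIConnectionError is a local stand-in class and flaky_call is a hypothetical placeholder for the real request; neither comes from the original post.

```python
class APIConnectionError(Exception):
    """Local stand-in for openai.error.APIConnectionError (assumption)."""


def flaky_call(i):
    # Hypothetical placeholder for the real API request; fails for item 1.
    if i == 1:
        raise APIConnectionError("Connection aborted.")
    return f"ok-{i}"


def run(rlist):
    results = []
    for i in rlist:
        try:
            results.append(flaky_call(i))
        except TimeoutError:
            # Only reached for the built-in TimeoutError and its subclasses.
            results.append("timeout")
        except APIConnectionError as e:
            # Without this clause the exception would end the loop.
            results.append(f"skipped: {e}")
    return results
```

With the real library you would name the library's own exception class in the second except clause instead of the stand-in.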
moop
16
I’m based in the EU and have run into these aborted connections a lot over the last week. It’s definitely correlated with US working hours…
I’d happily pay more (2-4x the token rate) for a more reliable endpoint at this stage, but I don’t see that option. I hope the team can figure this out soon; in the meantime, I’ll be implementing various retry mechanisms like @AgusPG recommended above.
1 Like
AgusPG
17
Following up on this topic, in case it helps. There’s an API that lets you do a quick health check of every OpenAI model, so you can make your request strategy depend on it. It’s still pretty easy to implement a health check service like this one yourself, making dumb API calls from time to time. But in case you wanna try it out, folks, you can check it here.
1 Like
@AgusPG
I like the Try Model X → Try Model Y → Try Model Z → Retry Later approach.
Is there a benefit to Ping Model X, Y, Z → Try Model Y if X is down, Model Z if Y is down, etc.?
My only guess is that you could achieve lower overall latency if you know ahead of time which models are down. Is that the only benefit?
AgusPG
19
Yep, that’s pretty much it. Say that you have a client timeout of 30s per model, and models X and Y are down. It takes you 1 minute (two 30s timeouts) just to get to model Z and get a completion out of it. This is a killer for conversational interfaces, where the user will just run away if they don’t have their answer quickly.
Pinging the models in advance and having a logbook of the health of each model prevents you from continuously trying to get completions of models that are having an outage. So you go straight for model Z (and only retry on it) if you suspect that models X and Y are having an outage.
This improves the UX, in my view.
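A minimal sketch of the logbook idea, under stated assumptions: HealthLog, ModelUnavailable and complete_with_fallback are hypothetical names, the 60-second cooldown is an arbitrary choice, and `call` stands for whatever function actually sends the request. Failed requests double as the health signal here; a separate periodic ping could call mark_down and mark_up in the same way.

```python
import time


class ModelUnavailable(Exception):
    """Hypothetical error raised by `call` when a model times out or is down."""


class HealthLog:
    """Logbook of when each model last failed (hypothetical helper)."""

    def __init__(self, cooldown=60.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        self.failed_at = {}

    def mark_down(self, model):
        self.failed_at[model] = self.clock()

    def mark_up(self, model):
        self.failed_at.pop(model, None)

    def is_suspect(self, model):
        # A model is suspect while its last failure is within the cooldown.
        failed = self.failed_at.get(model)
        return failed is not None and self.clock() - failed < self.cooldown


def complete_with_fallback(models, call, log):
    # Skip models suspected of an outage; if all are suspect, try them all.
    candidates = [m for m in models if not log.is_suspect(m)] or list(models)
    last_error = None
    for model in candidates:
        try:
            result = call(model)
        except ModelUnavailable as err:
            log.mark_down(model)
            last_error = err
        else:
            log.mark_up(model)
            return result
    raise last_error
```

The first request after an outage still pays for the timeouts on X and Y, but later requests inside the cooldown window go straight to Z.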
2 Likes
Just adding a “solution” I’ve found. I tried to catch different specific errors, but I found that there are so many different errors the platform can throw (such as timeout, remote disconnection, bad gateway, just to mention a few) that it’s best to use a catch-all except statement for now (although not ideal). I’ve found this to work quite well for me:
import time
import openai

for sample in samples:
    # Reset the flag for every sample; otherwise only the first sample is processed.
    inference_not_done = True
    while inference_not_done:
        try:
            response = openai.Completion.create(...)
            inference_not_done = False
        except Exception as e:
            print("Waiting 10 minutes")
            print(f"Error was: {e}")
            time.sleep(600)
1 Like
AgusPG
21
I do not agree with catching generic exceptions. It’s a bad practice. Also: you do not want to handle all your exceptions in the same way. There are some errors where it’s worth retrying, some others where it’s worth falling back, and some others that you should never retry.
You can instead customize your app to handle exceptions based on status codes (such as “retry with this specific payload for all 5xx errors”). For instance, in Python, aiohttp_retry does a pretty decent job here. It’s the one that I’m currently using.
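To make the status-code idea concrete, here is a rough standard-library sketch. It does not use aiohttp_retry; HTTPStatusError, is_retryable and send_with_retry are hypothetical names, and `send` stands for whatever function performs the request and raises HTTPStatusError on a failed response. Only 5xx responses are retried, and anything else is re-raised immediately.

```python
import time


class HTTPStatusError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed request."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    # Server-side errors (5xx) are worth retrying; client errors are not.
    return 500 <= status <= 599


def send_with_retry(send, attempts=3, delay_s=1.0, sleep=time.sleep):
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except HTTPStatusError as err:
            # Re-raise non-retryable errors and the final failed attempt.
            if not is_retryable(err.status) or attempt == attempts:
                raise
            sleep(delay_s)
```

A fallback branch (for example, switching models on a specific status) could be added next to the retry branch in the same except clause.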
Hope it helps!
2 Likes
zaisbd
22
I have been using the ‘train_3_motive_en.csv’ model to generate some texts since yesterday. Yesterday it ran well. However, I have been getting this error for the last 7 hours:
openai.error.APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Any idea what is going on?
Best Regards,
Zahurul
I’m using the chat completion endpoint, currently with gpt-3.5-turbo. Until yesterday, I had been running dozens of requests every hour for a few weeks without any issues. Yesterday, about half of them started failing with the same APIConnectionError that’s been reported here. Today, around 80% of the requests are failing with that error.
Shouldn’t the status.openai.com page reflect that issue on the API?
1 Like
Hi @AgusPG
I agree with you, catching generic exceptions is a bad engineering practice and it can have pretty bad consequences in software development. My solution is more suitable for an NLP researcher looking for a quick fix who just wants the output results to analyse offline. Being a researcher with a previous developer background, I see what you mean, but I’ve also seen a lot “worse” in research code than just catching a generic exception in order to get things to work in the short term. Definitely not advisable for a scalable application, and if you have a live app with real users then my solution is not for you. And thanks for pointing out aiohttp_retry, I’ll look into it.
2 Likes
I’m also still getting tons of connection errors, timeouts and 502s, even after yesterday’s fix. Backoff helps, but my requests often retry 3+ times before my serverless functions time out…
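One way to keep retries from outliving a serverless timeout is to give the whole retry loop a time budget. The sketch below is only an illustration: TransientError and retry_within_budget are hypothetical names, the delay values are arbitrary, and `send` stands for the actual request function. It uses exponential backoff with jitter and gives up as soon as the next wait would cross the deadline, so the function can still report the failure itself.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for connection errors, timeouts and 502s."""


def retry_within_budget(send, budget_s, base_delay=0.5, max_delay=8.0,
                        clock=time.monotonic, sleep=time.sleep,
                        rng=random.random):
    deadline = clock() + budget_s
    attempt = 0
    while True:
        try:
            return send()
        except TransientError:
            # Jittered wait, up to the capped exponential step for this attempt.
            delay = min(max_delay, base_delay * 2 ** attempt) * rng()
            if clock() + delay >= deadline:
                raise  # not enough budget left for another wait
            sleep(delay)
            attempt += 1
```

The budget only covers the waits between attempts; the duration of the request itself still needs its own client timeout.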
AgusPG
26
Oh yeah, absolutely. If you can work offline and do not need real-time, I agree that you can be more flexible as regards the software development part of your app.
moop
27
One observation I’m curious if any of you have witnessed…
Context: I’m using the text completion API (not chat) and my application is built to iterate through various texts, calling the API each time.
Observation: When I run this app/code it will work for the first 4-9 API calls, executing each in < 1 s, and then subsequent API calls will either be extremely slow (>90s) or fail with the exception in this thread.
Has anyone else seen this behavior? It seems like there is some unofficial throttling going on.
1 Like
This could be true, now you mention it.
We run a chained query of 10 prompts. We often get an issue near the end of the chain. We catch it and retry, and it continues after a short delay.
Hadn’t considered it up to now.
2 Likes
I am receiving this issue as well. Worse still, despite the API closing my connection without response I have been charged regardless!!
I’m using GPT-4 and getting close to the maximum token limit, which I believe has something to do with it. When I just run the test code that the API documentation suggests it runs flawlessly. Very strange!
I’m now getting this error
The server had an error while processing your request. Sorry about that!
Looks like on different days you get different errors
1 Like
I’m also running into a similar issue. It seems that the connection remains open for some time in between the requests instead of being closed. (This is just a guess; I have yet to audit the system to confirm.) It’s very reproducible with my project, and it seems to happen when I let the program idle for a few minutes. Initializing the first request hasn’t been an issue. Continuing the conversation after a pause has thrown me an error.
I could have sworn there was some verbiage I read somewhere about terminating the connection manually, though I can’t find it. Is anyone familiar with what I’m talking about?
To add to this, if I change the parameters of the request to completely remove any past context from the chat history, then I don’t get the error anymore. My requests range anywhere from 50 to 4000 tokens.
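If the idle-connection guess above is right, one workaround is to throw away a connection that has been unused for too long and open a fresh one before the next request. This is a sketch under that unconfirmed assumption: IdleAwareConnection is a hypothetical helper, the 60-second threshold is arbitrary, and `factory` is whatever creates your connection or session object.

```python
import time


class IdleAwareConnection:
    """Rebuilds a connection that has been idle too long (hypothetical helper)."""

    def __init__(self, factory, max_idle_s=60.0, clock=time.monotonic):
        self._factory = factory
        self._max_idle_s = max_idle_s
        self._clock = clock
        self._conn = None
        self._last_used = None

    def get(self):
        now = self._clock()
        if self._conn is not None and now - self._last_used > self._max_idle_s:
            # Close the stale connection if it exposes close(), then drop it.
            close = getattr(self._conn, "close", None)
            if close is not None:
                close()
            self._conn = None
        if self._conn is None:
            self._conn = self._factory()
        self._last_used = now
        return self._conn
```

With the requests library, for example, `factory` could be `requests.Session`; whether the API client you use lets you supply your own session is something to check in its documentation.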
Receiving the same error but on the Moderation endpoint. Super intermittent. Just leaving this here so no one else goes crazy trying to find out what’s wrong with their code.
Error communicating with OpenAI: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
I encounter the same issue when my input is too long. I think it may be caused by a network problem. Maybe we should switch VPNs, if one is in use.
I started getting this, this morning. I added a try/except block inside of a retry loop, but it doesn’t seem to help.