Has anyone else noticed significantly more API errors depending on the time of day? (GPT-4)

I tried App Engine for one of my use cases, but unfortunately I needed streaming via server-sent events (SSE). None of the App Engine variants allow server-sent events, so it was a no-go for me.

That sounds about right. Google products: easy to set up, hair-pulling, teeth-gnashing difficult or impossible to customize.

My outdated urllib3 package was 100% the issue, it turns out. Let me summarize for anyone else who stumbles across this.

Basic Details

I’m using the GPT-4 API with Python/Flask, deployed through Google App Engine.

Initial Issue
I was getting significantly more 500 server errors from one day to the next and thought it had something to do with increased OpenAI server traffic at certain times of day. @Foxabilo confirmed that while the API is somewhat slower during certain times, it's faster overall and shouldn't be the source of the problem.


  • I confirmed that my app did not error while testing locally through Flask’s default server
  • I introduced retry error handling. This worked to catch the rare 502 errors, but not the 500 server errors
  • I confirmed that my completions were not crossing the 600-second default timeout set by OpenAI
  • I confirmed that it wasn’t GAE timing out, as the error would have been a DeadlineExceededError
  • I started digging into urllib3, which led me to the fix
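The retry handling mentioned above looks roughly like this (a minimal sketch; `ApiError`, the status codes, and the backoff schedule are illustrative stand-ins, not the openai library's actual exception types):

```python
import time

class ApiError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

RETRYABLE = {500, 502}  # server errors worth retrying

def with_retries(call_api, max_attempts=3, base_delay=1.0):
    """Retry call_api on retryable server errors, with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call_api()
        except ApiError as err:
            if err.status not in RETRYABLE or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

As noted, this caught the occasional 502s fine; the 500s kept coming back until the urllib3 update below.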

The Fix
The package urllib3, which auto-installs with openai, was at version 1.26.16. The current version is 2.0.3. After updating to the current version, the 500 server errors have disappeared.
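If you want to check which version you actually have before upgrading, something like this works (a minimal sketch; "2.0.3" was simply current at the time of writing):

```python
# Check the installed urllib3 version; the 1.26.x line was the culprit here.
from importlib.metadata import version, PackageNotFoundError

try:
    installed = version("urllib3")
except PackageNotFoundError:
    installed = None

if installed and int(installed.split(".")[0]) < 2:
    print(f"urllib3 {installed} is pre-2.0; try: pip install --upgrade urllib3")
else:
    print(f"urllib3 {installed} looks current")
```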

I’m still not sure why this issue popped up from one day to the next when I had been using the old urllib3 version for quite some time, but hey, a fix is a fix.

Thanks for your help @Foxabilo!


Hi, is your problem completely solved? I am still getting 502 Bad Gateway errors, especially with very long prompts and completions, even after updating urllib3.

The 502s never went away completely; only the 500s did after updating urllib3. For the 502s, I broke up some of my larger prompts into multiple sections, then added error handling to retry a section if it hits a 502, so the whole job doesn't have to start from the beginning.
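The section-by-section approach sketches out something like this (`SectionError` and `complete_section` are hypothetical stand-ins for one API call and its failure mode, and splitting on blank lines is just one way to define a "section"):

```python
class SectionError(Exception):
    """Hypothetical stand-in for a 502-style failure on one section."""

def run_in_sections(prompt, complete_section, max_attempts=3):
    """Process each section independently so a failure only retries that section."""
    results = []
    for section in prompt.split("\n\n"):  # split on blank lines (an assumption)
        for attempt in range(max_attempts):
            try:
                results.append(complete_section(section))
                break  # section succeeded; move on to the next one
            except SectionError:
                if attempt == max_attempts - 1:
                    raise
    return results
```

Completed sections are kept, so a 502 halfway through only costs one section's worth of retried tokens instead of the whole prompt.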

I’m still not sure why the 502s seemed to get more common a few days ago, but this is what worked for me in the meantime. Hope that helps!


Thanks! Appreciate your reply.

For my use case, I don’t think I can split up the prompts, but retrying a section after a failure is an interesting idea. It may consume a few times more prompt tokens, but that's definitely better than simply retrying the whole request. Guess I have to implement streaming as well.

Are the Azure endpoints any better, if anyone knows?

I’d recommend checking whether you are getting billed for those unsuccessful requests. I’ve been struggling a lot with the API returning 500, 502, 520, 524, or 429 errors, and at least in the past I was billed for requests that timed out on the OpenAI side (either on their servers or between their Cloudflare proxies and their servers).

In my experience, request size had a large impact on whether the API worked or not. Most people around here issue small requests and won’t run into trouble. In my case, GPT-4 requests above 3000 tokens started becoming problematic, and anything above 3400 basically never returned. Retrying gets expensive real quick if you’re getting billed for those failures.
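A crude pre-flight size check, motivated by those thresholds, might look like this (the 4-characters-per-token ratio is a rough heuristic for English text, not a real tokenizer; a library like tiktoken gives exact counts, and the 3000-token limit is just the figure from my experience):

```python
def estimated_tokens(text):
    """Very rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def within_size_limit(prompt, limit=3000):
    """Return (ok, estimate) so callers can refuse likely-to-fail requests."""
    estimate = estimated_tokens(prompt)
    return estimate <= limit, estimate
```

Refusing (or splitting) an oversized request up front at least avoids paying for a call that was probably going to time out anyway.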
