This is a week's worth of response times for 256 tokens, so to answer your original question (sort of): there is time-of-day response variation due to load.
Yes that would be the timeframe, only issue is it’s still ongoing.
To answer your earlier question, I'm hosting through GCloud, using their Redis Memorystore feature to communicate. This issue is a little confusing because everything works fine when I finish up work for the day; then the next morning there are intermittent or constant issues for a while, even though I've changed nothing to do with server communication.
I've implemented retry error handling for the 502 errors and it seems to be working; it's just this 500 error that still happens on roughly 1 in 10 completions, and I can't get it to catch the exception yet.
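For reference, a minimal sketch of the kind of retry wrapper described above. The `send_request` callable and the exception tuple are placeholders: in the real app you would pass something that performs the API call, and hand in `requests.exceptions.ReadTimeout` (and/or `urllib3.exceptions.ReadTimeoutError`) as `retry_exceptions`.

```python
import time

def complete_with_retry(send_request, retry_exceptions=(TimeoutError,),
                        max_attempts=3, backoff=2.0):
    """Retry a completion call on 5xx responses and on timeout exceptions.

    `send_request` is a hypothetical zero-argument callable that performs
    the API call and returns a response object with a `status_code`.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            resp = send_request()
            if resp.status_code < 500:
                return resp  # success (or a client error we should not retry)
            print(f"attempt {attempt}: HTTP {resp.status_code}, retrying")
        except retry_exceptions as exc:
            print(f"attempt {attempt}: {exc!r}, retrying")
        if attempt < max_attempts:
            time.sleep(backoff * attempt)  # simple linear backoff
    raise RuntimeError(f"still failing after {max_attempts} attempts")
```

The point of catching the timeout exception explicitly is that a read timeout never produces a status code at all, so checking `status_code` alone will miss it.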
Edit: @_j If I’m getting a 502 or 504, do you think it’s OpenAI server side 100% of the time? I’ve read that there still may be a problem with the code itself even though the error shows differently.
I’m thinking that there might be some maximum no reply timeout on the GCloud side, that would tend to make what is effectively a small variation in performance seem like a binary change.
Not a GC aficionado, so I’m not sure if that timeout is a configurable value.
If we look at the past 24 hours, you can see an increase, but for the month as a whole it's actually quicker by about 10%.
It'll be an OpenAI issue if there is only one CF hop to OpenAI; if CF is using other services to route traffic, then it could be on their end.
That’s an interesting thought, I’ll do some digging today and see what I can find in App Engine
Ok, so I got into GAE’s error reporting. 502 errors are very rare, these are 99% of them:
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600)
Here is a stack trace of one of them:
Traceback (most recent call last):
File "/layers/google.python.pip/pip/lib/python3.8/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/layers/google.python.pip/pip/lib/python3.8/site-packages/urllib3/connectionpool.py", line 798, in urlopen
retries = retries.increment(
File "/layers/google.python.pip/pip/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/layers/google.python.pip/pip/lib/python3.8/site-packages/urllib3/packages/six.py", line 770, in reraise
raise value
File "/layers/google.python.pip/pip/lib/python3.8/site-packages/urllib3/connectionpool.py", line 714, in urlopen
httplib_response = self._make_request(
File "/layers/google.python.pip/pip/lib/python3.8/site-packages/urllib3/connectionpool.py", line 468, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/layers/google.python.pip/pip/lib/python3.8/site-packages/urllib3/connectionpool.py", line 357, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600)
After some further reading, it seems I should be getting something close to this error if it is GAE timing out, which I haven’t seen:
< class 'google.appengine.runtime.DeadlineExceededError' >:
Now here’s the odd thing; even though the error says it’s coming from OpenAI’s side, when I remove GAE from the equation and run my app locally, the errors disappear. It’s also odd to me that my error handling is built specifically to catch and retry 502’s and ReadTimeoutErrors, but it only works for the 502’s.
Also, as far as I know, there should only be one CF hop to OpenAI.
This may be a longshot, but could it have to do with the urllib3 package itself? It's very, very outdated (mine is v1.26.16; the current version is 2.0.4), but I'm unable to make it current because it then conflicts with a google-oauth package I need to use.
What happens if you put v1.26.16 in your local environment? Does that start producing the ReadTimeouts?
It doesn't; I've been using the same version locally and on GAE.
Edit to add: it's the same urllib3 version that auto-installs when I install the openai package.
Thanks for the link, I’ll keep that as a backup in case all else fails.
I tried adding a 5-second delay between completion requests, as I read about in this thread, but that didn't work either.
This seems to be a problem with GPT-4. I rolled my app back to 3.5-turbo and it works perfectly (in about 25% of the time GPT-4 takes, too). Frustrating to say the least.
Can you force a change of location for your App Engine? Just wonder if there is some issue with that particular node.
Do you mean the regional location or something else? I’m definitely not a GC aficionado either
Oh, and I forgot to add: I started logging the completion times, and the ones that error are nowhere near OpenAI's default 600-second timeout. The longest I saw was 190 seconds, so technically this error shouldn't be triggering in the first place.
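The completion-time logging mentioned above can be done with a small decorator; this is just one way to do it, and the function names are illustrative. Using a `finally` block means the duration is logged even when the call raises, which is exactly the case you want to inspect here.

```python
import functools
import time

def log_duration(fn):
    """Log how long each completion call takes, so failing calls can be
    compared against the 600 s client-side read timeout."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.monotonic() - start
            print(f"{fn.__name__} took {elapsed:.1f}s")
    return wrapper
```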
Yeah, like with Azure, I can pick a location for my apps (EU, US, etc.); I wondered if something similar existed for GC.
Unfortunately no, whatever location I pick during the GAE setup for that project is permanent.
Edit: I updated urllib3 from v1.26.16 to v2.0.3 and the issue has gotten much better (from roughly 1 in 10 completions erroring to about 1 in 30). This is still strange given that the outdated version auto-installs with the openai package, but I'll take it.
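For anyone wanting to do the same: one way to make the newer version stick on App Engine is to pin it explicitly in requirements.txt, so it wins over whatever is pulled in transitively. This is a sketch and assumes no hard version conflict with your other dependencies (run `pip check` afterwards to be sure).

```text
# requirements.txt -- pin urllib3 explicitly so the newer version
# is installed instead of the one pulled in transitively by openai
openai
urllib3==2.0.3
```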
I discovered something else also. When I'm testing locally, it's running on Flask's built-in development server. But when the app is deployed, Google App Engine uses the Gunicorn WSGI server.
I'm thinking this has something to do with it, because (a) it's one of the only differences between testing locally and using GAE, and (b) every time the error occurs, I see 8-10 of these in my debug log:
[2023-09-05 12:12:06 +0000] [24] [INFO] Worker exiting (pid: 24)
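One hedged guess about those "Worker exiting" lines: Gunicorn kills and restarts any worker that exceeds its worker timeout, and its default is only 30 seconds. On App Engine you can override the server command via the `entrypoint` field in app.yaml; the sketch below assumes a Flask app exposed as `app` in `main.py` and is illustrative, not a verified fix.

```yaml
# app.yaml -- sketch; assumes the Flask app object is `app` in main.py
runtime: python38
entrypoint: gunicorn -b :$PORT --workers 2 --timeout 600 main:app
```

A raised `--timeout` lets long-running completion requests finish instead of the worker being recycled mid-request.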
Super interesting, thank you for sharing your findings. If you ever crack it completely, I'd love to hear about it.
I tried App Engine for one of my use cases and unfortunately I needed streaming and SSEs. None of the App Engine variants allow server-sent events, so it was a no-go for me.
That sounds about right. Google products: easy to set up, hair-pulling, teeth-gnashing difficult or impossible to customize.
My outdated urllib3 package turned out to be 100% the issue. Let me summarize for anyone else who stumbles across this.
Basic Details
I’m using the GPT-4 API with Python/Flask, deployed through Google App Engine.
Initial Issue
I was having significantly more 500 server errors from one day to the next and thought it had something to do with an increase in OpenAI server traffic depending on the time of day. @Foxabilo confirmed that while it is somewhat slower during certain times, it’s faster overall and shouldn’t be the source of the problem.
Investigation
- I confirmed that my app did not error while testing locally through Flask’s default server
- I introduced retry error handling. This caught the rare 502 errors, but not the 500 server errors
- I confirmed that my completions were not crossing the 600 second default timeout time set by OpenAI
- I confirmed that it wasn't GAE timing out, as the error would have been a DeadlineExceededError
- I started digging into urllib3, which led me to the fix
The Fix
The package urllib3, which auto-installs with openai, was version 1.26.16. The current version is 2.0.3. After updating to the current version, the 500 server errors have disappeared.
I’m still not sure why this issue popped up from one day to the next when I had been using the old urllib3 version for quite some time, but hey, a fix is a fix.
Thanks for your help @Foxabilo!
Hi, is your problem completely solved? I am still getting the 502 Bad Gateway errors, especially with very long prompts and completions, even after updating urllib3.
The 502's never went away completely; only the 500's did after updating urllib3. For the 502's, I broke some of my larger prompts into multiple sections, then added error handling to retry a section if it 502's, so it doesn't have to start from the beginning.
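The per-section retry described above can be sketched like this. `complete` is a hypothetical callable that takes one prompt chunk and returns its completion; catching bare `Exception` is for brevity, and in practice you would narrow it to the 502/timeout errors you actually see.

```python
def run_in_sections(sections, complete, max_attempts=3):
    """Run one completion per prompt section; on failure, retry only
    that section instead of restarting the whole prompt.

    `sections` is a list of prompt chunks; `complete` is a hypothetical
    callable mapping one chunk to its completion text.
    """
    results = []
    for section in sections:
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(complete(section))
                break  # this section succeeded; move to the next one
            except Exception:  # narrow this in real code
                if attempt == max_attempts:
                    raise
    return results
```

The token cost of a failure is bounded by one section rather than the full prompt, which is the whole appeal of the approach.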
I'm still not sure why 502's seemed to get more common a few days ago, but this is what worked for me in the meantime. Hope that helps!
Thanks! Appreciate your reply.
For my use case, I don't think I can split up the prompts. But it's an interesting idea to retry a section after a failure. It may consume a few times more prompt tokens, but that's definitely better than simply retrying the whole thing. Guess I have to implement streaming as well.
Are the Azure endpoints any better if anyone knows?
I’d recommend checking if you are getting billed for those unsuccessful requests. I’ve been struggling a lot with the API returning 500, 502, 520, 524 or 429 errors, and at least in the past I was billed for requests that timed out on the OpenAI side (either on their servers or between their cloudflare proxies and their servers).
In my experience, request size had a large impact on whether the API worked or not. Most people around here issue small requests and won’t run into trouble. In my case, GPT-4 with requests above 3000 tokens was beginning to be problematic, anything above 3400 basically never returned. And retrying gets expensive real quick if you’re getting billed for those.