Just got access to the GPT-4 API; we'd been using gpt-3.5-turbo until now. Today we switched and it's nothing but 502 Bad Gateway errors. We managed to get maybe one good response in a whole afternoon. We switched back to 3.5-turbo and everything was fine again.
Browsing through the forums, seems this is not an isolated incident.
So given that this is pretty prevalent, what can we do to get around this issue?
Right now the timeout is set to 300 seconds from what I can tell, and we get the error around that time. Can we change the timeout for the call? Is that possible?
And yes, we are handling it within a try/except block, waiting and retrying, but nothing seems to work.
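Roughly what we have in place right now, as a sketch (this assumes the pre-1.0 `openai` Python package; `chat_with_retries` and the 90-second `request_timeout` are just illustrative choices):

```python
import time
import openai

# Illustrative retry wrapper: pass a per-call timeout so the client raises
# openai.error.Timeout instead of hanging, and back off exponentially on the
# transient error classes (502s generally surface as openai.error.APIError).
def chat_with_retries(messages, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(
                model="gpt-4",
                messages=messages,
                request_timeout=90,  # seconds; fail fast instead of waiting minutes
                **kwargs,
            )
        except (openai.error.APIError,
                openai.error.Timeout,
                openai.error.APIConnectionError,
                openai.error.ServiceUnavailableError) as err:
            wait = 2 ** attempt
            print(f"attempt {attempt + 1} failed ({err}); retrying in {wait}s")
            time.sleep(wait)
    raise RuntimeError("gpt-4 request failed after all retries")
```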
502 is a gateway error, i.e. a server acting as a relay or a proxy failed in some way. I'm assuming the exact issue is some variant of a timeout; this is not caused by OpenAI, it's a failure somewhere between the message leaving OpenAI and it landing in your network socket.
What I would do is perform some basic sanity checks. Try a super short prompt, e.g. "test", and wait for the reply; if that comes back reliably, then try increasing the prompt length. I don't know what your network environment looks like, so it's hard to tell exactly what's going on. Do you have command line access? Could you just run a Python shell and try it from there?
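Something like this from a plain Python shell would tell you whether latency scales with prompt length (a sketch, assuming the pre-1.0 `openai` package with your API key already configured; the word counts and the 120-second `request_timeout` are arbitrary):

```python
import time
import openai

# Time a few requests of increasing size to see where things start to stall.
for n_words in (1, 50, 500, 2000):
    prompt = " ".join(["test"] * n_words)
    start = time.time()
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        request_timeout=120,  # seconds before the client gives up
    )
    print(f"{n_words:>5} words -> {time.time() - start:5.1f}s, "
          f"{resp['usage']['completion_tokens']} completion tokens")
```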
We use Google Colab to run the Python scripts, meaning the scripts run on Google's servers and seem to work for everything else (including other models like gpt-3.5-turbo) except gpt-4.
I saw a similar message in a previous thread stating the same, that this is not an issue with OpenAI, but if the API is struggling to return an answer, wouldn't we get a timeout or something similar, and hence the error?
I mean, if this works fine with a shorter prompt, what would that mean?
Well, it would mean that some timeout is occurring in Google Colab; lots of open socket connections have 60-second and 300-second timeouts.
One possible solution is to use streaming, but I have not set that up in Google Colab before, and I know Google App Engine will not handle server-sent events, so that might be an issue.
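A bare-bones streaming call looks something like this (a sketch, again assuming the pre-1.0 `openai` package; the prompt is just a placeholder):

```python
import openai

# With stream=True the API returns chunks as tokens are generated, so the
# connection is never idle for minutes waiting on one large response.
stream = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "test"}],
    stream=True,
)

text = ""
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    text += delta.get("content", "")
print(text)
```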
OK, so it seems like the Colab environment does not like keeping a connection active for over 300 seconds. That is a limitation you will have to work within, unless you can get streaming to work.
There may be a way to increase the Colab timeout values; perhaps someone else who has been through this knows.
I am facing the same issue with gpt-3.5-turbo-16k, and I am using a Jupyter notebook on my PC. It happens whenever the token count goes over around 3,000. Any help would of course be very welcome.
I spent all day trying to record a demo of a chat bot but finally gave up because I couldn’t make it through a single chat session without the bot hanging…
If you are streaming, you can set your own short timeout that resets each time a chunk arrives on a parallel queue. I do this with Python threading generators and Qt, but I haven't done it with async events or in other languages with different resources, so I can't say "here's how you program this in your backend" without hitting a book or a bot for answers.
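A rough sketch of that pattern with plain threading and a queue, no Qt (assuming the pre-1.0 `openai` package; `CHUNK_TIMEOUT` and `stream_with_watchdog` are just names made up for illustration):

```python
import queue
import threading
import openai

CHUNK_TIMEOUT = 15  # seconds we are willing to wait between chunks

def _reader(stream, q):
    # Background thread: push each streamed chunk onto the queue, then a
    # sentinel when the stream ends, so the consumer can tell "done" from "stuck".
    try:
        for chunk in stream:
            q.put(chunk)
    finally:
        q.put(None)

def stream_with_watchdog(**kwargs):
    stream = openai.ChatCompletion.create(stream=True, **kwargs)
    q = queue.Queue()
    threading.Thread(target=_reader, args=(stream, q), daemon=True).start()

    text = ""
    while True:
        try:
            chunk = q.get(timeout=CHUNK_TIMEOUT)  # the timeout resets on every chunk
        except queue.Empty:
            raise TimeoutError(f"no chunk received for {CHUNK_TIMEOUT}s")
        if chunk is None:
            return text
        text += chunk["choices"][0]["delta"].get("content", "")

reply = stream_with_watchdog(
    model="gpt-4",
    messages=[{"role": "user", "content": "test"}],
)
```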
I’m not streaming… just using the Chat Completion API. I need to parse the JSON output from the model and run it through a JSON Schema validator, so streaming does me no good… And the hang has nothing to do with prompt length. My prompts are generally under 2,000 tokens; about 1 out of 10 requests simply hangs.
On a non-streamed completion, where you get your whole response at once after waiting, you can set a lower client timeout that gives up and retries after your maximum plausible generation time: say 90 seconds as the worst case that still gets you an answer, when typical answers take 30 seconds.
However, 90 seconds is a lot of gear-spinning when the AI is never going to give you an answer. By using stream=True you know within a few seconds whether you are going to receive tokens or no response at all.
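To illustrate, a quick time-to-first-token check (a sketch, assuming the pre-1.0 `openai` package):

```python
import time
import openai

# With stream=True a healthy request usually delivers its first chunk within
# a few seconds, so timing that first chunk tells you early whether this
# particular request is going to hang.
start = time.time()
stream = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "test"}],
    stream=True,
)
first_chunk = next(stream)
print(f"time to first token: {time.time() - start:.1f}s")
```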
I can do my own client-side timeout logic to try to work around this issue, but that wasn't really my point in adding on to this thread. There's something going on server-side with OpenAI, and they need to be aware of it so they can fix it.
If you see request times for the same basic prompt of:
1.1 seconds
500ms
600ms
500ms
5 minutes
There’s something going on…
They either have a bad cluster or their request router is routing to an offline cluster.
I see your point about being able to detect an issue more quickly using streaming, but that does mean holding a connection open for server-sent events, which lowers the overall throughput of a node. It's all about tradeoffs, but worth thinking about. I'd prefer they just fix their router issue.