I have updated my code from gpt-3.5-turbo-0613 to gpt-3.5-turbo-1106, but the code now seems to hang for no apparent reason. When I first tested it, it ran without issue and was actually faster than the older model.
But when I come back the next day, the code hangs for minutes. I am afraid this is causing us to lose clients, because it looks like our app is not working, and I do not want to revert to the old model.
This is a new problem with the new model; perhaps it hangs while looking for a cache. I am not sure, because once I manage to get one request through, everything works well after that with no hanging, and the calls are even faster. But that initial hang is a problem: I would rather have a slower model that works 95% of the time than a faster model that works 50% of the time.
Hi and welcome to the Developer Forum!
Are you using any kind of VPN or proxy? What kind of hosting infrastructure are you on: Azure, AWS, Google, a commercial VPS, home internet? Can you post a snippet of the API-calling code along with any setup it relies on, please?
I’m having the same issue. I’m running a simple for loop that calls gpt-3.5 for a simple translation task (with a very short text). I tried different versions of gpt-3.5 and got the same problem with all of them: it just hangs at some iterations. And I’m using a quite long sleep time of 2 s between calls…
For the users who end up here with the same problem: I was able to solve the issue with a timeout. This is my call:

from tenacity import retry, wait_random_exponential, stop_after_attempt

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(10))
def completion_with_backoff(messages, model="gpt-4-1106-preview"):
    return client.chat.completions.create(
        model=model, messages=messages,
        temperature=0.7, timeout=5,  # fail fast on a hung request; tenacity retries
    )

(openai version 1.1.1)
This is not a very good solution, because it means a request is made, for some reason it hangs, the timeout fires, and only on the second request do I get a response. I’m not sure whether I’m being charged for the first request…
Also note that I set a short timeout of 5 seconds because I’m working with very short tasks that should produce short completions.
This is a good solution, but I think I saw somewhere in the documentation that you can specify the timeout and max retries on the Client() when you initialize it:
client = OpenAI(--specify retry and timeout behaviour here----)
I just can’t remember where I saw it now.
A client-side retry for an apparently server-side problem is just a workaround; it should not be considered a long-term solution.
We’re also seeing a very high rate of timeouts, and because we’re quite time-sensitive, we aren’t able to upgrade to the new model.
I’m also having the same issue here…
The server infrastructure is under heavy load with all of the new users joining, hence the pause placed on new Plus memberships; this should get better over the coming days. Most of the missing responses and hangs at the moment are related to this.
I switched from gpt-3.5-turbo-1106 to gpt-4-1106-preview and that helped quite a bit with the hanging, but at 10X the price, I hope they resolve this issue soon so I can switch back to the cheaper model.
This is my temporary solution as well. Most gpt-3.5-turbo-1106 timeout errors happen in the afternoon (12 PM to 4 PM), so I have switched to GPT-4 Turbo during that window.
Outside the afternoon, though, gpt-3.5-turbo-1106’s response time is faster than that of gpt-3.5-turbo-0613.
Hopefully this gets resolved before they shut down gpt-3.5-turbo-0613.
We cannot use gpt4 turbo as it’s still too slow for our use-case (compared to gpt3.5).
@bertha.kgokong please remove the “solved” tag, because the provided solution is really not a solution.
I know I have suggested many solutions, but in case anyone lands here again: I have found a workaround with a timeout function. If the GPT-3.5 call hangs for over 20 s, I call GPT-4. So far I have been able to catch the hung calls and at least respond within 20 s, mixing the cheaper, faster GPT-3.5 Turbo with the more expensive GPT-4. This way I can even count how many times a day the function hangs.
Here is my workaround:
from func_timeout import func_timeout, FunctionTimedOut

def run_with_timeout(messages, chat_functions):
    try:
        return func_timeout(20, returnFunctionChatGPTCall, args=(messages, chat_functions))
    except FunctionTimedOut:
        # fall back to GPT-4 when the GPT-3.5 call hangs
        return returnFunctionChatGPT4Call(messages, chat_functions)
The first call goes to the GPT-3.5 Turbo model; if it times out, I then call the second function, returnFunctionChatGPT4Call. Of course it could be the same function with a different model passed in; you can build this however you like.
GPT-4 does not hang, but I cannot call it all the time due to the 10× price difference, so our default is 3.5, except when it hangs.
Why not send the request to the regular gpt-3.5-turbo instead, or the 16k variant if needed?
If the API is failing 1 in 10 times (as has been typical for gpt-3.5-turbo-1106, if not worse), you can dispatch two parallel calls to the model at the same time and still come out with lower expense, especially if you stream and close the connection on the one that is not first to respond.
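The racing idea above can be sketched with the standard library. This is a minimal sketch, not the poster’s actual code: `call_model` is a hypothetical stand-in for whatever function makes the API request, and slower duplicates are simply abandoned rather than having their connections closed.

```python
import concurrent.futures

def race(call_model, messages, n=2, timeout=20):
    # Dispatch n identical requests and return whichever finishes first.
    # call_model is any callable taking `messages` (hypothetical stand-in
    # for the real API call).
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=n)
    try:
        futures = [pool.submit(call_model, messages) for _ in range(n)]
        done, _ = concurrent.futures.wait(
            futures, timeout=timeout,
            return_when=concurrent.futures.FIRST_COMPLETED,
        )
        if not done:
            raise TimeoutError("all parallel requests hung")
        return done.pop().result()
    finally:
        # Don't block on the slower duplicates; let them finish or be
        # cancelled in the background (Python 3.9+ for cancel_futures).
        pool.shutdown(wait=False, cancel_futures=True)
```

With streaming you could go further and close the losing connection as soon as the winner produces its first token, which is where the cost saving comes from.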