gpt-3.5-turbo-1106 model hangs for 10 minutes every couple of requests


I am using gpt-3.5-turbo-1106 and making a lot of requests back to back. I noticed that every 10–15 requests the model hangs, sometimes taking more than 10 minutes to complete the ongoing request. This was not the case last week.


Yeah, it not only hangs, it also re-generates the messages when used over the Assistants API. Check my bug report and see if that affects you as well.

Yep, the gpt-3.5-turbo-1106 variant is quite unstable; we are actively getting timeouts. To work around it, you might want to reduce your timeout to 30 seconds, since the library's default timeout is 10 minutes.

Yes, exactly. But how can I change the timeout value for the API?

If you are using the Node library:


(can’t paste link, you need to modify the above url.)

Otherwise, you will have to check the timeout options of the library you are using.
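For reference, a minimal sketch of both ways to set the timeout, assuming the official `openai` npm package (v4+); the model choice and the `ask` helper are just illustrative:

```typescript
import OpenAI from "openai";

// Global default: applies to every request made through this client.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 30_000, // 30 s instead of the 10-minute default
  maxRetries: 1,
});

async function ask(): Promise<string | null> {
  // Per-request override: the second argument carries request options.
  const completion = await openai.chat.completions.create(
    {
      model: "gpt-3.5-turbo-1106",
      messages: [{ role: "user", content: "ping" }],
    },
    { timeout: 30_000 },
  );
  return completion.choices[0].message.content;
}
```

Both the constructor options and the per-request options accept `timeout` (in milliseconds) and `maxRetries`; the per-request value wins when both are set.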

Also bumped into this, and it took me a while to figure out it's just OpenAI doing this.
I have a benchmarking script that makes 100 requests to gpt-3.5-turbo-1106 and calculates how many fail. What is interesting is that I can only reproduce this with an API key on an OpenAI account that is also used by a few other clients (in our playground), and it seems to happen more often if I use the playground and make requests from my computer at the same time. It looks like some throttling/bug kicks in when you make requests from multiple places at once: easily 5% of the requests fail when I do this.

If I switch to a different API key (from my own personal OpenAI account), the benchmarking script consistently succeeds with a 100% success rate.
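A failure-rate harness along the lines described above can be sketched like this; the `benchmark` helper and the stand-in request function are my own names, not from any library:

```typescript
// Runs `request` a fixed number of times and reports the failure rate.
// In the real script, `request` would be a chat-completion call with a
// short timeout, so hangs surface as rejections.
async function benchmark(
  request: () => Promise<unknown>,
  runs: number,
): Promise<number> {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      await request();
    } catch {
      failures++;
    }
  }
  return failures / runs; // fraction of failed requests
}
```

With 100 runs, a result of 0.05 corresponds to the "easily 5% fail" observation above.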

BTW, from my observation it hangs for much longer than 10 minutes. I waited about an hour and it never returned any data; 10 minutes is just the default timeout in openai-node. :slight_smile:

That was my main reason to switch to the thread/Assistants API. There you can offload the requests to the assistant and reap the response message later. I was hoping it would reduce the time for our script, and it does, but with the 3.5 model there is a flaw where it generates extra messages and eats into the tokens.
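The offload-and-reap flow boils down to a poll loop, which can be sketched generically like this; `pollUntil` and `getStatus` are names I made up, with the status call standing in for something like retrieving a run's status from the Assistants API:

```typescript
// Generic offload-and-reap poll loop: keep asking until the work is done,
// then return the result, giving up after a bounded number of attempts.
async function pollUntil<T>(
  getStatus: () => Promise<{ done: boolean; result?: T }>,
  intervalMs: number,
  maxAttempts: number,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { done, result } = await getStatus();
    if (done) return result as T;
    // Not done yet: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("polling gave up after " + maxAttempts + " attempts");
}
```

The advantage over a blocking completion call is that a hung run only costs you poll attempts, not a stuck request.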


I tried setting the timeout value with the Node API (both through the per-request options argument and the global OpenAI constructor argument), but I still run into cases where my request to OpenAI just hangs. I'm calling it very simply:

    chatCompletion = await openai.chat.completions.create(body, {
      timeout: MAX_API_REQUEST_TIMEOUT_MS,
      maxRetries: 1,
    });
I even tried instrumenting my own timeout wrapper around the API, but it still hangs. Is something busy-waiting in the OpenAI library? My custom timeout code (the last log line is never reached):

    logger.info({}, "right before");
    await Promise.race([
      (async () => {
        chatCompletion = await openai.chat.completions.create(body, {
          timeout: MAX_API_REQUEST_TIMEOUT_MS,
          maxRetries: 0,
        });
      })(),
      new Promise<void>((resolve) => {
        setTimeout(() => resolve(), MAX_API_REQUEST_TIMEOUT_MS);
      }),
    ]);
    logger.info({}, "right after");

This seems to happen every couple of requests.
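One caveat with a `Promise.race` wrapper like the one above: the race settles, but the losing request keeps running in the background, since nothing aborts it. A wrapper that at least rejects on timeout could look like this (the `withTimeout` name is mine); to actually cancel the request you would additionally need an `AbortController`, assuming your SDK version accepts an abort signal in its request options:

```typescript
// Sketch of a timeout wrapper that rejects instead of resolving silently.
// Note: this does NOT cancel the underlying request; for that you would
// also need an AbortController whose signal the HTTP call honours.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error("timed out after " + ms + " ms")),
      ms,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```

A rejection also makes the failure visible to callers, whereas the `resolve()` in the race above silently falls through with `chatCompletion` still unset.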

This has been picked up by OpenAI and is being looked at, hopefully with a resolution soon.


On further experimentation, I think I have a slightly different issue: my prompt deterministically causes the OpenAI Node library to hang indefinitely, which is why I theorized that something was busy-waiting in the post above. I can't share it publicly since it's proprietary software, unfortunately. Can someone from OpenAI help me debug this?

Fixed as of 11/28; I didn't even have to update the npm package.

Yeah, in the last few days the performance of the models seems better. Noticed the same for GPT-4. Guess they were ramping up their CPU power on the MS cloud.