Chat Completion API extremely slow and hanging

I am trying to use GPT-4 Turbo (gpt-4-1106-preview) and gpt-3.5-turbo-1106, and short requests are taking a highly variable and often inordinate amount of time. I am tier 3 with a limit of 50k requests per minute, but sending single sequential requests, I’m averaging a 30s response time and getting at most 2 requests per minute.

If I remove the timeout=5 argument, it gets much worse as it can then hang for minutes on a single attempt.

Is everyone experiencing this due to high demand or is it just me?

from typing import Optional

from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_random_exponential

client = OpenAI()

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(10))
def make_chat_completion_request(
    prompt: str, model="gpt-3.5-turbo-1106", system_prompt: Optional[str] = None, force_json=True
):
    system_prompt = system_prompt if system_prompt is not None else DEFAULT_SYSTEM_PROMPT
    if force_json:
        system_prompt = system_prompt + f" {openai_prompts['force_json']}"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": prompt}],
        response_format={"type": "json_object"} if force_json else None,
        timeout=5,
    )
    return response

Yes, I’ve posted about it, receiving no answer.

In my case the API randomly hangs on some prompts. If I kill it and send the exact same prompt, the answer is usually fast. But after some other calls, I consistently get a call taking more than 200 seconds.

Actually, I’ve noticed something that you could experiment with. In my case, GPT-4 Turbo calls are quite fast compared to the standard GPT-4.

I have the same problem with GPT-3.5, but only with the model gpt-3.5-turbo-1106. With the older model, gpt-3.5-turbo-0613, I get quick responses.

Do you also get quick responses using the old model?


  • gpt-3.5-turbo-0613 doesn’t seem to have the same problem
    • but it consistently has JSON errors, which gpt-3.5-turbo-1106 avoids thanks to the new response_format argument.
  • gpt-4-1106-preview (GPT-4 turbo) is working for me and is much faster now (~5s response time).

I don’t think it can be just a load issue because if gpt-3.5-turbo was taking 3+ minutes to return for other people, we’d probably be hearing a lot more about it. Hopefully someone from OpenAI can help debug.

I didn’t notice any gpt-3.5-turbo-0613 problem with JSON errors, but I can believe that. I’ve just adapted to its way of giving output.

Yeah, I also get quick responses from gpt-4. But for me it’s a bit costly, while fine-tuning gpt-3.5-turbo-1106 would be better for me from an accuracy and cost point of view.

I hope that fine-tuned models do not have this problem.

Sorry, I should have been clearer. I am asking for the output to be JSON that I then parse, but the older models require a ton of prompting to do this, stuff like…

Return a string of valid json that can be loaded with json.loads. Avoid json errors by remembering that valid json uses two backslashes as an escape character, not one...etc

and even then it raises a JSONDecodeError around 1 in 10 times. With gpt-3.5-turbo-1106 you can pass a response_format argument of {"type": "json_object"} and it never fails.
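For anyone who hasn’t tried JSON mode yet, here is a minimal sketch of such a request. The helper name and system prompt are illustrative, not from the original post; you would pass the returned kwargs to client.chat.completions.create in the openai Python SDK:

```python
import json

def build_json_request(prompt: str) -> dict:
    """Build kwargs for a JSON-mode chat completion request.

    Note: JSON mode requires the word "JSON" to appear somewhere in the
    messages, or the API rejects the request.
    """
    return {
        "model": "gpt-3.5-turbo-1106",
        "messages": [
            {"role": "system", "content": "Reply only with a valid JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }

# Usage (requires the openai package and an API key):
#   client = OpenAI()
#   resp = client.chat.completions.create(**build_json_request("List 3 colors as JSON"))
#   data = json.loads(resp.choices[0].message.content)
```

With JSON mode enabled, the content string is guaranteed to parse with json.loads, so the escaping-focused prompt engineering above becomes unnecessary.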

I suspect there is a load issue, they shut down ChatGPT signups after dev day after all. I do experience hangs and having to resend prompts.

@madeupmasters, have you tried function calling to get back valid JSON? It works OK with gpt-3.5-turbo-0613, and if you just need the JSON response, you don’t need to send it back to the model for a friendly reply like the examples show; you can just use it for your purposes and move on.
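A sketch of that approach, assuming a hypothetical extract_info schema (the function name and fields are made up for illustration): with the 0613-era API you pass functions and force the call via function_call, then json.loads the returned arguments string.

```python
import json

# Hypothetical schema: forcing the model to "call" this function makes it
# return its arguments as a JSON string matching the declared parameters.
EXTRACT_FN = {
    "name": "extract_info",
    "description": "Return the fields extracted from the user's text.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "tags": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title"],
    },
}

def build_function_call_request(prompt: str) -> dict:
    """Build kwargs for a function-calling chat completion request."""
    return {
        "model": "gpt-3.5-turbo-0613",
        "messages": [{"role": "user", "content": prompt}],
        "functions": [EXTRACT_FN],
        # Forcing the call means the model must reply with JSON arguments
        # instead of prose.
        "function_call": {"name": "extract_info"},
    }

# Usage:
#   resp = client.chat.completions.create(**build_function_call_request(text))
#   args = json.loads(resp.choices[0].message.function_call.arguments)
```

The arguments string still comes from the model, so it can occasionally be malformed; keeping the json.loads inside a retry (as in the code above) covers that case.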

Hey! It turns out there was a bug on our end that could result in timeouts in certain scenarios. We have since fixed the issue. Please let us know in a new thread if you end up seeing similar issues again. Thanks again for reporting this!
