Chat Completion API extremely slow and hanging

I am trying to use GPT-4 Turbo (gpt-4-1106-preview) and gpt-3.5-turbo-1106, and short requests are taking a highly variable and often inordinate amount of time. I am tier 3 with a limit of 50k requests per minute, but sending single sequential requests, I’m averaging a 30s response time and getting at most 2 requests per minute.

If I remove the timeout=5 argument, it gets much worse as it can then hang for minutes on a single attempt.

Is everyone experiencing this due to high demand or is it just me?

from typing import Optional

from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_random_exponential

client = OpenAI()

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(10))
def make_chat_completion_request(
    prompt: str, model="gpt-3.5-turbo-1106", system_prompt: Optional[str] = None, force_json=True
):
    system_prompt = system_prompt if system_prompt is not None else DEFAULT_SYSTEM_PROMPT
    if force_json:
        system_prompt = system_prompt + f" {openai_prompts['force_json']}"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": prompt}],
        response_format={"type": "json_object"} if force_json else None,
        timeout=5,
    )
    return response

Yes, I’ve posted about it, receiving no answer.

In my case the API randomly hangs on some prompts. If I kill it and send the exact same prompt, the answer is usually fast. But after some other calls, I consistently get a call taking more than 200 seconds.

Actually, I’ve noticed something that you could experiment with. In my case, GPT-4 Turbo calls are quite fast compared to the standard GPT-4.

I have the same problem with GPT-3.5, but only with the model gpt-3.5-turbo-1106. With the older model, gpt-3.5-turbo-0613, I get quick responses.

Do you also get quick responses using the old model?


  • gpt-3.5-turbo-0613 doesn’t seem to have the same problem
    • but it consistently has JSON errors, which gpt-3.5-turbo-1106 avoids thanks to the new response_format argument.
  • gpt-4-1106-preview (GPT-4 turbo) is working for me and is much faster now (~5s response time).

I don’t think it can be just a load issue because if gpt-3.5-turbo was taking 3+ minutes to return for other people, we’d probably be hearing a lot more about it. Hopefully someone from OpenAI can help debug.

I didn’t notice any gpt-3.5-turbo-0613 problem with JSON errors, but I can believe that. I’ve just adapted to its way of giving output.

Yeah, I also get quick responses from gpt-4. But for me it’s a bit costly, while fine-tuning gpt-3.5-turbo-1106 would be better for me from an accuracy and cost point of view.

I hope that fine-tuned models do not have this problem.

Sorry, I should have been clearer. I am asking for the output to be JSON that I then parse, but the older models require a ton of prompting to do this, stuff like…

Return a string of valid json that can be loaded with json.loads. Avoid json errors by remembering that valid json uses two backslashes as an escape character, not one...etc

and even then it raises a JSONDecodeError around 1 in 10 times. With gpt-3.5-turbo-1106 you can pass a response_format argument of {"type": "json_object"} and it never fails.
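For anyone who hasn’t tried JSON mode yet, here is a minimal sketch of such a request. The helper name and system prompt are illustrative, not from the original post; you would pass the returned kwargs to client.chat.completions.create in the openai Python SDK:

```python
import json

def build_json_request(prompt: str) -> dict:
    """Build kwargs for a JSON-mode chat completion request.

    Note: JSON mode requires the word "JSON" to appear somewhere in the
    messages, or the API rejects the request.
    """
    return {
        "model": "gpt-3.5-turbo-1106",
        "messages": [
            {"role": "system", "content": "Reply only with a valid JSON object."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }

# Usage (requires the openai package and an API key):
#   client = OpenAI()
#   resp = client.chat.completions.create(**build_json_request("List 3 colors as JSON"))
#   data = json.loads(resp.choices[0].message.content)
```

With JSON mode enabled, the content string is guaranteed to parse with json.loads, so the escaping-focused prompt engineering above becomes unnecessary.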

I suspect there is a load issue, they shut down ChatGPT signups after dev day after all. I do experience hangs and having to resend prompts.

@madeupmasters, have you tried function calling to get back valid JSON? It works OK with gpt-3.5-turbo-0613, and if you just need the JSON response, you don’t need to send it back to the model for a friendly reply like the examples show; you can just use it for your purposes and move on.
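A sketch of that approach, assuming a hypothetical extract_info schema (the function name and fields are made up for illustration): with the 0613-era API you pass functions and force the call via function_call, then json.loads the returned arguments string.

```python
import json

# Hypothetical schema: forcing the model to "call" this function makes it
# return its arguments as a JSON string matching the declared parameters.
EXTRACT_FN = {
    "name": "extract_info",
    "description": "Return the fields extracted from the user's text.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "tags": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title"],
    },
}

def build_function_call_request(prompt: str) -> dict:
    """Build kwargs for a function-calling chat completion request."""
    return {
        "model": "gpt-3.5-turbo-0613",
        "messages": [{"role": "user", "content": prompt}],
        "functions": [EXTRACT_FN],
        # Forcing the call means the model must reply with JSON arguments
        # instead of prose.
        "function_call": {"name": "extract_info"},
    }

# Usage:
#   resp = client.chat.completions.create(**build_function_call_request(text))
#   args = json.loads(resp.choices[0].message.function_call.arguments)
```

The arguments string still comes from the model, so it can occasionally be malformed; keeping the json.loads inside a retry (as in the code above) covers that case.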

Hey! It turns out there was a bug on our end that could result in timeouts in certain scenarios. We have since fixed the issue. Please let us know in a new thread if you end up seeing similar issues again. Thanks again for reporting this!
