Extremely long request times - Completions API (gpt-4o)

  • Top 5% of requests are taking >30 s
  • About our setup:
    • gpt-4o-2024-08-06
    • Node.js, Chat Completions API
    • We make multiple chat completion requests in parallel (sketched below)
    • We use function calling
    • The requests themselves complete successfully
    • Usage is well below our OpenAI rate limits
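
Roughly, our request pattern looks like this. This is a simplified sketch with a placeholder tool schema and placeholder inputs, not our production code:

import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Placeholder tool definition, mirroring the createSummary call in the
// example response further down.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "createSummary",
      parameters: {
        type: "object",
        properties: { title: { type: "string" } },
        required: ["title"],
      },
    },
  },
];

async function summarize(text: string) {
  return client.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [{ role: "user", content: `Summarize: ${text}` }],
    tools,
  });
}

// Several chat completion requests fired in parallel, as described above.
const inputs = ["scene 1 text…", "scene 2 text…"]; // placeholders
const results = await Promise.all(inputs.map(summarize));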

Usage
We are Tier 5 and well within the rate limits:

Model    RPM     TPM         Batch Queue Limit
gpt-4o   10,000  30,000,000  5,000,000,000
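
For anyone who wants to verify the same thing on their own traffic: every API response carries x-ratelimit-* headers, and I believe the Node SDK’s .withResponse() helper exposes the raw response so you can read them. A minimal sketch (model and prompt are placeholders):

import OpenAI from "openai";

const client = new OpenAI();

// .withResponse() returns the parsed body together with the raw HTTP
// response, so the rate-limit headers can be inspected per request.
const { data, response } = await client.chat.completions
  .create({ model: "gpt-4o", messages: [{ role: "user", content: "ping" }] })
  .withResponse();

console.log("remaining requests:", response.headers.get("x-ratelimit-remaining-requests"));
console.log("remaining tokens:", response.headers.get("x-ratelimit-remaining-tokens"));
console.log("reply:", data.choices[0].message.content);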

Example response (>100s request time):

{
  "id": "chatcmpl-AaOSwrmyRWJmN59j9vg8tyb0SJwZR",
  "object": "chat.completion",
  "created": 1733237218,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_ZVL3JKwqrB39HFU4cjr1IsmU",
            "type": "function",
            "function": {
              "name": "createSummary",
              "arguments": "{\"title\":\"The Feuding Families of Verona\",\"summaries\":[{\"subheader\":\"The Montague-Capulet Feud\",\"bulletPoints\":[\"The longstanding feud between the Montagues and Capulets disrupts the peace in Verona.\"]},{\"subheader\":\"The Street Brawl\",\"bulletPoints\":[\"A street fight breaks out between the servants of the Montague and Capulet households.\"]},{\"subheader\":\"The Prince's Decree\",\"bulletPoints\":[\"Prince Escalus declares that further disturbances will be punished by death.\"]},{\"subheader\":\"Romeo's Melancholy\",\"bulletPoints\":[\"Romeo is introduced as a lovesick young man, pining for Rosaline.\"]},{\"subheader\":\"Benvolio's Advice\",\"bulletPoints\":[\"Benvolio advises Romeo to forget Rosaline and look at other women.\"]}]}"
            }
          }
        ],
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 671,
    "completion_tokens": 168,
    "total_tokens": 839,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "system_fingerprint": "fp_7f6be3efb0"
}

Welcome to the community forums! Good to have you.

Just for clarification,

  1. you’re not using streaming, right?
  2. with latency, you’re not referring to time-to-first-token, but rather just request → response time, is that correct?

This might be pretty concerning, because it might imply that a lot of people using streaming are getting billed for what they would consider timeout errors :thinking:


Hi, thanks for the response!

  1. We are not streaming.
  2. Yes, request → response time. We are using a proxy monitoring platform, and supposedly the time-to-first-token is 0 ms, but I’d take that with a grain of salt (a way to measure it directly is sketched below). Worth saying these issues were happening before we started using the proxy.
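
For what it’s worth, here is a minimal way to measure time-to-first-token directly with streaming, instead of trusting the proxy’s numbers (model and prompt are just placeholders):

import OpenAI from "openai";

const client = new OpenAI();

const start = Date.now();
const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is 2+2?" }],
  stream: true,
});

let ttft: number | null = null;
for await (const chunk of stream) {
  // Record the moment the first content token arrives.
  if (ttft === null && chunk.choices[0]?.delta?.content) {
    ttft = Date.now() - start;
  }
}
console.log(`TTFT: ${ttft} ms, total: ${Date.now() - start} ms`);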

Any advice on how we could go about debugging this? Since it seems random, we considered killing the connection after a certain time limit and retrying (sketched below), but that would still leave us with long response times.
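
If we went that route, I believe the official Node SDK already supports it via client-level timeout and retry options, something like the following (the thresholds are illustrative):

import OpenAI from "openai";

// Abort any request still open after 30 s, and retry aborted or failed
// requests up to twice.
const client = new OpenAI({
  timeout: 30 * 1000,
  maxRetries: 2,
});

// The timeout can also be overridden per request:
const completion = await client.chat.completions.create(
  { model: "gpt-4o", messages: [{ role: "user", content: "ping" }] },
  { timeout: 15 * 1000 },
);

That would cap the damage, but as noted, it still wouldn’t fix the underlying latency.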

May I ask which platform you are using to monitor this? I have also been observing much higher GPT latencies recently and am trying to get to the bottom of it. In fact, I use the Python client library, and I notice some calls don’t even return after two minutes!

Same problems here with Assistants, starting today. For example, retrieving information from existing threads can take several seconds to tens of seconds, and similarly when creating threads, etc.

I am facing the exact same problem here: the latency times are suddenly significantly longer. I have an optimised prompt that used to take 1 second, and now it suddenly takes 30 seconds. I am using both gpt-4o and gpt-4o-mini and see the same excessive latency.

Same here… requests took more than 31.28 s when they normally take less than 5 s. It is slow on both streaming and non-streaming. Our prod environment cannot process customer requests. Outages.

It’s solved for me now, thanks! I am watching the OpenAI status page.

Here’s a minimal reproducible example of the abnormally high latency times. I see similar times when using gpt-4o-mini, and also when I converted the code below to LangChain.

from dotenv import load_dotenv
import time

from openai import OpenAI

# Load OPENAI_API_KEY from a local .env file
load_dotenv()

messages = [{"role": "user", "content": "What is 2+2?"}]

start_time = time.time()

client = OpenAI()  # picks up OPENAI_API_KEY from the environment
session = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
end_time = time.time()

print(f"time taken: {end_time - start_time} seconds")
response_message = session.choices[0].message.content
print("Response:", response_message)

# time taken: 31.05855894088745 seconds
# Response: 2 + 2 equals 4.

Where are you located? Are you using a proxy? Are you making the call from a data center?

Let me share my case.

I have two API keys:
one is very long: ‘sk-proj-***********************************************************************************************************************************’
the other is short: ‘sk-***********************************’

They show different speeds:
the long key is fast,
the short key is slow.

How…

import { ChatOpenAI } from "langchain/chat_models/openai";
import { StringOutputParser } from "langchain/schema/output_parser";

let key = "sk-....";

const parser = new StringOutputParser();
const model = new ChatOpenAI({
  modelName: "gpt-4o",
  temperature: 0.6,
  openAIApiKey: key,
});

// systemPrompt and humanPrompt are defined elsewhere in my code
const res = await model.pipe(parser).invoke([systemPrompt, humanPrompt]);
return res;


It’s solved now!

We are using AWS Lambdas to make the requests (Ireland-based). The proxy uses Cloudflare Workers, so it should be quick, but the issue happens even without the proxy.

“long key is faster.”

Tried re-rolling keys and no luck.