Intermittent Latency Spikes with the Chat Completions API (gpt-4o-mini) in a FastAPI Application

Hello OpenAI community,

I’m encountering intermittent latency issues with the OpenAI Chat Completions API. I’m using the gpt-4o-mini model in an asynchronous FastAPI application, and while some responses come back quickly, others take significantly longer even though my prompts are typically short.

Here’s a summary of the issue:

  • API: Using the gpt-4o-mini model with the AsyncOpenAI client.
  • Symptoms: Response times range from a few seconds up to 18 seconds, even for short prompts.
  • Environment: Running on FastAPI. Network and system resources are stable.
  • Usage: Short prompts, mostly direct questions and short messages.
  • Observations: Latency variations occur seemingly at random, with no correlation to prompt complexity (see the timing sketch after this list for how I’m measuring).
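
For context, here is roughly how I’m measuring the numbers above. This is a simplified sketch rather than my exact code; the `timed_chat_completion` wrapper and the logger name are just illustrative:

```python
import time
import logging

logger = logging.getLogger("openai-latency")  # illustrative logger name

async def timed_chat_completion(client, messages, model="gpt-4o-mini"):
    # Time only the API call itself, so the measurement isn't skewed
    # by anything else FastAPI does in the request handler.
    start = time.monotonic()
    completion = await client.chat.completions.create(
        model=model,
        messages=messages,
    )
    elapsed = time.monotonic() - start
    logger.info("chat.completions.create took %.2fs (model=%s)", elapsed, model)
    return completion.choices[0].message.content
```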

Below is the code I’m using to call the API:

```python
from openai import AsyncOpenAI
from config.config import Config
from typing import List, Dict

class OpenAIClient:
    def __init__(self, api_key: str = Config.API_KEY_OPENAI):
        # Reuse a single AsyncOpenAI client for all requests.
        self.client = AsyncOpenAI(api_key=api_key)

    async def create_chat_completion(
        self,
        messages: List[Dict[str, str]],
        model: str = "gpt-4o-mini",
    ) -> str:
        try:
            completion = await self.client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return completion.choices[0].message.content
        except Exception as e:
            print(f"Error in creating chat completion: {str(e)}")
            raise

    async def close(self):
        # Release the underlying HTTP connections (e.g. on FastAPI shutdown).
        await self.client.close()
```
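
I’m also wondering whether explicit timeouts and retries would at least bound the worst case, instead of letting a request sit for 18 seconds. A minimal sketch using the client-level `timeout` and `max_retries` options that the openai-python SDK exposes (the values are placeholders, not recommendations):

```python
import httpx
from openai import AsyncOpenAI
from config.config import Config

client = AsyncOpenAI(
    api_key=Config.API_KEY_OPENAI,
    # Fail fast rather than hanging: cap the whole request at 20s
    # and the TCP connect at 5s. Placeholder values.
    timeout=httpx.Timeout(20.0, connect=5.0),
    # The SDK retries connection errors and certain 429/5xx responses
    # with exponential backoff.
    max_retries=2,
)
```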

Has anyone else encountered similar response-time fluctuations, especially with asynchronous calls? Could server load or region-related factors be a cause, and would switching to a different OpenAI model potentially help? I deploy my service on an AWS EC2 instance in Singapore (ap-southeast-1).
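
In case it affects the answer: one mitigation I’m considering is streaming, so users see the first tokens sooner even when total generation time is unchanged. A rough sketch of a method I might add to the OpenAIClient class above (not production code, and the chunk handling is simplified):

```python
    async def stream_chat_completion(
        self,
        messages: List[Dict[str, str]],
        model: str = "gpt-4o-mini",
    ) -> str:
        # Streaming doesn't reduce total generation time, but the first
        # tokens arrive much sooner, which hides most of a latency spike.
        stream = await self.client.chat.completions.create(
            model=model,
            messages=messages,
            stream=True,
        )
        parts: List[str] = []
        async for chunk in stream:
            # Some chunks carry no content delta, so guard before appending.
            if chunk.choices and chunk.choices[0].delta.content:
                parts.append(chunk.choices[0].delta.content)
        return "".join(parts)
```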