Hello OpenAI community,
I’m encountering intermittent latency issues with the OpenAI Chat Completions API. I’m using the `gpt-4o-mini`
model in an asynchronous FastAPI application, and while some responses are quick, others take significantly longer even though my prompt lengths are typically short.
Here’s a summary of the issue:
- API: Using the `gpt-4o-mini` model with the `AsyncOpenAI` client.
- Symptoms: Response times fluctuate between a few seconds and up to 18 seconds, even for short prompts.
- Environment: Running on FastAPI. Network and system resources are stable.
- Usage: Short prompts, mostly direct questions and short messages.
- Observations: Latency variations occur seemingly randomly, without correlation to prompt complexity.
Below is the code I’m using to call the API:
```python
from openai import AsyncOpenAI
from config.config import Config
from typing import List, Dict


class OpenAIClient:
    """Thin async wrapper around the OpenAI chat completions endpoint."""

    def __init__(self, api_key: str = Config.API_KEY_OPENAI):
        self.client = AsyncOpenAI(api_key=api_key)

    async def create_chat_completion(
        self,
        messages: List[Dict[str, str]],
        model: str = "gpt-4o-mini",
    ) -> str:
        try:
            completion = await self.client.chat.completions.create(
                model=model,
                messages=messages,
            )
            # Return the assistant's reply from the first choice.
            return completion.choices[0].message.content
        except Exception as e:
            print(f"Error in creating chat completion: {str(e)}")
            raise

    async def close(self):
        # Release the underlying HTTP connection pool on shutdown.
        await self.client.close()
```
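For reference, this is roughly how I confirm the fluctuation is in the API call itself rather than elsewhere in the FastAPI stack (a minimal standalone sketch; `timed_completion`, `main`, and the test prompt are just illustrative, and it assumes the `OpenAIClient` class above):

```python
import asyncio
import time


async def timed_completion(client: OpenAIClient, prompt: str) -> str:
    # Time only the round trip of the chat completion call.
    messages = [{"role": "user", "content": prompt}]
    start = time.perf_counter()
    try:
        return await client.create_chat_completion(messages)
    finally:
        elapsed = time.perf_counter() - start
        print(f"chat.completions latency: {elapsed:.2f}s")


async def main():
    client = OpenAIClient()
    try:
        # Fire a handful of identical short prompts to expose the variance.
        for _ in range(5):
            await timed_completion(client, "What is the capital of France?")
    finally:
        await client.close()


if __name__ == "__main__":
    asyncio.run(main())
```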
Has anyone else encountered similar response-time fluctuations, especially with asynchronous calls? Could server load or region-related factors affect this, and would switching to a different OpenAI model help? I deploy the service on an AWS EC2 instance in Singapore.
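One mitigation I’m considering, in case anyone can confirm it helps: the Python SDK supports per-request timeouts and automatic retries via `with_options`, so a stalled request fails fast and is retried instead of hanging. A sketch of how the call inside `create_chat_completion` could change (the 10-second timeout and retry count are arbitrary placeholders, not tuned recommendations):

```python
# Sketch: per-request timeout and retries via the SDK's with_options().
# The values are placeholders; timed-out requests are retried by the SDK.
completion = await self.client.with_options(
    timeout=10.0,    # seconds before the HTTP request is abandoned
    max_retries=2,   # automatic retries with exponential backoff
).chat.completions.create(
    model=model,
    messages=messages,
)
```

Streaming with `stream=True` wouldn’t change total latency, but it would cut time-to-first-token, which may be enough if the slow responses are mostly generation time rather than queueing.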