Variable Response Times in Concurrent API Calls with OpenAI's ChatCompletion API

I’m experiencing variable response times when making concurrent API calls to OpenAI’s ChatCompletion API using Python’s ThreadPoolExecutor. I send 3 different prompts in parallel, but the execution times vary significantly, with some requests taking much longer than others even though all are triggered simultaneously.

For example, the response times are:

  • 2.58 seconds
  • 4.44 seconds
  • 12.00 seconds

Environment

  • Model: gpt-4o-mini
  • Tier: 1
  • Avg tokens: input ≈ 5000, output ≈ 300

I’m looking for insights into optimizing these calls for more consistent response times and any strategies for effectively managing concurrent requests to the API.
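
For reference, here is a minimal sketch of the setup being described, assuming the current `openai` Python SDK (v1-style client) with the API key in the environment; the prompts are placeholders standing in for the ~5000-token inputs:

```python
import time
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prompts standing in for the three ~5000-token inputs.
PROMPTS = ["<prompt 1>", "<prompt 2>", "<prompt 3>"]


def timed_call(prompt: str) -> float:
    """Send one chat completion request and return its wall-clock latency."""
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,  # matches the ~300-token outputs above
    )
    return time.perf_counter() - start


# All three requests are submitted at effectively the same moment;
# the per-request latencies still come back spread far apart.
with ThreadPoolExecutor(max_workers=3) as pool:
    for latency in pool.map(timed_call, PROMPTS):
        print(f"{latency:.2f} seconds")
```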


Seeing the same problem:

  • 15 concurrent requests
  • to gpt-4o-mini
  • 7000 input tokens each
  • 200 output tokens each

This results in vastly different times: 8s, 13s, 29s.