Inconsistent Response Speed with GPT-4.0 Mini Completion API

spazone · July 29, 2025, 10:24am

I am currently using the Completion API with GPT-4.0 Mini and have noticed that the response speed is not consistent.

Sometimes the responses are very fast.
Other times, the responses are noticeably slow.

When I use GPT-4.0, the response speed is consistently stable, so this issue seems specific to GPT-4.0 Mini.

Could you please investigate why GPT-4.0 Mini has variable performance and suggest any potential optimizations or fixes?

Thank you for your assistance!

Environment Details:

API: OpenAI Completion API
Model: GPT-4.0 Mini

merefield · July 29, 2025, 10:27am

It could be because mini is more popular due to its lower price, so its infrastructure is under greater load. That would make the queues, and therefore the response times, more variable.

But I’m all for getting the verified facts …

You will have to hope the staff chime in on this one though.
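
In the meantime, it may help to put numbers on the variability so the two models can be compared directly. The following is only a minimal sketch, not an official diagnostic: it times repeated calls and reports the minimum, median, nearest-rank 95th percentile, and maximum. `fake_request` is a hypothetical stand-in (it just sleeps) and should be replaced with your own client call; the function names and the choice of a nearest-rank percentile are assumptions made for illustration.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies(call: Callable[[], object], runs: int = 20) -> List[float]:
    """Time `call` `runs` times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Return min, median, nearest-rank p95, and max of the samples."""
    ordered = sorted(durations)
    # Nearest-rank p95 using integer arithmetic: ceil(0.95 * n), at least 1.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


def fake_request() -> None:
    # Hypothetical placeholder for the real API call; swap in your own request here.
    time.sleep(0.01)


if __name__ == "__main__":
    print(summarize(measure_latencies(fake_request, runs=10)))
```

If the median is similar for both models but the p95 and max are much higher for the mini model, that would be consistent with the load and queueing explanation above, though it would not prove it. Running the same measurement at different times of day would show whether the slow responses cluster around busy periods.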
