Hi everyone,
I’ve been testing gpt-4.1-mini in streaming mode and I’m noticing some latency inconsistencies that are starting to affect the user experience.
Here are some examples from my logs (time = total response time, len = characters in the response):
- 1.21 s (84 chars)
- 1.33 s (91 chars)
- 1.36 s (106 chars)
- 1.20 s (110 chars)
- 1.32 s (50 chars)
- 1.39 s (128 chars)
- 1.42 s (75 chars)
- 1.40 s (62 chars)
- 1.61 s (72 chars)
As you can see, most responses hover around 1.2–1.6 seconds, which already feels a bit high for a mini model. But at certain hours, latency spikes dramatically (sometimes to 15–40 seconds), which ruins the real-time experience I’m trying to build.
I’m not sure whether this is due to server load, HTTP/2 session reuse, or something else, but the inconsistency is very noticeable.
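One way to narrow it down: if the spikes come from queueing or connection setup rather than slow token generation, the time to the *first* chunk will spike while the per-chunk pace stays roughly flat. Here's a minimal timing harness I'd use to split those apart — `fake_stream` below is just a stand-in for the real SDK stream iterator, not actual API code:

```python
import time

def measure_stream(chunks):
    """Time a streaming response: time-to-first-chunk vs. total time.

    `chunks` is any iterator yielding text pieces (e.g. the content
    deltas from a streaming API call). Returns (ttfc, total, text).
    """
    start = time.perf_counter()
    ttfc = None
    parts = []
    for piece in chunks:
        if ttfc is None:
            # First chunk arrived: everything before this point is
            # connection setup + server-side queueing, not generation.
            ttfc = time.perf_counter() - start
        parts.append(piece)
    total = time.perf_counter() - start
    return ttfc, total, "".join(parts)

# Stand-in for a real API stream (hypothetical chunk timings):
def fake_stream():
    for piece in ["Hello", ", ", "world"]:
        time.sleep(0.01)  # simulate per-chunk network/generation delay
        yield piece

ttfc, total, text = measure_stream(fake_stream())
print(f"first chunk after {ttfc:.3f}s, total {total:.3f}s, {len(text)} chars")
```

Logging both numbers per request should make it obvious whether the 15–40 s outliers are spent waiting for the first chunk (load/routing) or spread across the stream (throughput).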
Has anyone else experienced the same issue with gpt-4.1-mini? Is this expected behavior, or should I consider this an anomaly and open a support ticket?
Thanks in advance!