Latency inconsistencies with gpt-4.1-mini responses

Hi everyone,

I’ve been testing gpt-4.1-mini in streaming mode and I’m noticing some latency inconsistencies that are starting to affect the user experience.

Here are some examples from my logs (time = total response time, chars = characters in the response):

  • 1.21 s (84 chars)

  • 1.33 s (91 chars)

  • 1.36 s (106 chars)

  • 1.20 s (110 chars)

  • 1.32 s (50 chars)

  • 1.39 s (128 chars)

  • 1.42 s (75 chars)

  • 1.40 s (62 chars)

  • 1.61 s (72 chars)

As you can see, most responses cluster around 1.2–1.6 seconds, which already feels a bit high for a mini model. But at certain hours, latency spikes dramatically (sometimes to 15–40 seconds or more), which ruins the real-time experience I’m trying to build.
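For anyone who wants to reproduce these measurements: a minimal, SDK-agnostic sketch of how I time a streamed response. It separates time-to-first-chunk from total time, since a slow first chunk and a slow overall stream point at different causes. The `measure_stream` helper is my own; it accepts any iterable of text deltas (e.g. the content pieces pulled out of an SDK streaming response).

```python
import time
from typing import Iterable, Tuple

def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, int]:
    """Time a streamed response.

    Returns (time_to_first_chunk_s, total_time_s, char_count).
    `chunks` is any iterable of text deltas, e.g. the non-empty
    `delta.content` pieces yielded by a streaming SDK response.
    """
    start = time.perf_counter()
    first = None  # time-to-first-chunk, set on the first non-empty delta
    chars = 0
    for piece in chunks:
        if piece and first is None:
            first = time.perf_counter() - start
        chars += len(piece)
    total = time.perf_counter() - start
    if first is None:  # empty stream: fall back to total time
        first = total
    return first, total, chars
```

Logging both numbers per request makes it obvious whether a 15 s response was 15 s of silence before the first token or a slow trickle throughout.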

I’m not sure if this is due to server load, HTTP/2 session reuse, or something else, but the inconsistency is very noticeable.

👉 Has anyone else experienced the same issue with gpt-4.1-mini? Is this expected behavior, or should I consider it an anomaly and open a support ticket?

Thanks in advance!