Increased latency on GPT-4.1 Mini over the past few days

Hi everyone,

I work on an application where latency is a critical factor. Our setup uses a large fixed context and relies on prompt caching (cached tokens) to reduce request latency; each request carries a prompt of around 110k tokens.
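For anyone setting up something similar, here is a minimal sketch (illustrative, not our production code) of a cache-friendly request layout with the openai Python client. Prompt caching keys on the prompt prefix, so the fixed context has to come first and only the per-request part should vary at the end:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FIXED_CONTEXT = "..."  # the large static context, byte-identical on every call

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            # Stable prefix: eligible for prompt caching.
            {"role": "system", "content": FIXED_CONTEXT},
            # Variable suffix: changes per request.
            {"role": "user", "content": question},
        ],
    )
    # Recent SDK versions report how much of the prompt was served from cache
    # (0 on a cold request); useful for confirming cache hits during testing.
    print("cached tokens:", response.usage.prompt_tokens_details.cached_tokens)
    return response.choices[0].message.content
```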

When GPT-4.1 Mini was released, I ran several tests and got very promising results: average latency was around 4 seconds, with the 90th percentile at 7 seconds, which was acceptable for our use case.

However, over the past few days, I’ve noticed a significant degradation in performance. In my recent tests, the average latency has increased to 7 seconds, and the 90th percentile is now hitting 12 seconds.
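For reference, the averages and percentiles above come from timing full round-trips on our side; a minimal sketch of that kind of measurement (trial count, prompt, and token limit are placeholders, not our real workload):

```python
import statistics
import time

from openai import OpenAI

client = OpenAI()

def run_trials(n: int = 30) -> list[float]:
    """Time n identical requests end to end, returning wall-clock seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        client.chat.completions.create(
            model="gpt-4.1-mini",
            messages=[{"role": "user", "content": "ping"}],  # stand-in prompt
            max_tokens=16,
        )
        latencies.append(time.perf_counter() - start)
    return latencies

samples = run_trials()
print(f"avg: {statistics.mean(samples):.2f}s")
print(f"p90: {statistics.quantiles(samples, n=10)[-1]:.2f}s")  # 90th percentile
```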

Just to clarify, we are strictly respecting our organization’s rate limits, so I’m confident that this is not the issue.

I haven’t seen any official communication from OpenAI regarding API instability specifically for GPT-4.1 Mini, and I also couldn’t find other users reporting similar issues recently.

Has anyone else experienced this latency increase? And if anyone from OpenAI is reading this — is there any insight you can share about what might be causing this?

Thanks in advance!


For non-cacheable requests to gpt-4.1-mini (1,800 tokens in, 256 tokens out):

April 16, shortly after release: [latency histogram]

April 30 (rescaled; note the axis values): [latency histogram]

The bulk of the histogram has moved from below 2 seconds to above 2 seconds.

— Performance Statistics Report (100 trials per model) —

For 100 trials of gpt-4.1-mini @ 2025-04-30:

Metric            Average   Minimum   Maximum
latency_s           0.634     0.312     5.617
stream_rate          60.2      16.3      98.8
total_rate          52.7      15.7      74.2
total_time_s        5.211     3.451    16.297
response_tokens     256.0       256       256
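For anyone who wants to reproduce this: the metric names suggest latency_s is time to first streamed token and the two rates are tokens per second (while streaming vs. overall). Under those assumptions, a sketch of a per-trial measurement, where chunk counts stand in for exact token counts since chat streams deliver roughly one token per chunk:

```python
import time

from openai import OpenAI

client = OpenAI()

def one_trial(prompt: str, max_out: int = 256) -> dict:
    start = time.perf_counter()
    first_token_at = None
    n_chunks = 0
    stream = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_out,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()  # first content chunk
            n_chunks += 1
    end = time.perf_counter()
    assert first_token_at is not None, "stream produced no content"
    return {
        "latency_s": first_token_at - start,             # time to first token
        "stream_rate": (n_chunks - 1) / (end - first_token_at)
                       if n_chunks > 1 else 0.0,         # tokens/s while streaming
        "total_rate": n_chunks / (end - start),          # tokens/s overall
        "total_time_s": end - start,
        "response_tokens": n_chunks,                     # approximate
    }
```

Averaging one_trial over 100 runs gives a report in the same shape as the table above.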