Increased latency on GPT-4.1 Mini over the past few days

Hi everyone,

I work on an application where latency is a critical factor. Our setup uses a large fixed context and relies on prompt caching (cached tokens) to reduce request latency; each request carries a prompt of around 110k tokens.
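For anyone setting up something similar, here is a minimal sketch (illustrative, not our production code) of a cache-friendly request layout with the openai Python client. Prompt caching keys on the prompt prefix, so the fixed context has to come first and only the per-request part should vary at the end:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FIXED_CONTEXT = "..."  # the large static context, byte-identical on every call

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            # Stable prefix: eligible for prompt caching.
            {"role": "system", "content": FIXED_CONTEXT},
            # Variable suffix: changes per request.
            {"role": "user", "content": question},
        ],
    )
    # Recent SDK versions report how much of the prompt was served from cache
    # (0 on a cold request); useful for confirming cache hits during testing.
    print("cached tokens:", response.usage.prompt_tokens_details.cached_tokens)
    return response.choices[0].message.content
```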

When GPT-4.1 Mini was released, I ran several tests and got very promising results: average latency was around 4 seconds, with the 90th percentile at 7 seconds, which was acceptable for our use case.

However, over the past few days, I’ve noticed a significant degradation in performance. In my recent tests, the average latency has increased to 7 seconds, and the 90th percentile is now hitting 12 seconds.
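For reference, the averages and percentiles above come from timing full round-trips on our side; a minimal sketch of that kind of measurement (trial count, prompt, and token limit are placeholders, not our real workload):

```python
import statistics
import time

from openai import OpenAI

client = OpenAI()

def run_trials(n: int = 30) -> list[float]:
    """Time n identical requests end to end, returning wall-clock seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        client.chat.completions.create(
            model="gpt-4.1-mini",
            messages=[{"role": "user", "content": "ping"}],  # stand-in prompt
            max_tokens=16,
        )
        latencies.append(time.perf_counter() - start)
    return latencies

samples = run_trials()
print(f"avg: {statistics.mean(samples):.2f}s")
print(f"p90: {statistics.quantiles(samples, n=10)[-1]:.2f}s")  # 90th percentile
```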

Just to clarify, we are strictly respecting our organization’s rate limits, so I’m confident that this is not the issue.

I haven’t seen any official communication from OpenAI regarding API instability specifically for GPT-4.1 Mini, and I also couldn’t find other users reporting similar issues recently.

Has anyone else experienced this latency increase? And if anyone from OpenAI is reading this — is there any insight you can share about what might be causing this?

Thanks in advance!


For non-cacheable requests to gpt-4.1-mini (1,800 tokens in, 256 tokens out):

April 16, shortly after release: [latency histogram]

April 30 (rescaled; note the axis values): [latency histogram]

The bulk of the histogram has moved from below 2 seconds to above 2 seconds.

— Performance Statistics Report (100 trials per model) —

For 100 trials of gpt-4.1-mini @ 2025-04-30:

Metric            Average   Minimum   Maximum
latency_s           0.634     0.312     5.617
stream_rate          60.2      16.3      98.8
total_rate          52.7      15.7      74.2
total_time_s        5.211     3.451    16.297
response_tokens     256.0       256       256
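For anyone who wants to reproduce this: the metric names suggest latency_s is time to first streamed token and the two rates are tokens per second (while streaming vs. overall). Under those assumptions, a sketch of a per-trial measurement, where chunk counts stand in for exact token counts since chat streams deliver roughly one token per chunk:

```python
import time

from openai import OpenAI

client = OpenAI()

def one_trial(prompt: str, max_out: int = 256) -> dict:
    start = time.perf_counter()
    first_token_at = None
    n_chunks = 0
    stream = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_out,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()  # first content chunk
            n_chunks += 1
    end = time.perf_counter()
    assert first_token_at is not None, "stream produced no content"
    return {
        "latency_s": first_token_at - start,             # time to first token
        "stream_rate": (n_chunks - 1) / (end - first_token_at)
                       if n_chunks > 1 else 0.0,         # tokens/s while streaming
        "total_rate": n_chunks / (end - start),          # tokens/s overall
        "total_time_s": end - start,
        "response_tokens": n_chunks,                     # approximate
    }
```

Averaging one_trial over 100 runs gives a report in the same shape as the table above.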