Hi everyone,
I work on an application where latency is a critical factor. In our setup, we use a large fixed context and rely on prompt caching (cached prompt tokens) to reduce latency; each request includes a prompt of around 110k tokens.
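For context, here is a minimal sketch of our request pattern, assuming the official `openai` Python SDK; `LARGE_FIXED_CONTEXT` and `ask` are just illustrative names, not our real code. The idea is to keep the big fixed prefix identical and first in every request so it can be served from the prompt cache, and to read `cached_tokens` back from the usage details to confirm the cache is actually hitting:

```python
import time
from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()

# Hypothetical placeholder for our ~110k-token fixed prefix;
# only the user question varies between requests.
LARGE_FIXED_CONTEXT = "...large, unchanging context text..."

def ask(question: str):
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            # Identical fixed prefix first -> eligible for prompt caching.
            {"role": "system", "content": LARGE_FIXED_CONTEXT},
            {"role": "user", "content": question},
        ],
    )
    latency = time.perf_counter() - start
    # Reports how many prompt tokens were served from the cache.
    cached = resp.usage.prompt_tokens_details.cached_tokens
    print(f"latency={latency:.2f}s cached_tokens={cached}")
    return resp
```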
When GPT-4.1 Mini was released, I ran several tests and got very promising results — average latency was around 4 seconds, with the 90th percentile reaching 7 seconds, which was acceptable for our use case.
However, over the past few days, I’ve noticed a significant degradation in performance. In my recent tests, the average latency has increased to 7 seconds, and the 90th percentile is now hitting 12 seconds.
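To be clear about how I'm getting these numbers: they come from simple wall-clock timing of each request (as in the sketch above). A small illustrative helper for the average and 90th percentile, using only the standard library:

```python
import statistics

def summarize(latencies: list[float]) -> None:
    """latencies: per-request wall-clock times in seconds."""
    avg = statistics.mean(latencies)
    # quantiles(n=10) returns the 9 decile cut points; index 8 is the p90.
    p90 = statistics.quantiles(latencies, n=10)[8]
    print(f"avg={avg:.1f}s p90={p90:.1f}s")
```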
Just to clarify, we are strictly respecting our organization's rate limits, so I'm confident rate limiting is not the issue.
I haven’t seen any official communication from OpenAI regarding API instability specifically for GPT-4.1 Mini, and I also couldn’t find other users reporting similar issues recently.
Has anyone else experienced this latency increase? And if anyone from OpenAI is reading this — is there any insight you can share about what might be causing this?
Thanks in advance!