GPT-4o mini slow inference

Hi everyone,

Does anyone have any idea why 4o mini is way slower than 4o although it is much smaller?

Do you think it is running on slower GPUs, or did they nerf its performance for monetary reasons?

3 Likes

Yes, I see that this is also being reported on the unofficial status page.

This is likely a temporary issue, since being faster is one of the key selling points of offering smaller models.

Thanks for flagging.

2 Likes

Interesting link.

In addition to usage fluctuations, I think it might be related to the fact that 4o has received recent updates to reduce GPU usage, while 4o-mini has been lagging behind for a while.

1 Like

Totally agree with you. That is why you can find the May 2024, Nov 2024, and March 2025 4o models in the API, and there are huge fluctuations in latency and speed between them.
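For reference, here is a quick way to see which dated 4o snapshots your account exposes, so you can pin benchmarks to a specific one. This is a minimal sketch using the official `openai` Python SDK and assumes `OPENAI_API_KEY` is set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Print every 4o-family snapshot visible to this account
for model in client.models.list():
    if model.id.startswith("gpt-4o"):
        print(model.id)
```

You can then pass any of those dated IDs as the `model` parameter to compare latency across snapshots.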

Great question! Latency can be caused by a few factors aside from model size. For example:

  • Engine load balancing: In some cases, 4o mini requests may wait longer in the queue depending on how the system routes traffic.
  • Caching behavior: Enabling caching can sometimes increase latency because it pins the request to a specific engine that may not be the fastest available at that moment.

So while 4o mini is indeed designed for high throughput, things like queue times and caching strategy can still impact latency; the sketch below shows one way to check whether your requests are actually hitting the cache.
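This is a minimal sketch using the official `openai` Python SDK; it assumes `OPENAI_API_KEY` is set, and the `prompt_tokens_details` field may be absent on older SDK versions. On the OpenAI API, prompt caching kicks in automatically past roughly 1,024 prompt tokens:

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Prompt caching only applies past a minimum prompt length (~1,024 tokens),
# so pad the prompt with filler text to make it long enough.
long_prompt = ("Background: " + "lorem ipsum " * 600
               + "\n\nQuestion: Summarize the background in one sentence.")

def probe(model: str = "gpt-4o-mini") -> None:
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": long_prompt}],
        max_tokens=64,
    )
    details = resp.usage.prompt_tokens_details  # may be None on older SDKs
    cached = details.cached_tokens if details else 0
    print(f"{model}: {time.perf_counter() - start:.2f}s, cached_tokens={cached}")

probe()  # first call: cold cache
probe()  # repeat call: should report cached tokens, often with lower latency
```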

That said, I’ve flagged this internally. Thank you for surfacing it 🙏

5 Likes

Thank you so much for the insights.

I also noticed 4o-mini has a much lower TPS (tokens per second) compared to 4o. I have seen some threads online about it getting roughly 4x slower overnight about six months ago.

However, the 4o-mini API on Microsoft Azure did not drop in performance. Through some heuristic measurements, I found that this performance downgrade is consistent with a switch from H100 to A100 GPUs.
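For anyone who wants to reproduce this kind of measurement, here is a rough sketch of a streaming TPS benchmark with the official `openai` Python SDK. The model names and prompt are just examples, and chunk counts only approximate token counts (use the `usage` field for exact numbers):

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def measure_tps(model: str, prompt: str) -> float:
    """Stream a completion and estimate output tokens per second."""
    start = time.perf_counter()
    first_token_at = None
    n_chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=512,
    )
    for chunk in stream:
        # Each content-bearing chunk is roughly one token
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            n_chunks += 1
    elapsed = time.perf_counter() - (first_token_at or start)
    return n_chunks / elapsed if elapsed > 0 else 0.0

for model in ("gpt-4o", "gpt-4o-mini"):
    print(f"{model}: ~{measure_tps(model, 'Explain TCP slow start.'):.1f} tokens/s")
```

Running the same script against both the OpenAI and Azure deployments of 4o-mini would make the comparison above directly reproducible.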

The H100 vs A100 theory makes a lot of sense and likely matches some of what we’ve been seeing too.

Thanks for sharing your testing!