GPT-4o-mini is really slow

I’m working on a new application that uses GPT-4o-mini to generate data from rather big text inputs, but I’ve been experiencing significant delays in response times.

Regardless of whether I’m requesting raw text or structured JSON outputs, everything seems sluggish.

Input token counts range between 20k and 30k, with completions averaging around 700 tokens.

I'm averaging 25 to 30 output tokens per second, which is frustratingly slow for real-world use: at ~700 completion tokens, a full response takes close to 30 seconds, making the model hard to use in my application.
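For anyone who wants to sanity-check the numbers, this is roughly how throughput can be measured with the Python SDK (a minimal sketch; `input.txt` stands in for the real input, and end-to-end timing includes time-to-first-token, so pure generation speed is slightly higher than what this prints):

```python
import time

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

long_prompt = open("input.txt").read()  # placeholder for a 20k-30k token input

start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": long_prompt}],
)
elapsed = time.perf_counter() - start

out_tokens = response.usage.completion_tokens
print(f"{out_tokens} completion tokens in {elapsed:.1f}s "
      f"-> {out_tokens / elapsed:.1f} tokens/s")
```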

I previously used this model in another application and don't recall it being anywhere near this sluggish.

Am I missing something, or has there been a recent change affecting performance?

I see 4o (not mini) yield 30-50 output tokens per second.

Since your context is pretty large, the 30 OTPS doesn’t surprise me.

You can try streaming the response if the user is “seeing” it in real time.
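Something like this, if it helps (a minimal sketch with the Python SDK; `input.txt` standing in for your large input is the only assumption):

```python
from openai import OpenAI

client = OpenAI()

prompt = open("input.txt").read()  # placeholder for the large input

# Stream tokens as they are generated instead of waiting for the full response.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```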

Maybe others can chime in on 4o-mini with 30k input tokens.


Unfortunately, streaming the responses to users is not a solution to my problem.

Another hack is to fine-tune the model, maybe even just a light fine-tune.

This generally yields lower latencies, at least historically, but it costs more to run.

That might be something to try.
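Kicking off a job is only a few lines if you want to experiment (a rough sketch; `training_data.jsonl` is a hypothetical file of chat-format examples, and the fine-tunable snapshot name may have changed, so check the docs):

```python
from openai import OpenAI

client = OpenAI()

# Upload a JSONL file of chat-format training examples.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job against a fine-tunable snapshot.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # snapshot name is an assumption; verify in the docs
)

print(job.id, job.status)
```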

I’ve always had slower responses with lots of input tokens.

Any way to trim the input down?

Were you sending the same number of input tokens in that other application?
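If trimming is an option, here's a rough way to count and hard-cap the input with tiktoken (a sketch; GPT-4o-family models use the o200k_base encoding, and `document.txt` plus the 15k budget are placeholders; naive truncation can cut mid-sentence, so chunking or summarizing first is usually better):

```python
import tiktoken

# GPT-4o-family models use the o200k_base encoding.
enc = tiktoken.get_encoding("o200k_base")

def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Keep only the first max_tokens tokens of the input text."""
    tokens = enc.encode(text)
    return text if len(tokens) <= max_tokens else enc.decode(tokens[:max_tokens])

big_input = open("document.txt").read()  # placeholder for the real input
trimmed = truncate_to_budget(big_input, 15_000)
print(len(enc.encode(big_input)), "->", len(enc.encode(trimmed)))
```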