High latency for fine-tuned gpt-4o-mini

Hi there!

Recently got into fine-tuning a gpt-4o-mini-2024-07-18 model. I only have about 30 elements in my training set, and in some initial testing I was pleased with the quality of the output; however, the latency is slower than expected.

It can range anywhere from 2 to 10 seconds, and one instance even went 20 seconds. Has anyone had any luck cutting down latency? I'm pretty new, but I wonder if there is any caching involved and, if so, whether anyone has had luck warming up the cache (say, before expected heavy-use periods). I'd really like to use this in prod, but it simply must be faster (said every swe ever lol). Perhaps other models are faster? I tried prompt engineering, but it wasn't up to the task.

Thank you all for your input, O’ noble OpenAI community.

Sure is lonely in here :frowning:

This is actually super good, considering!

The more context you send, the longer it takes.


@Dunc Hey! Seeing the same (or worse even, 39s?!)… did you ever find a magic solution for this?

Fine-tuned models take a while to warm up after a period of inactivity, for whatever reason on the backend.

You can make some async one-token calls to the model when you anticipate it will be used, for cases where you have some warning that it may be called upon.
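A minimal sketch of that warm-up idea, using the official openai Python SDK; the `FINE_TUNED_MODEL` value is a placeholder for your own fine-tuned model ID, and whether this actually keeps anything warm on OpenAI's side isn't documented, so treat it as a best-effort mitigation:

```python
import asyncio
from openai import AsyncOpenAI

# Placeholder: substitute your own fine-tuned model ID here.
FINE_TUNED_MODEL = "ft:gpt-4o-mini-2024-07-18:your-org::abc123"

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def warm_up() -> None:
    """Fire a cheap one-token request so the fine-tuned model is
    hopefully warm before real traffic arrives."""
    await client.chat.completions.create(
        model=FINE_TUNED_MODEL,
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=1,
    )


if __name__ == "__main__":
    # Call this shortly before an expected heavy-use period.
    asyncio.run(warm_up())
```

You could schedule something like this a minute or two ahead of your known traffic windows, or fire it in the background when a user session starts.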