High latency for fine-tuned gpt-4o-mini

Hi there!

I recently got into fine-tuning a gpt-4o-mini-2024-07-18 model. I only have about 30 examples in my training set, and in some initial testing I was pleased with the output quality; however, the latency is slower than expected.

It can range anywhere from 2 to 10 seconds, and one request even took 20 seconds. Has anyone had luck cutting down latency? I’m pretty new, but I wonder if there is any caching involved and, if so, whether anyone has had luck warming up the cache (say, before expected heavy-use periods). I’d really like to use this in prod, but it simply must be faster (said every SWE ever, lol). Perhaps other models are faster? I tried prompt engineering, but it wasn’t up to the task.

Thank you all for your input, O’ noble OpenAI community.

Sure is lonely in here :frowning:

This is actually super good, considering!

The more context you send, the longer it takes.
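If you want to verify that on your own workload, a minimal sketch of how you might time individual calls. Note that `fake_completion` here is a hypothetical stand-in so the snippet runs offline; in your code you would time your actual `client.chat.completions.create(...)` call instead:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-in for the real API call, simulating latency
# that grows with prompt length (the point made above: more
# context means a slower response).
def fake_completion(prompt: str) -> str:
    time.sleep(0.001 * len(prompt))
    return "ok"

_, t_short = timed(fake_completion, "hi")
_, t_long = timed(fake_completion, "x" * 500)
print(f"short prompt: {t_short:.3f}s, long prompt: {t_long:.3f}s")
```

Logging these numbers per request makes it easy to see whether your slow calls correlate with prompt size or are just occasional server-side spikes.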
