High latency with a fine tuned 4o-mini model

jsnyder · March 3, 2025, 5:06pm

Hey folks. I am working on a fine-tuning problem where we generate JSON files that follow a specific internal structure. Previously, we used a RAG-based approach and got results in single-digit seconds. With the fine-tuned model, latency has gone up to over one minute. Have others experienced this?

For what it’s worth, this was a test run. The output was poor quality, but the number of output tokens was consistent with the targets.
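
In case it helps anyone compare numbers, here is a rough sketch of how the two setups could be timed so that startup delay (time to first chunk) is separated from generation speed. It is only an illustration: `measure_stream` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in for a real streaming response, not an actual API call.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a stream of text chunks and report basic latency figures."""
    start = clock()
    first: Optional[float] = None
    count = 0
    for _ in chunks:
        if first is None:
            # Delay before the first chunk arrived (queueing/startup cost).
            first = clock() - start
        count += 1
    total = clock() - start
    return {
        "time_to_first_chunk_s": first,
        "total_s": total,
        "chunks": count,
        "chunks_per_s": count / total if total > 0 else None,
    }


def fake_stream(n: int, startup_delay: float, per_chunk_delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(startup_delay)
    for i in range(n):
        time.sleep(per_chunk_delay)
        yield f"tok{i}"


if __name__ == "__main__":
    # Swap fake_stream for the real streamed output of each model to compare.
    print(measure_stream(fake_stream(20, 0.05, 0.01)))
```

If the time to first chunk is large but the chunk rate looks normal, that would point at queueing or model loading rather than slow generation.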
