Are folks noticing improved fine-tuned model latencies with Tier 5 vs Tier 4?

My fine-tune works as expected, but the latency is a little too high. I'm wondering whether I should spend effort tweaking the fine-tune, or just get to Tier 5 and expect improved latency.

Hey there and welcome to the community!

Latency is a tough problem to solve API-wise, and it's not typically something in the developer's control. If the API responds too slowly, there's not much you can do on your end beyond tweaking your internet/networking setup. The exception is if you're deploying in specific cloud environments, where latency also depends on the setup and configuration of the cloud instance it's running on.

I don't know what you're doing or what your code looks like, but sometimes improving the efficiency of your code can help with latency. That's very dependent on the code, though. I'd do a basic check and make sure there are no bottlenecks accidentally slowing things down, e.g. measure how long the API call itself takes versus everything around it (a quick sketch of that is below). You never know some days.
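Here's a minimal sketch of that check, assuming the Python SDK (`openai>=1.0`) and a placeholder fine-tuned model ID. Streaming lets you separate time-to-first-token (mostly server/queue time) from total request time (which grows with output length), so you can tell whether the slowness is on the API side or in your own code:

```python
import time
from openai import OpenAI  # openai>=1.0

client = OpenAI()

# Placeholder: substitute your own fine-tuned model ID.
MODEL = "ft:gpt-4o-mini-2024-07-18:my-org::abc123"

start = time.perf_counter()
first_token_at = None

# Stream so we can see when the first token arrives,
# separately from how long the full response takes.
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()

end = time.perf_counter()
if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.2f}s")
print(f"total request time:  {end - start:.2f}s")
```

If time-to-first-token is high but generation after that is fast, it's server-side and no code tweak will help much; if the gap is elsewhere in your pipeline, that's your bottleneck.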

The tiers mostly unlock higher rate limits, which is a different thing. Tiers set how many tokens you can request per minute; in theory, every tier is served tokens at the same speed, and the limit is only on how many tokens you're allowed to "receive" during that 1-minute window. Therefore, increasing your tier wouldn't necessarily reduce your latency. Also, Tier 4 → Tier 5 is a hefty jump, just a forewarning.
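You can actually check whether you're anywhere near those limits by inspecting the rate-limit headers the API sends back. A rough sketch with the Python SDK (the `x-ratelimit-*` header names match what OpenAI documents, but verify the exact fields for your account; the model ID is again a placeholder):

```python
from openai import OpenAI  # openai>=1.0

client = OpenAI()

# with_raw_response exposes the HTTP headers alongside the parsed body.
raw = client.chat.completions.with_raw_response.create(
    model="ft:gpt-4o-mini-2024-07-18:my-org::abc123",  # placeholder fine-tune ID
    messages=[{"role": "user", "content": "ping"}],
)

# If these numbers are nowhere near zero, you aren't being throttled,
# and a higher tier's bigger per-minute budget won't change per-request latency.
print("requests remaining:", raw.headers.get("x-ratelimit-remaining-requests"))
print("tokens remaining:  ", raw.headers.get("x-ratelimit-remaining-tokens"))

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)
```

If the remaining budgets are healthy, the latency you're seeing isn't a rate-limit problem, and moving up a tier is unlikely to fix it.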

I saw a page somewhere discussing OAI experimenting with some kind of scalable offering that might improve latency, but it's not yet available for fine-tuned models, so it doesn't apply here.
