Are folks noticing improved fine-tuned model latencies with Tier 5 vs Tier 4?

My fine-tune works as expected, but the latency is a little too high. I'm wondering whether I should spend effort tweaking the fine-tune, or just get to Tier 5 and expect improved latency.

Hey there and welcome to the community!

Latency is a tough problem to solve API-wise, and it's not typically something in the developer's control. If the API responds too slowly, there's not much you can do on your end beyond tweaking your internet/networking setup. The exception is if you're deploying in specific cloud environments, where latency also depends on the setup and configuration of the cloud instance it's running on.

I don't know what you're doing or what your code looks like, but sometimes improving the efficiency of your code can help with latency. That's very dependent on the code, though. I'd do a basic check and make sure there are no bottlenecks accidentally slowing things down, e.g. measure how long the API call itself takes versus everything around it (a quick sketch of that is below). You never know some days.
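Here's a minimal sketch of that check, assuming the Python SDK (`openai>=1.0`) and a placeholder fine-tuned model ID. Streaming lets you separate time-to-first-token (mostly server/queue time) from total request time (which grows with output length), so you can tell whether the slowness is on the API side or in your own code:

```python
import time
from openai import OpenAI  # openai>=1.0

client = OpenAI()

# Placeholder: substitute your own fine-tuned model ID.
MODEL = "ft:gpt-4o-mini-2024-07-18:my-org::abc123"

start = time.perf_counter()
first_token_at = None

# Stream so we can see when the first token arrives,
# separately from how long the full response takes.
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()

end = time.perf_counter()
if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.2f}s")
print(f"total request time:  {end - start:.2f}s")
```

If time-to-first-token is high but generation after that is fast, it's server-side and no code tweak will help much; if the gap is elsewhere in your pipeline, that's your bottleneck.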

The tiers mostly unlock higher rate limits, which is a different thing. Tiers set how many tokens you can request per minute; in theory, every tier is served tokens at the same speed, and the limit is only on how many tokens you're allowed to "receive" during that 1-minute window. Therefore, increasing your tier wouldn't necessarily reduce your latency. Also, Tier 4 → Tier 5 is a hefty jump, just a forewarning.
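You can actually check whether you're anywhere near those limits by inspecting the rate-limit headers the API sends back. A rough sketch with the Python SDK (the `x-ratelimit-*` header names match what OpenAI documents, but verify the exact fields for your account; the model ID is again a placeholder):

```python
from openai import OpenAI  # openai>=1.0

client = OpenAI()

# with_raw_response exposes the HTTP headers alongside the parsed body.
raw = client.chat.completions.with_raw_response.create(
    model="ft:gpt-4o-mini-2024-07-18:my-org::abc123",  # placeholder fine-tune ID
    messages=[{"role": "user", "content": "ping"}],
)

# If these numbers are nowhere near zero, you aren't being throttled,
# and a higher tier's bigger per-minute budget won't change per-request latency.
print("requests remaining:", raw.headers.get("x-ratelimit-remaining-requests"))
print("tokens remaining:  ", raw.headers.get("x-ratelimit-remaining-tokens"))

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)
```

If the remaining budgets are healthy, the latency you're seeing isn't a rate-limit problem, and moving up a tier is unlikely to fix it.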

I saw a page somewhere discussing OAI experimenting with some kind of scalable offering that might improve latency, but it's not yet available for fine-tuned models, so it doesn't apply here.
