What's the max latency for OpenAI models accessed via Azure with provisioned throughput units?

I’m currently using pay-as-you-go on Azure OpenAI and I’m curious how much switching to provisioned throughput units would improve latency.

I read:

Predictable performance: stable max latency and throughput for uniform workloads.

What’s the max latency for OpenAI models accessed via Azure with provisioned throughput units?


I imagine you’d need to talk to your Microsoft sales/account team.

If it’s anything like the other stuff, it depends on what hardware you get provisioned on. They’re currently extremely short on A100s and very stingy with them.


@Diet do you have insights on the minimum throughput for which they would even offer this option? I’m currently also scratching my head over how to improve latency for models deployed in Azure OpenAI…


I suspect it’s completely unrelated to throughput or need. It looks like it’s more of a strategic bargaining chip for their high-value enterprise customers than anything else for now. :confused:

What regions are you deployed in? If you deploy all over the world, you can surf the wee hours, routing requests to whichever region is currently in its off-peak window.
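
If you do go multi-region, a quick latency probe can pick whichever deployment is fastest at the moment. Here’s a minimal sketch using the Python openai SDK (v1+); the endpoint URLs and deployment name are placeholders, and note that in practice each Azure resource has its own API key, whereas this sketch assumes a single shared key for brevity:

```python
# Minimal sketch: probe several regional Azure OpenAI deployments and
# route to whichever responds fastest right now. Endpoints and the
# deployment name below are placeholders -- substitute your own.
import os
import time

from openai import AzureOpenAI

# Hypothetical regional resources; each needs its own model deployment.
REGIONAL_ENDPOINTS = [
    "https://my-resource-canadaeast.openai.azure.com",
    "https://my-resource-westeurope.openai.azure.com",
    "https://my-resource-australiaeast.openai.azure.com",
]
DEPLOYMENT = "gpt-35-turbo"  # placeholder deployment name


def probe_latency(endpoint: str) -> float:
    """Time one tiny completion against a regional endpoint."""
    # Assumes one shared key for brevity; real Azure resources each
    # have their own key, so you'd look it up per endpoint.
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )
    start = time.perf_counter()
    client.chat.completions.create(
        model=DEPLOYMENT,
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=1,
    )
    return time.perf_counter() - start


def fastest_endpoint() -> str:
    """Return the endpoint with the lowest measured round trip."""
    timings = {}
    for endpoint in REGIONAL_ENDPOINTS:
        try:
            timings[endpoint] = probe_latency(endpoint)
        except Exception:
            continue  # skip regions that are down or throttling
    return min(timings, key=timings.get)


if __name__ == "__main__":
    print("Routing to:", fastest_endpoint())
```

A one-token "ping" completion is crude, but it captures both network distance and current load on the region, which is what you actually care about when surfing off-peak windows.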

Ah, I see :unamused: I’m currently in Canada East, which has proven to be painful. I could potentially branch out to other countries (with the exception of the US), so I might give this a try.
