What's the max latency for OpenAI models accessed via Azure with provisioned throughput units?

I’m currently using pay-as-you-go on Azure OpenAI and I’m curious how much switching to provisioned throughput units would improve latency.

I read:

Predictable performance: stable max latency and throughput for uniform workloads.

What’s the max latency for OpenAI models accessed via Azure with provisioned throughput units?


I imagine you’d need to talk to your Microsoft sales/account team.

If it’s anything like the other stuff, it depends on what hardware you get provisioned on. They’re currently extremely short on A100s and very stingy with them.


@Diet do you have insights on the minimum throughput for which they would even offer this option? I’m currently also scratching my head over how to improve latency for models deployed in Azure OpenAI…


I suspect it’s completely unrelated to throughput or need. It looks like it’s more of a strategic bargaining chip for their high-value enterprise customers than anything else for now. :confused:

What regions are you deployed in? If you deploy all over the world, you can surf the wee hours, routing requests to whichever region is currently in its off-peak window.
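
If you do go multi-region, a quick latency probe can pick whichever deployment is fastest at the moment. Here’s a minimal sketch using the Python openai SDK (v1+); the endpoint URLs and deployment name are placeholders, and note that in practice each Azure resource has its own API key, whereas this sketch assumes a single shared key for brevity:

```python
# Minimal sketch: probe several regional Azure OpenAI deployments and
# route to whichever responds fastest right now. Endpoints and the
# deployment name below are placeholders -- substitute your own.
import os
import time

from openai import AzureOpenAI

# Hypothetical regional resources; each needs its own model deployment.
REGIONAL_ENDPOINTS = [
    "https://my-resource-canadaeast.openai.azure.com",
    "https://my-resource-westeurope.openai.azure.com",
    "https://my-resource-australiaeast.openai.azure.com",
]
DEPLOYMENT = "gpt-35-turbo"  # placeholder deployment name


def probe_latency(endpoint: str) -> float:
    """Time one tiny completion against a regional endpoint."""
    # Assumes one shared key for brevity; real Azure resources each
    # have their own key, so you'd look it up per endpoint.
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )
    start = time.perf_counter()
    client.chat.completions.create(
        model=DEPLOYMENT,
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=1,
    )
    return time.perf_counter() - start


def fastest_endpoint() -> str:
    """Return the endpoint with the lowest measured round trip."""
    timings = {}
    for endpoint in REGIONAL_ENDPOINTS:
        try:
            timings[endpoint] = probe_latency(endpoint)
        except Exception:
            continue  # skip regions that are down or throttling
    return min(timings, key=timings.get)


if __name__ == "__main__":
    print("Routing to:", fastest_endpoint())
```

A one-token "ping" completion is crude, but it captures both network distance and current load on the region, which is what you actually care about when surfing off-peak windows.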

Ah, I see :unamused: I’m currently in Canada East, which has proven to be painful. I could potentially branch out to other countries (with the exception of the US), so I might give this a try.
