API latency when backend is hosted on Azure?

I am trying to build a product using the OpenAI APIs. Since I will be using LangChain, latency will likely be a relatively important part of the overall response time. I was wondering: if I host the backend on Azure as a Docker container, would that minimize API latency compared to hosting the backend server on another arbitrary provider? Even though I would not be able to use the OpenAI APIs offered directly by Azure (the OpenAI APIs are also available from Microsoft, not just through OpenAI), I believe inference for the OpenAI "versions" is also served on Azure infrastructure. Hence my assumption that this setup would yield the lowest latency.

Does anyone have experience or concrete measurements that can back this up? Or am I overthinking this? :slight_smile: Again, I just thought that a backend that leverages LangChain heavily would benefit from a low-latency connection to the language model servers… A sketch of the call pattern I'm worried about is below.
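For context, this is roughly the kind of sequential call pattern I mean (a minimal sketch, assuming the `openai` Python SDK v1+ and `gpt-3.5-turbo` as a placeholder model; my real chains would have more steps). Each step pays a network round-trip plus inference time, so per-call latency compounds across the chain:

```python
import time
from openai import OpenAI  # assumes the v1+ Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def timed_chat(prompt: str) -> tuple[str, float]:
    """Run one chat completion and return (text, elapsed seconds)."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content, time.perf_counter() - start


# Two sequential steps, as a chain would issue them: the second call
# cannot start until the first has returned, so latencies add up.
summary, t1 = timed_chat("Summarize: LangChain chains issue sequential LLM calls.")
answer, t2 = timed_chat(f"Given this summary, name one risk: {summary}")
print(f"step 1: {t1:.2f}s, step 2: {t2:.2f}s, total: {t1 + t2:.2f}s")
```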

Nothing concrete to back it up, but if you are calling the LLMs, you are already looking at seconds of latency. Any benefit you get from being physically closer to the API servers will be negligible.

Yeah the actual API call itself (the time it takes GPT to think up the answer) is much slower than any network time from your server to the API. It’s barely worth thinking about.
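To put rough numbers on that, here is a minimal sketch (assuming the public https://api.openai.com endpoints and an `OPENAI_API_KEY` environment variable) that compares a lightweight metadata request, which is dominated by the network round-trip, with a full chat completion, which adds inference time on top:

```python
import os
import time

import requests

API_KEY = os.environ["OPENAI_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Rough network round-trip: a lightweight metadata endpoint, no inference.
start = time.perf_counter()
requests.get("https://api.openai.com/v1/models", headers=HEADERS, timeout=30)
network_rtt = time.perf_counter() - start

# Full completion: the same round trip plus model inference time.
start = time.perf_counter()
requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=HEADERS,
    json={
        "model": "gpt-3.5-turbo",  # placeholder model name
        "messages": [{"role": "user", "content": "Write two sentences about latency."}],
    },
    timeout=120,
)
completion_time = time.perf_counter() - start

print(f"network-ish round trip: {network_rtt * 1000:.0f} ms")
print(f"full completion:        {completion_time * 1000:.0f} ms")
```

The first number is typically tens to a few hundred milliseconds; the second is usually seconds, which is why the hosting region barely matters.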

Hi, what about Azure OpenAI response time vs. the original OpenAI response time… is there any difference?

Anecdotally, people are saying Azure is faster.


Thanks for the reply, very kind of you.

We recently moved a client project from the OpenAI API endpoint to an Azure OpenAI API endpoint (private instance) and saw a decrease in latency of 80%. So yes, for production you should consider a private instance of the Azure OpenAI Service.
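If anyone wants to try the same switch, here is a minimal sketch of pointing the `openai` Python SDK (v1+) at an Azure OpenAI deployment; the resource name, deployment name, and api-version below are placeholders to replace with the values from your own Azure OpenAI resource:

```python
import os

from openai import AzureOpenAI  # assumes the v1+ Python SDK

# Placeholders: use your own resource URL, key, and API version.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

resp = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",  # Azure expects the deployment name here
    messages=[{"role": "user", "content": "Hello from Azure OpenAI"}],
)
print(resp.choices[0].message.content)
```

The calling code is otherwise identical to the standard OpenAI client, so the migration is mostly a configuration change.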