I am trying to create a product using the OpenAI APIs. Since I will be using Langchain, likely latency will be at least a relatively important part of the overall response time. Now, I was wondering if I would host the backend on Azure as a docker container, would that minimize the API latency versus an alternative approach where the backend server would be hosted on another arbitrary provider… Even though I would not be able to use the OpenAI APIs that are directly offered by Azure (OpenAI APIs are also available from Microsoft, not just throught OpenAI), I would still believe that the inference for the OpenAI “versions” are also served by Azure indirectly. Hence my original assumption that this would yield the lowest latency.
Does someone have experience or concrete measurements that can back this up? Or am I overthinking this? Again, I just thought that a backend that will leverage Langchain heavily would benefit from a low latency connection to the language model servers…
recently moved a client project from
openai api endpoint to azure open ai api endpoint (private instance)
saw a decrease in latency of 80%, so yes, for production you should consider a private instance of Azure Open AI Service