We run the same API on both AWS and Azure and noticed a significant difference in the quality of the generated tokens. The deployment on AWS consistently returns higher quality results for identical queries to OpenAI.
The responses coming back through Azure show some type of soft truncation: for identical requests, the output token count is sometimes reduced by up to 30%.
We were mid-migration and doing extensive testing when we noticed this.
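For anyone wanting to reproduce the measurement, here is a minimal sketch of the kind of comparison we ran. The helper and the 1000-vs-700 numbers are illustrative, not our actual data, and the commented SDK calls assume the `openai` Python package with placeholder credentials and deployment names.

```python
def completion_token_reduction(baseline_tokens: int, other_tokens: int) -> float:
    """Percent fewer output tokens relative to the baseline response."""
    return 100.0 * (baseline_tokens - other_tokens) / baseline_tokens

# With the openai Python SDK, sending the identical request to both
# backends and reading usage.completion_tokens looks roughly like:
#
#   from openai import OpenAI, AzureOpenAI
#   direct = OpenAI(api_key="...").chat.completions.create(
#       model="gpt-4", messages=msgs, temperature=0)
#   azure = AzureOpenAI(azure_endpoint="https://...", api_version="...",
#                       api_key="...").chat.completions.create(
#       model="my-deployment", messages=msgs, temperature=0)
#   reduction = completion_token_reduction(
#       direct.usage.completion_tokens, azure.usage.completion_tokens)

# Illustrative only: 1000 output tokens direct vs 700 through Azure
print(completion_token_reduction(1000, 700))  # 30.0
```

Running many identical prompts at temperature 0 and averaging the reduction is what made the gap obvious for us.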
The endpoints called through Azure also feel stiffer: they don't follow system or user prompts as well and consistently make more mistakes.
This looks like some type of backend Azure-to-Azure optimization gone wrong.
I compared notes with some large enterprise customers, and they are seeing similar problems.
Hope this gets resolved soon.