When I call client.chat.completions.create, the latency is very high, more than 9 seconds.
I am using an Azure OpenAI deployment with gpt-4o-mini and Azure AI Search added as a data source.
When I retry within the next few seconds, the latency drops to under 3 seconds.
The application will call this code only intermittently, so in practice every call ends up taking more than 8 seconds, which is unacceptable. Is there a way to reduce it?
I have already reduced max_tokens to 300, set temperature to 0.2, and kept the prompt very short. It looks like the API takes extra time to look up the search index on the first call.
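For context, the call is set up roughly like the sketch below (a simplified reconstruction using the Azure OpenAI "On Your Data" pattern with the openai Python SDK; the endpoint, key, index name, and api_version are placeholders, not my exact values):

```python
import os
from openai import AzureOpenAI

# Placeholder endpoint/key/version; the real values come from my deployment.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",   # deployment name
    temperature=0.2,
    max_tokens=300,
    messages=[
        {"role": "system", "content": "Answer using the retrieved documents."},
        {"role": "user", "content": "..."},
    ],
    # Azure AI Search attached as a data source ("On Your Data")
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.environ["AZURE_SEARCH_ENDPOINT"],
                    "index_name": "my-index",
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_SEARCH_KEY"],
                    },
                },
            }
        ]
    },
)
print(completion.choices[0].message.content)
```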
Please suggest what can be done to reduce the first-call latency.