When I call client.chat.completions.create, the latency is very high, more than 9 seconds.
I am using an Azure OpenAI deployment with gpt-4o-mini and Azure AI Search added as a data source.
When I retry within the next few seconds, the latency drops to under 3 seconds.
The application will call this code only intermittently, so in practice every call ends up taking more than 8 seconds, which is unacceptable. Is there a way to reduce it?
I have already reduced max_tokens to 300, set temperature to 0.2, and kept the prompt very short. It looks like the API takes extra time to look up the search index on the first call.
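For context, the call is set up roughly like the sketch below (a simplified reconstruction using the Azure OpenAI "On Your Data" pattern with the openai Python SDK; the endpoint, key, index name, and api_version are placeholders, not my exact values):

```python
import os
from openai import AzureOpenAI

# Placeholder endpoint/key/version; the real values come from my deployment.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",   # deployment name
    temperature=0.2,
    max_tokens=300,
    messages=[
        {"role": "system", "content": "Answer using the retrieved documents."},
        {"role": "user", "content": "..."},
    ],
    # Azure AI Search attached as a data source ("On Your Data")
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.environ["AZURE_SEARCH_ENDPOINT"],
                    "index_name": "my-index",
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_SEARCH_KEY"],
                    },
                },
            }
        ]
    },
)
print(completion.choices[0].message.content)
```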
Please suggest what can be done to reduce the first-call latency.