I’m using the Azure OpenAI API with the following models:
- GPT-3.5 Turbo → ~900 ms average response time
- GPT-4 → ~1.3 s average response time
My goal is to get faster responses (ideally sub-500ms) for real-time use cases like code fixing.
I’m already using optimized parameters:
````json
{
  "messages": [
    {
      "role": "user",
      "content": "Fix code without explanation.\n```ts\nexport function extractCodeOnly(input: string): string[] {\n  const regex = /```(?:\\w+)?\\s*([\\s\\S]*?)\\s*```/g;\n  const result: string[] = [];\n\n  const match: RegExpExecArray | null;\n  while ((match2 = regex.exec(input)) !== null) {\n    result.push(match[1].trim());\n  }\n\n  return xyz;\n}\n```"
    }
  ],
  "temperature": 0.2,
  "max_tokens": 100,
  "top_p": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0
}
````
Despite tuning these settings, response times are still higher than I'd like. I've also experimented with `stream` and `logit_bias`, but neither produced a significant improvement.
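For context, my streaming experiment measured time-to-first-token rather than total latency, since streaming doesn't shorten generation, it only makes the first output arrive sooner. This is a rough sketch of how I parse the `data: ...` server-sent-event lines the chat-completions endpoint emits when `"stream": true` is set:

```typescript
// Extracts the incremental text from one SSE line of a streamed
// chat-completion response, or returns null for non-content lines
// (comments, keep-alives, and the terminal "data: [DONE]" marker).
function parseSseLine(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return null;
  const chunk = JSON.parse(payload);
  // Streamed chunks carry incremental text under choices[0].delta.content.
  return chunk.choices?.[0]?.delta?.content ?? null;
}

// Reads decoded response chunks and returns the ms elapsed until the
// first visible token appears -- the latency users actually perceive.
async function timeToFirstToken(chunks: AsyncIterable<string>): Promise<number> {
  const start = Date.now();
  for await (const chunk of chunks) {
    for (const line of chunk.split("\n")) {
      if (parseSseLine(line) !== null) return Date.now() - start;
    }
  }
  return Date.now() - start; // stream ended without content
}
```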
Questions:
- Are there additional settings or strategies to improve response latency?
- Is there a better-performing model available (via Azure or elsewhere) that offers faster response times for small code-related tasks?
- Would switching to another model like Claude, LLaMA, or Mistral through a different provider offer a performance gain?
Any insights on tuning, model choices, or deployment tips would be greatly appreciated!