Completion API performances (response time)

larderamarco · November 8, 2025, 6:36pm

Hi everyone,

I’m building an application using the completion api. I feel like the response times are too high and also not very consistent.

For example, using “gpt-4o”, with a 20k token request (14k prompt, 6k completion) i have response times ranging from 45 seconds to 100 seconds.

Are response times like this normal? Is there anything I can do to improve?

Thank you!

_j · November 8, 2025, 8:08pm

You report a 60-133 token-per-second generation rate on gpt-4o, inclusive of latency and input processing. That alias is gpt-4o-2024-08-06 (vs two other gpt-4o models to be tried) - on CHAT Completions.

That is quite normal and actually pretty impressive.

If you want to double your costs, you can use the “service_tier”:“priority” API parameter, to stay on the high end, a bulk service level >90TPS.

Topic		Replies	Views
Slow Responses From POST https://api.openai.com/v1/chat/completions API api	1	216	October 17, 2025
Really slow response time with text completions API today? API	2	534	May 27, 2025
OpenAI API takes too long to response API api	2	964	March 25, 2024
Gpt-4-0125-preview INCREDIBLY slower than 3.5 turbo API	13	9782	December 29, 2025
API completions endpoint performance API	7	2194	December 25, 2023

Completion API performances (response time)

Related topics