Completion API performances (response time)

Hi everyone,

I’m building an application using the completion api. I feel like the response times are too high and also not very consistent.

For example, using “gpt-4o”, with a 20k token request (14k prompt, 6k completion) i have response times ranging from 45 seconds to 100 seconds.

Are response times like this normal? Is there anything I can do to improve?

Thank you!

You report a 60-133 token-per-second generation rate on gpt-4o, inclusive of latency and input processing. That alias is gpt-4o-2024-08-06 (vs two other gpt-4o models to be tried) - on CHAT Completions.

That is quite normal and actually pretty impressive.

If you want to double your costs, you can use the “service_tier”:“priority” API parameter, to stay on the high end, a bulk service level >90TPS.

1 Like