How to reduce OpenAI response time?

We have an application that queries OpenAI with multiple prompts. However, many of the requests we send to OpenAI take many seconds to return a result, which makes for a bad experience for the end user. I believe this is a problem more people face. Do you have any solution, or best practices on how to improve response time? Thank you very much.

Here is an example:


Good question, I have been wondering this myself. I’m just responding so that I will get a notification if someone answers :wink:

Can you share any more specifics on what you are sending via the API, which end points, etc?

I’m using the completion creation endpoint


“Creates a completion for the provided prompt and parameters”

We always use Davinci as the main engine. We are using Davinci 3 in many prompts.
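For context, a request like this looks roughly as follows. This is a minimal sketch, not the poster's actual code: the prompt text and parameter values are illustrative, and it assumes the legacy `/v1/completions` endpoint with the `text-davinci-003` model name, called via the pre-1.0 `openai` Python library.

```python
# Sketch of a legacy /v1/completions request payload.
# Prompt text and parameter values are illustrative only.
payload = {
    "model": "text-davinci-003",  # "Davinci 3"
    "prompt": "Summarize the following text in one sentence: ...",
    "max_tokens": 256,
    "temperature": 0.7,
}

# With the pre-1.0 openai Python library the call would be roughly:
#   import openai
#   response = openai.Completion.create(**payload)
#   print(response["choices"][0]["text"])
```

Note that `max_tokens` caps the length of the *generated* completion, which matters for the latency discussion below.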

I know latency is something we are working on so I pinged some folks internally to see if we have any suggested best practices. Hang tight!


Thanks @logankilpatrick !


I suppose you already have a QoS setup for paying users?

Maybe MS is packing too much onto a single server and/or not scaling correctly for spikes in usage…

I’m facing a similar problem: every API call takes at least 10 seconds of latency. I have tried different values for temperature, but no luck.

I use the configs below:

    temperature: 0.66,
    max_tokens: 2147,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,

The max_tokens parameter hurts latency significantly. Try to reduce it as much as you can.
Also: there is the possibility of streaming results instead of waiting until the response is fully computed. Hope that helps!
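To illustrate why streaming helps *perceived* latency: with `stream=True`, the completions endpoint returns chunks as tokens are generated, so the user starts reading long before the full response is done. The sketch below simulates that with a fake chunk iterator (the real iterator would come from `openai.Completion.create(..., stream=True)` in the pre-1.0 library; `fake_stream` and `consume` are illustrative names, not part of any API).

```python
import time

def fake_stream(tokens, delay_per_token):
    """Stand-in for a streamed-completion iterator: yields one
    token at a time, simulating generation latency."""
    for tok in tokens:
        time.sleep(delay_per_token)
        yield tok

def consume(stream):
    """Consume a token stream; return (time_to_first_token,
    total_time, full_text)."""
    start = time.monotonic()
    first = None
    parts = []
    for tok in stream:
        if first is None:
            first = time.monotonic() - start
        parts.append(tok)  # in real code: display the token immediately
    return first, time.monotonic() - start, "".join(parts)

ttft, total, text = consume(fake_stream(["Hello", ",", " world"], 0.05))
# The first token arrives after ~1 token-delay, while the complete
# response takes ~3 token-delays, so the wait the user *feels* is
# only a fraction of the end-to-end latency.
```

The total time to generate everything is unchanged; streaming just moves the "first visible output" point much earlier.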


Thanks, I reduced max_tokens to 1000, but it didn’t have much impact on latency: from 10 seconds down to 9 seconds. I’ll try streaming.


We just shipped a new section in the docs inspired by your question here @alex.batista:


Very nice! The content is really good. We’ll check every recommendation in this document. This will definitely help a lot of people. Good job, and thank you so much @logankilpatrick and the whole team.


I wonder why we don’t try a new underlying model? I have some new ideas; the problem is that I don’t know how to contact you.