We have an application that queries OpenAI with multiple prompts. However, many of the requests we send to OpenAI take several seconds to return a result, which ends up creating a bad experience for the end user. I believe this is a problem that more people face. Do you have any solution for the problem, or good practices on how to improve response time? Thank you very much.

Here is an example:

2 Likes

Good question, I have been wondering this myself. I’m just responding so that I will get a notification if someone answers :wink:

Can you share any more specifics on what you are sending via the API, which end points, etc?

I’m using the completion creation endpoint:

POST https://api.openai.com/v1/completions

“Creates a completion for the provided prompt and parameters”

We always use Davinci as the main engine, and we are using Davinci 3 in many prompts.
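
For context, here is a minimal sketch of the kind of call we make, using the openai Python package (pre-1.0 interface); the prompt and parameter values are only placeholders, not our real ones:

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    # Plain (non-streaming) completion request against /v1/completions
    response = openai.Completion.create(
        model="text-davinci-003",  # Davinci 3
        prompt="Summarize this support ticket: ...",  # placeholder prompt
        max_tokens=256,
        temperature=0.7,
    )

    print(response["choices"][0]["text"])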

I know latency is something we are working on so I pinged some folks internally to see if we have any suggested best practices. Hang tight!

8 Likes

Thanks @logankilpatrick !

2 Likes

I suppose you already have a QoS setup for paying users?

Maybe MS is packing too much onto a single server and/or not scaling correctly for spikes in usage…

I’m facing a similar problem: all the API calls take at least 10 seconds of latency. I have tried using different values for temperature, but no luck.

I use the configs below:

    temperature: 0.66,
    max_tokens: 2147,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,

The max_tokens parameter hurts latency significantly; try to reduce it as much as you can.
Also, you can stream results instead of waiting until the response is fully computed. Hope that helps!
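
For example, a rough streaming sketch with the openai Python package (pre-1.0 interface; the model, prompt, and values are just illustrative):

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    # stream=True yields tokens incrementally instead of one final payload,
    # so the end user starts seeing output much sooner.
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt="Write a short product description for a coffee grinder.",
        max_tokens=256,  # keep this as low as your use case allows
        temperature=0.66,
        stream=True,
    )

    for chunk in response:
        print(chunk["choices"][0]["text"], end="", flush=True)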

2 Likes

Thanks, I reduced max_tokens to 1000, but it didn’t have much impact on latency: from 10 seconds down to 9 seconds. I’ll try streaming.

1 Like

We just shipped a new section in the docs inspired by your question here @alex.batista:

7 Likes

Very nice! The content is really good. We’ll check every recommendation in this document; it will definitely help a lot of people. Good job and thank you so much @logankilpatrick and the whole team.

4 Likes

I wonder why we don’t try a new underlying model? I have some new ideas; the problem is that I don’t know how to contact you.