We have an application that queries OpenAI with multiple prompts. However, many of the requests we send to OpenAI take several seconds to return a result, which makes for a poor experience for the end user. I believe this is a problem more people face. Do you have any solutions, or best practices for improving response time? Thank you very much.
Here is an example:
Good question, I have been wondering this myself. I’m just responding so that I will get a notification if someone answers.
Can you share any more specifics on what you are sending via the API, which endpoints, etc.?
I’m using the completion creation endpoint:
POST https://api.openai.com/v1/completions
“Creates a completion for the provided prompt and parameters”
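For reference, here is a minimal sketch of that request in Python using the `requests` package. The prompt, model, and parameter values are placeholders (not the exact values from our app), and `OPENAI_API_KEY` is assumed to be set in the environment:

```python
import os
import requests

# Hypothetical call to the completions endpoint; prompt, model, and
# max_tokens are placeholders for illustration only.
response = requests.post(
    "https://api.openai.com/v1/completions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "text-davinci-003",
        "prompt": "Summarize the following text: ...",
        "max_tokens": 256,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```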
We always use Davinci as the main engine, and we are using Davinci 3 for many prompts.
I know latency is something we are working on so I pinged some folks internally to see if we have any suggested best practices. Hang tight!
I suppose you already have a QoS setup for paying users?
Maybe MS is packing too much onto a single server and/or not scaling correctly for spikes in usage…
I’m facing a similar problem: all the API calls have at least 10 seconds of latency. I have tried different values for temperature, but no luck.
I use the configs below (a full example call is sketched after the list):
temperature: 0.66,
max_tokens: 2147,
top_p: 1,
frequency_penalty: 0,
presence_penalty: 0,
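For context, here is roughly how that call looks with the openai Python package (v1+ client). The parameters match the values above; the model name and prompt are assumptions, not necessarily what is actually being sent:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Parameters copied from the config above; the model and prompt are
# placeholders for illustration.
completion = client.completions.create(
    model="text-davinci-003",
    prompt="Write a product description for ...",
    temperature=0.66,
    max_tokens=2147,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)
print(completion.choices[0].text)
```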
The max_tokens parameter hurts latency significantly. Try to reduce it as much as you can.
Also, there is the possibility of streaming results instead of waiting until the response is fully computed (see the sketch below). Hope that helps!
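For anyone unfamiliar with streaming, here is a minimal sketch using the openai Python package (v1+ client); the model, prompt, and max_tokens are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True returns chunks as they are generated, so the user starts
# seeing text immediately instead of waiting for the full completion.
stream = client.completions.create(
    model="text-davinci-003",
    prompt="Write a short story about a robot.",
    max_tokens=500,
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)
print()
```

Streaming does not reduce total generation time, but it greatly improves perceived latency because the user sees the first tokens within a second or two.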
Thanks. I reduced max_tokens to 1000, but it didn’t have much impact on latency: from 10 seconds down to 9 seconds. I’ll try streaming.
We just shipped a new section in the docs inspired by your question here @alex.batista:
Very nice! The content is really good. We’ll check every recommendation in this document. This will definitely help a lot of people. Good job and thank you so much, @logankilpatrick and the whole team.
I wonder why we don’t try a new underlying model? I have some new thoughts; the problem is that I don’t know how to contact you.