How to reduce OpenAI response time?

alex.batista · January 17, 2023, 2:25pm

We have an application that queries OpenAI with multiple prompts. However, many of the requests we send to OpenAI take several seconds to return a result, which makes for a bad experience for the end user. I believe this is a problem that many people face. Do you have any solution for it, or any good practices for improving response time? Thank you very much.

Here is an example:

amra.dorjbayar · January 17, 2023, 2:37pm

Good question, I have been wondering this myself. I’m just responding so that I will get a notification if someone answers

logankilpatrick · January 17, 2023, 2:47pm

Can you share any more specifics on what you are sending via the API, which end points, etc?

alex.batista · January 17, 2023, 2:50pm

I’m using the completion creation endpoint:

POST https://api.openai.com/v1/completions

“Creates a completion for the provided prompt and parameters”

We always use Davinci as the main engine. We are using Davinci 3 in many prompts.

logankilpatrick · January 17, 2023, 2:53pm

I know latency is something we are working on so I pinged some folks internally to see if we have any suggested best practices. Hang tight!

alex.batista · January 17, 2023, 2:55pm

Thanks @logankilpatrick !

i-technology · January 17, 2023, 6:47pm

I suppose you already have a QoS setup for paying users?

Maybe MS is packing too much stuff on a single server and/or not scaling correctly for spikes in usage…

sagarjani · January 18, 2023, 11:09am

I’m facing a similar problem: every API call takes more than 10 seconds. I have tried different values for temperature, but no luck.

I use the configs below:

    temperature: 0.66,
    max_tokens: 2147,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,

AgusPG · January 18, 2023, 11:29am

The max_tokens parameter hurts latency significantly. Try to reduce it as much as you can.
Also: there is the possibility of streaming results instead of waiting until the response is fully computed. Hope that helps!
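
As a rough illustration of the two suggestions above (a lower max_tokens plus streaming), here is a minimal Python sketch using only the standard library. It assumes the completions endpoint quoted earlier in the thread, a model name of "text-davinci-003", and an API key in an OPENAI_API_KEY environment variable; the helper names build_request, iter_text, and stream_completion are made up for this example. It relies on the streamed response arriving as server-sent event lines of the form "data: {...}", ending with "data: [DONE]".

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/completions"


def build_request(prompt, max_tokens=256):
    """Build a streaming completion request (hypothetical helper)."""
    payload = {
        "model": "text-davinci-003",  # assumption: adjust to the model you use
        "prompt": prompt,
        "max_tokens": max_tokens,     # keep this as low as the use case allows
        "temperature": 0.66,
        "stream": True,               # ask for partial results as they are generated
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
        method="POST",
    )


def iter_text(lines):
    """Yield text fragments from server-sent event lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and anything else
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        yield json.loads(data)["choices"][0]["text"]


def stream_completion(prompt, max_tokens=256):
    """Print the completion piece by piece instead of waiting for all of it."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        for piece in iter_text(resp):
            print(piece, end="", flush=True)
```

Calling stream_completion("Write a haiku about latency") would start printing as soon as the first fragment arrives. Streaming does not shorten total generation time; it only lets the user see output earlier.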

sagarjani · January 18, 2023, 11:47am

Thanks, I reduced max_tokens to 1000, but it didn’t have much impact on latency: it went from 10 seconds to 9 seconds. I’ll try streaming.
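
When comparing settings like this, it can help to measure the time to the first streamed chunk separately from the total time, since streaming mainly improves the former. A small sketch, assuming you already have some iterable of streamed chunks; time_stream is a hypothetical helper name, not part of any OpenAI library:

```python
import time


def time_stream(chunks, clock=time.perf_counter):
    """Consume an iterable of chunks and report simple latency figures."""
    start = clock()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            first = clock() - start  # time until the first chunk arrived
        count += 1
    total = clock() - start          # time until the stream was exhausted
    return {"first_chunk_s": first, "total_s": total, "chunks": count}
```

If first_chunk_s is small while total_s stays high, the remaining time is generation itself, and the levers left are a shorter output or a faster model.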

logankilpatrick · January 20, 2023, 3:24pm

We just shipped a new section in the docs inspired by your question here @alex.batista:

alex.batista · January 20, 2023, 4:26pm

Very nice! The content is so good. We’ll check every recommendation in this document. This will definitely help a lot of people. Good job, and thank you so much @logankilpatrick and the whole team.

wuweishijieee · May 23, 2023, 2:17pm

I wonder why we don’t try a new underlying model? I have some new thoughts; the problem is that I don’t know how to contact you.
