Increasing speed / reducing latency

matt · September 21, 2021, 8:08pm

Hi there,

My application needs to respond as close to real time as possible. I searched for an existing thread on this but couldn’t find one.

My understanding of the speed of the API returning a completion is based on the following factors:

Engine selected
Size of prompt
Response length

I’m wondering whether there are any other variables to consider (where hopefully I might make improvements).

Related to this, does anyone know whether using the content filter makes the API take longer to respond? I’m guessing that it would.

If anyone has any thoughts to offer, it would be much appreciated. Thanks!

Matt

trian.xylouris · September 22, 2021, 8:27am

You can pass many prompts in one API call (use an array of strings for the prompt parameter) to save a lot of per-request overhead and back-and-forth.
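A minimal sketch of what that could look like, assuming the completions endpoint accepts a list of strings for prompt and that each returned choice carries an index field pointing back to its prompt (with one completion per prompt). The helper names build_batched_request and texts_in_prompt_order are made up for illustration, and the response below is hand-written stand-in data, not real API output:

```python
import json


def build_batched_request(prompts, max_tokens=16):
    """Build the JSON body for ONE completions call covering all prompts."""
    return json.dumps({"prompt": list(prompts), "max_tokens": max_tokens})


def texts_in_prompt_order(response, n_prompts):
    """Put each choice's text back at the position of the prompt it answers.

    Assumes one completion per prompt, so choice["index"] is the prompt index.
    """
    texts = [None] * n_prompts
    for choice in response["choices"]:
        texts[choice["index"]] = choice["text"]
    return texts


prompts = ["Translate to French: cat", "Translate to French: dog"]
body = build_batched_request(prompts)  # send this in a single HTTP request

# Hand-written stand-in for the decoded response (no network call is made here).
fake_response = {
    "choices": [
        {"index": 1, "text": " chien"},
        {"index": 0, "text": " chat"},
    ]
}
results = texts_in_prompt_order(fake_response, len(prompts))
```

Sorting by index rather than relying on list order keeps the mapping correct even if the choices come back in a different order than the prompts were sent.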

Regarding the content filter: this is separate from the completions endpoint. You would need to add one more API call to your pipeline, so yes, it would take longer overall.
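To see how much the extra call costs in your own setup, you could time each stage separately. This is only a rough sketch: complete and content_filter are hypothetical stand-ins that sleep instead of calling the API, so swap in your real calls and ignore the placeholder delays.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def complete(prompt):
    """Hypothetical stand-in for the completion request."""
    time.sleep(0.02)  # placeholder delay, not a measured API latency
    return prompt + " ... completion"


def content_filter(text):
    """Hypothetical stand-in for the separate content filter request."""
    time.sleep(0.01)  # placeholder delay, not a measured API latency
    return "safe"


completion, t_complete = timed(complete, "Hello")
label, t_filter = timed(content_filter, completion)

# The filter needs the completion text, so the two calls run one after the
# other and their latencies add up.
total = t_complete + t_filter
print(f"completion: {t_complete:.3f}s, filter: {t_filter:.3f}s, total: {total:.3f}s")
```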

If you let us know what input you will send per minute (i.e. number of prompts, prompt length, and engine type), then we or OpenAI can give you an opinion on whether that will work and how fast.

ian · September 22, 2021, 3:43pm

Wait what?

Many prompts to one API call?
