Increasing speed / reducing latency

Hi there,

My application needs responses as close to real time as possible. I searched for an existing thread on this but couldn’t find one.

My understanding is that how quickly the API returns a completion depends on the following factors:

  1. Engine selected
  2. Size of prompt
  3. Response length
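One way to check how much each of these factors matters in practice is to time the same request several times while changing only one factor. Below is a minimal sketch in Python. `call` is a hypothetical zero-argument wrapper around a single API request (an assumption, not part of any library). The helper reports the median because one slow network round trip can skew a mean.

```python
import statistics
import time
from typing import Callable


def median_latency(call: Callable[[], object], runs: int = 5) -> float:
    """Return the median wall-clock time, in seconds, of `runs` calls to `call`.

    `call` is a hypothetical wrapper around one API request; build one wrapper
    per setting (engine, prompt size, response length) and compare the results.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()  # the response is ignored here; only the elapsed time is recorded
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to an occasional slow request.
    return statistics.median(timings)
```

For example, compare `median_latency(lambda: send(prompt, max_tokens=16))` against the same call with `max_tokens=256`, where `send` is your own (hypothetical) request function.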

I’m wondering whether there are any other variables to consider, ideally ones where I could make improvements.

Related to this, does anyone know whether using the content filter would make the API take longer to respond? (I’m guessing that it would.)

If anyone has any thoughts to offer, it would be much appreciated. Thanks!


You can pass many prompts in a single API call (use an array of strings as the prompt parameter). This saves a lot of overhead from repeated request round trips.
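Here is a minimal sketch of a batched request using only the Python standard library. It assumes the legacy Completions endpoint `https://api.openai.com/v1/completions` with a `model` field. The engine-based URL used when this thread was written had a different path. The model name and API key are placeholders. Completions are matched back to prompts by each choice's `index` field rather than by position, assuming one completion per prompt (`n=1`), because the choices may not come back in prompt order.

```python
import json
import urllib.request

# Assumed endpoint (legacy Completions API); older engine-based URLs differ.
COMPLETIONS_URL = "https://api.openai.com/v1/completions"


def build_batch_request(prompts, model, api_key, max_tokens=32):
    """Build ONE POST request that carries every prompt in the `prompt` array."""
    body = {
        "model": model,            # placeholder model name supplied by the caller
        "prompt": list(prompts),   # an array of strings instead of a single string
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        COMPLETIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def match_completions(prompts, response):
    """Pair each prompt with its completion text using each choice's `index`.

    Assumes n=1 (one completion per prompt), so choice index i belongs to prompt i.
    """
    texts = [""] * len(prompts)
    for choice in response["choices"]:
        texts[choice["index"]] = choice["text"]
    return list(zip(prompts, texts))
```

To send it: `with urllib.request.urlopen(req) as resp: data = json.load(resp)`, then `match_completions(prompts, data)`.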

Regarding the content filter: it is separate from the completion endpoint, so you would need to add one more API call to your pipeline. So yes, the pipeline would take longer overall.

If you let us know what input you will send (i.e., the number of prompts per minute, prompt length, and engine type), we can give you our opinion on whether, and how fast, that will work.
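Those three numbers are enough for a rough load estimate. The sketch below is a hypothetical back-of-envelope helper, not an official calculator, and it says nothing about actual rate limits. It shows that batching reduces the number of requests per minute but not the number of tokens processed.

```python
import math
from dataclasses import dataclass


@dataclass
class LoadEstimate:
    requests_per_minute: int
    tokens_per_minute: int


def estimate_load(prompts_per_minute, avg_prompt_tokens, avg_completion_tokens, batch_size=1):
    """Hypothetical back-of-envelope load for a single engine.

    Sending `batch_size` prompts per call cuts the request count, but every
    prompt and completion is still processed, so the token count is unchanged.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    requests = math.ceil(prompts_per_minute / batch_size)
    tokens = prompts_per_minute * (avg_prompt_tokens + avg_completion_tokens)
    return LoadEstimate(requests, tokens)
```

For example, 600 prompts per minute of about 200 tokens each, with 50-token responses, sent in batches of 20, works out to 30 requests and 150,000 tokens per minute.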


Wait, what?

Many prompts to one API call?