My application needs API responses as close to real time as possible. I searched for an existing thread on this but couldn't find one.
My understanding is that how quickly the API returns a completion depends on the following factors:
- Engine selected
- Size of prompt
- Response length
I’m wondering whether there are any other variables to consider (where hopefully I might make improvements).
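To check how much each of those factors matters in practice, one option is to time the same request under different settings. Below is a minimal sketch, not the actual API client: `fake_complete` is a hypothetical stand-in I made up so the example runs offline, and the engine names and the sleep-based delay are assumptions for illustration only. Swapping in the real request function would give actual numbers.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Call fn `repeats` times and summarise wall-clock latency in seconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fake_complete(engine: str, prompt: str, max_tokens: int) -> str:
    """Hypothetical stand-in for the real completion request.

    Assumption for illustration: latency grows with the response length.
    """
    time.sleep(0.001 * max_tokens)
    return "x" * max_tokens


def compare(configs: Dict[str, dict],
            complete: Callable[..., object] = fake_complete,
            repeats: int = 3) -> Dict[str, Dict[str, float]]:
    """Time each named configuration and return its latency summary."""
    results = {}
    for name, kwargs in configs.items():
        # The lambda is called inside this iteration, so it sees this kwargs.
        results[name] = time_call(lambda: complete(**kwargs), repeats)
    return results


if __name__ == "__main__":
    # Hypothetical engine names; vary one factor at a time.
    configs = {
        "short response": dict(engine="engine-a", prompt="Hello", max_tokens=5),
        "long response": dict(engine="engine-a", prompt="Hello", max_tokens=50),
    }
    for name, stats in compare(configs).items():
        print(f"{name}: median {stats['median'] * 1000:.1f} ms")
```

Changing only one setting between runs (engine, prompt size, or response length) and comparing medians rather than single calls should make it easier to see which factor dominates, since individual requests can vary a lot.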
Related to this, I was wondering if anyone has insight as to whether using the content filter would/should make the API take longer to respond (I’m guessing that it would).
If anyone has any thoughts to offer, it would be much appreciated. Thanks!