Hi,
My team measured Davinci and Curie response times throughout the day, using a 1-token prompt, to understand the occasional long response times we see in our application.
We need a response time under 2 seconds, so here are my questions:
- Are these peaks due to queuing?
- Is it possible to buy something like “exclusive endpoints” to get better response times and stability?
Thanks!