Hi,
My team measured Davinci and Curie response times throughout the day to understand the occasional slow responses we see in our application (tested with a 1-token prompt).
We need response times under 2 seconds, so here are my questions:
- Are these peaks due to queuing?
- Is it possible to buy something like “exclusive endpoints” to get faster and more stable response times?
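Averages tend to hide the occasional slow responses described above, so it may help to report tail percentiles and the share of calls that exceed the 2-second budget. The sketch below assumes a hypothetical zero-argument `call` function that sends one request with your own client code; `measure_latencies`, `percentile`, and `summarize` are illustrative helpers, not part of any API. The sample numbers in the usage section are made up for illustration only.

```python
import math
import time
from typing import Callable, Dict, List


def measure_latencies(call: Callable[[], object], runs: int) -> List[float]:
    """Time `runs` sequential invocations of `call` and return seconds per call.

    `call` is a hypothetical placeholder: wrap your own request code
    (e.g. a completion with a 1-token prompt) in a zero-argument function.
    """
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile (p in (0, 100]) of a non-empty sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: List[float], budget_s: float = 2.0) -> Dict[str, float]:
    """Median, tail percentiles, max, and fraction of calls over the budget."""
    over = sum(1 for s in samples if s > budget_s)
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "max": max(samples),
        "over_budget_ratio": over / len(samples),
    }


if __name__ == "__main__":
    # Made-up latencies (seconds), for illustration only.
    samples = [0.8, 0.9, 1.1, 0.7, 4.5, 1.0, 0.9, 1.2, 0.8, 3.1]
    print(summarize(samples))
```

With these illustrative samples the median is 0.9 s, yet p95 is 4.5 s and 20% of calls exceed 2 s, which is the kind of spread a mean alone would not reveal. Running the measurement at different times of day would show whether the tail grows with load.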
Thanks!
Hi and welcome to the Developer Forum!
You can indeed get a dedicated instance. Dedicated instances start to make commercial sense at around 450M tokens per day; you can reach out to get more information here.
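To put the ~450M tokens per day figure in perspective, a quick back-of-the-envelope conversion to average throughput may help. This is a minimal sketch: the helper names are hypothetical, the 450M figure is only the approximate threshold quoted above, and it ignores actual pricing or contract terms.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tokens_per_second(tokens_per_day: float) -> float:
    """Average throughput implied by a daily token volume."""
    return tokens_per_day / SECONDS_PER_DAY


def daily_tokens(requests_per_day: int, tokens_per_request: int) -> int:
    """Daily volume from request count and average tokens per request
    (prompt plus completion); both inputs are your own estimates."""
    return requests_per_day * tokens_per_request


if __name__ == "__main__":
    print(f"{tokens_per_second(450_000_000):.1f} tokens/s")  # prints "5208.3 tokens/s"
    print(daily_tokens(1_000_000, 450))  # prints 450000000
```

In other words, the quoted threshold corresponds to roughly 5,200 tokens per second sustained around the clock, for example about a million requests per day at ~450 tokens each.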