Best GPT response time for real-time applications


My team measured Davinci and Curie response times throughout the day (using a 1-token prompt) to understand the occasional slow responses we see in our application.
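A minimal sketch of this kind of measurement, assuming you wrap your own API request in a zero-argument callable. Nothing here is an OpenAI API; `measure_latencies`, `percentile` and `summarize` are hypothetical helpers. It reports the median, the tail and the share of requests over a budget, because occasional peaks show up in p95/p99 and max rather than in the average.

```python
import math
import time
from typing import Callable, Dict, List


def measure_latencies(call: Callable[[], object], n: int) -> List[float]:
    """Time n sequential invocations of `call`; return durations in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        call()  # hypothetical: your own wrapper that sends a 1-token prompt
        durations.append(time.perf_counter() - start)
    return durations


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies: List[float], budget_s: float = 2.0) -> Dict[str, float]:
    """Median and tail latency, plus the fraction of calls over the budget."""
    ordered = sorted(latencies)
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
        "over_budget": sum(x > budget_s for x in ordered) / len(ordered),
    }
```

For example, `summarize([0.4, 0.5, 0.6, 0.7, 3.1])` gives a p50 of 0.6 s but a max of 3.1 s, with 20% of calls over a 2 s budget. Logging a timestamp next to each duration would also let you line peaks up with time of day.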

We need responses in under 2 seconds, so here are my questions:

  • Are these peaks due to queuing?
  • Is it possible to buy something like "exclusive endpoints" to get faster and more stable response times?
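Whatever the cause of the peaks, if 2 seconds is a hard limit the client can stop waiting at the deadline and fall back to something else (a cached answer, an error message, a retry). The sketch below assumes a zero-argument callable wrapping your request; `call_with_deadline` is a hypothetical helper, not part of any OpenAI library, and it does not make the request itself faster.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Shared pool so a slow call never blocks the caller past its deadline.
_POOL = ThreadPoolExecutor(max_workers=4)


def call_with_deadline(call: Callable[[], T], deadline_s: float = 2.0) -> Optional[T]:
    """Return call()'s result, or None if it is not done within deadline_s.

    A timed-out call keeps running in its worker thread; this only stops
    the caller from waiting on it, so the caller can use a fallback.
    """
    future = _POOL.submit(call)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        future.cancel()  # no effect if the call already started
        return None
```

Returning `None` keeps the sketch short. If your call can legitimately return `None`, raising an exception or returning a sentinel would be a clearer choice.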


You can indeed get a dedicated instance. They start to make commercial sense at around 450M tokens per day; you can reach out for more information here.