Is there any way to get higher priority in the completion queue? In my case I don't need the whole completion to be super fast to give my users the best experience, but I do need the first token as soon as possible.
For example, I need to know the intent of the AI's next action so that I can introduce that action with a quick reply; I don't need the full function-call JSON output for that.
I've tried several Azure regions and OpenAI:
UK StartTime: 2753.293562ms
UK EndTime: 6835.470221ms
US StartTime: 2239.90463ms
US EndTime: 6341.402898ms
France StartTime: 1943.414065ms
France EndTime: 5227.289918ms
OpenAI StartTime: 1009.065938ms
OpenAI EndTime: 6070.109997ms
Right now OpenAI seems to be the fastest to start answering, but it's still quite slow (it depends on the day and the hour). Is there any possibility of something like a priority queue where the first response is guaranteed within roughly 300-500 ms at most?
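To compare the runs above side by side, here is a small Python sketch. It assumes (the post does not say so explicitly) that StartTime is the delay until the first token arrived and EndTime the delay until the last one, both measured from when the request was sent; the `measurements` dict and `generation_time_ms` helper are hypothetical names used only for illustration.

```python
# Numbers copied from the measurements above, in milliseconds.
# ASSUMPTION: (StartTime, EndTime) = (first token, last token), both measured
# from the moment the request was sent.
measurements = {
    "UK": (2753.293562, 6835.470221),
    "US": (2239.90463, 6341.402898),
    "France": (1943.414065, 5227.289918),
    "OpenAI": (1009.065938, 6070.109997),
}


def generation_time_ms(start_ms: float, end_ms: float) -> float:
    """Time spent streaming between the first and the last token."""
    return end_ms - start_ms


for name, (start, end) in measurements.items():
    print(f"{name:7s} first token {start:8.1f} ms  last token {end:8.1f} ms  "
          f"streaming {generation_time_ms(start, end):8.1f} ms")

# Region with the lowest time to first token, and the one that finishes first.
fastest_first = min(measurements, key=lambda k: measurements[k][0])
fastest_total = min(measurements, key=lambda k: measurements[k][1])
```

Under that assumption, OpenAI has the lowest time to first token (about 1009 ms) but the longest streaming phase (about 5061 ms), while France finishes first overall (about 5227 ms). So "fastest to start" and "fastest to finish" are different questions, and only the first one matters for the intent-detection use case.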
I've only done a limited amount of testing with my streaming bot setup, but it has always been under 1000 ms. Time of day will certainly have an effect, and so will system load. If you want a dedicated instance so that you are the only one using the endpoint, you can apply for one via the sales contact page here
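When comparing endpoints like this, it helps to measure time to first token the same way every time. Below is a minimal, library-agnostic Python sketch: `measure_stream` is a hypothetical helper that takes a zero-argument callable which sends the request and returns an iterable of streamed chunks, starts the clock before calling it (so connection and queueing time are included), and records when the first and last chunks arrive.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    open_stream: Callable[[], Iterable[object]],
) -> Tuple[Optional[float], float, int]:
    """Time a streamed response (hypothetical helper, not part of any SDK).

    Returns (ms until the first chunk, or None if the stream was empty,
             ms until the stream ended,
             number of chunks received).
    """
    start = time.perf_counter()  # clock starts before the request is sent
    first_ms: Optional[float] = None
    count = 0
    for _ in open_stream():
        if first_ms is None:
            # First chunk: this is the latency that matters for a quick reply.
            first_ms = (time.perf_counter() - start) * 1000.0
        count += 1
    total_ms = (time.perf_counter() - start) * 1000.0
    return first_ms, total_ms, count
```

As an example of how it might be wired up, with the `openai` Python package (v1 or later) the callable could wrap `client.chat.completions.create(..., stream=True)`. If you only need the first chunk to detect intent, you could break out of the loop right after recording `first_ms` instead of draining the whole stream.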
I've tried many times over several days, and over the course of a day it ranges from very fast (sometimes 400 ms) to very slow (4 s or more of waiting without a single token coming out).
I'll try contacting them via the sales page, thank you.
Still no response from the sales team… Just for my own knowledge, do you know whether it's possible to buy slots of a dedicated instance?
I've seen that, for example, a GPT-3.5 instance is made up of 100 compute slots. What if I only need 1 or 2 of them instead of all 100?