Priority queue for prompt execution

Is there any way to get more priority in the completion queue, in my case I don’t need a super fast completion in order to achieve the best user experience for my users, but i need to know the first token as soon as possible.

For example, I need to know the intent of the next action that AI will do in order to introduce that action with a fast reply, I don’t need all the Function JSON Output.

I’ve tried many Azure clouds and OpenAI
UK StartTime: 2753.293562ms
UK EndTime: 6835.470221ms
US StartTime: 2239.90463ms
US EndTime: 6341.402898ms
France StartTime: 1943.414065ms
France EndTime: 5227.289918ms
OpenAI StartTime: 1009.065938ms
OpenAI EndTime: 6070.109997ms

and right now OpenAI seems to be the fastest to start answering but it’s still quite slow (it depends on day and hours) but i would like to know if there is any possibility to have like a priority queue where the first response is granted in something like 300-500ms max

Have you enabled streaming? (stream=true)that should get your first token back in a few hundred milliseconds.

Yes I did, and sometimes the queue makes the generation starts on 400ms and other times on 1.5s or even 3-4s.

For this reason I would like to know if there is a way to have a more predictable inference start time.

I’ve only done a limited amount of testing with my streaming bot setup, it’s always been sub 1000ms. Time of day will certainly have an effect, so will system load. If you wish to have a dedicated instance to ensure you are the only person using the endpoint then you can apply for one via the sales contact page here

I’ve tried multiple times in multiple days and during the day it starts from very fast (400ms sometimes) and goes to very slow (4s and more of waiting without any token coming out).
I’ll try to contact via the sales page, thank you

Still no responses from sales team… just for knowledge, do you know if it’s possible to buy slots of a dedicated instance?
I’ve seen that for example a gpt3.5 instance is made of 100 compute slots, what if I just need 1-2 of them instead of all 100?

At this stage, reselling of dedicated instances is not a standard option, it may be going on behind closed doors between large vendors and clients, but as a standard option, no.

I am sure this will become a burgeoning business in itself as time progresses.

Thanks, I understand, hope that they will bring it out very soon as an option or just answer to my email for more information.