I am building a conversation bot for myself.
I expect the bot to respond in near real time, like a human.
However, it takes about 3 s on average to generate a response, and the response time is unstable; I have tested this on many networks.
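To pin down whether the instability comes from the network or from generation itself, a quick latency benchmark helps. A minimal sketch in Python, where `generate_response` is a stub standing in for whatever call the bot makes to the API (it sleeps for a random interval so the snippet runs standalone):

```python
import random
import statistics
import time

def generate_response(prompt: str) -> str:
    # Stub standing in for the real API call; replace with your bot's request.
    time.sleep(random.uniform(0.01, 0.03))  # simulate variable latency
    return "ok"

def benchmark(n: int = 20) -> tuple[float, float]:
    """Return mean and standard deviation of response latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        generate_response("hello")
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies), statistics.stdev(latencies)

mean_s, stdev_s = benchmark()
print(f"mean={mean_s:.3f}s stdev={stdev_s:.3f}s")
```

Comparing the mean against the spread tells you whether the ~3 s is a steady generation cost or mostly network jitter.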
I wonder if OpenAI provides a higher subscription tier that gives me priority-queue API access?
Welcome to the forum!
There are no higher tiers for faster inference using the normal API endpoints. Those using the Azure OpenAI offerings from Microsoft currently experience improved performance, but there is no guarantee that this boost will remain once mass adoption has taken place. The only real way to ensure very low latency and high inference speed is to take advantage of a dedicated instance. These are servers configured for your exclusive use, although you will typically need to be using around 450 million tokens per day for this option to make economic sense.
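To put that volume in perspective, a back-of-envelope calculation of the daily spend at which a dedicated instance might start to pay off (the per-token price below is purely illustrative, not a quoted OpenAI rate):

```python
# Back-of-envelope: daily spend at the dedicated-instance break-even volume.
# All numbers here are illustrative assumptions, not OpenAI pricing.
tokens_per_day = 450_000_000
price_per_1k_tokens = 0.002  # hypothetical $/1K tokens

daily_spend = tokens_per_day / 1_000 * price_per_1k_tokens
print(f"~${daily_spend:,.0f} per day at the assumed rate")  # ~$900 per day
```

At anything like that rate, a conversation bot for personal use is several orders of magnitude below the threshold.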
I write a lot slower than the AI does…
There is currently no alternate tier that is public if you are not one of the huge partners.
You can look at Bing Chat, and see the rate of token generation there, which is based on OpenAI services running on the Azure platform. It doesn’t seem to be that different.
One alternative if you just need text slammed on your screen at an incredible rate is Anthropic’s Claude. They have very limited API access though.
Another option is to use a simpler OpenAI model that doesn’t “think” as much when it is generating answers. The babbage completion model, for example, will produce text very fast, although at lower quality, being roughly 1/20th as knowledgeable, and a replacement for it is coming soon.
Thank you for the response.
Do you have the authority to recommend that OpenAI make a higher tier than ChatGPT Plus, offering more benefits to users?
I just saw a topic where users complained about ChatGPT Plus being slow and unstable too; have you guys worked on that?
“Us guys” are all just community members; this forum is not regularly staffed or monitored by OpenAI.
Let’s have a go though:
“recommend that OpenAI make a higher tier”
I wasn’t able to answer for either ChatGPT or the API above, because the ChatGPT “Plus” you mention is different from the API, which is the forum category you chose.
I suspect the speed of generation is simply the maximum output that a computing instance can generate with the current state of the art (short of building $200,000 8x NVIDIA H100 servers). It is not because you are sharing the same server with 100 other inference tasks at the same time.