I am building a conversation bot for myself.
I expect the bot to respond in near real time, like a human.
However, it takes about 3 s on average to generate a response, and the response time is unstable; I have tested this on many networks.
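To pin down whether the instability comes from the network or from generation itself, a quick latency benchmark helps. A minimal sketch in Python, where `generate_response` is a stub standing in for whatever call the bot makes to the API (it sleeps for a random interval so the snippet runs standalone):

```python
import random
import statistics
import time

def generate_response(prompt: str) -> str:
    # Stub standing in for the real API call; replace with your bot's request.
    time.sleep(random.uniform(0.01, 0.03))  # simulate variable latency
    return "ok"

def benchmark(n: int = 20) -> tuple[float, float]:
    """Return mean and standard deviation of response latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        generate_response("hello")
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies), statistics.stdev(latencies)

mean_s, stdev_s = benchmark()
print(f"mean={mean_s:.3f}s stdev={stdev_s:.3f}s")
```

Comparing the mean against the spread tells you whether the ~3 s is a steady generation cost or mostly network jitter.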
I wonder if OpenAI provides a higher subscription tier that gives me priority-queue API access?
Welcome to the forum!
There are no higher tiers for faster inference using the normal API endpoints. Those using the Azure OpenAI offerings from Microsoft currently experience improved performance, but there is no guarantee that this boost will remain once mass adoption has taken place. The only real way to ensure very low latency and high inference speed is to take advantage of a dedicated instance. These are servers configured for your exclusive use, although you will typically need to be using around 450 million tokens per day for this option to make economic sense.
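To put that volume in perspective, a back-of-envelope calculation of the daily spend at which a dedicated instance might start to pay off (the per-token price below is purely illustrative, not a quoted OpenAI rate):

```python
# Back-of-envelope: daily spend at the dedicated-instance break-even volume.
# All numbers here are illustrative assumptions, not OpenAI pricing.
tokens_per_day = 450_000_000
price_per_1k_tokens = 0.002  # hypothetical $/1K tokens

daily_spend = tokens_per_day / 1_000 * price_per_1k_tokens
print(f"~${daily_spend:,.0f} per day at the assumed rate")  # ~$900 per day
```

At anything like that rate, a conversation bot for personal use is several orders of magnitude below the threshold.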
I write a lot slower than the AI does…
There is currently no alternate tier that is public if you are not one of the huge partners.
You can look at Bing Chat, and see the rate of token generation there, which is based on OpenAI services running on the Azure platform. It doesn’t seem to be that different.
One alternative if you just need text slammed on your screen at an incredible rate is Anthropic’s Claude. They have very limited API access though.
Another option is to use a simpler OpenAI model that doesn’t “think” as much when it is generating answers. The babbage completion model, for example, will produce text very fast, although at lower quality, being roughly 1/20th as knowledgeable, and a replacement for it is coming soon.
Thank you for the response.
Do you have the authority to recommend that OpenAI make a higher tier than ChatGPT Plus, offering more benefits to users?
I just saw a topic where users complained about ChatGPT Plus being slow and unstable too; have you guys worked on that?
“Us guys” are all just community members; this forum is not regularly staffed or monitored by OpenAI.
Let’s have a go though:
“recommend that OpenAI make a higher tier”
I wasn’t able to answer for either ChatGPT or the API above, because the ChatGPT “Plus” you mention is different from the API, which is the forum category you chose.
I suspect the speed of generation is simply the maximum output that a computing instance can generate with the current state of the art (short of building $200,000 8x NVIDIA H100 servers). It is not because you are sharing the same server with 100 other inference tasks at the same time.