I am a product manager for a social application, and I am planning to introduce ChatGPT Enterprise Edition as a virtual chatbot in our app. Our app has around 300,000-400,000 active users who use it extensively on a daily basis, and we anticipate generating around 10,000-15,000 GPT API requests per minute during peak usage.
However, I have some concerns about the current limit of 200 requests per minute for ChatGPT 4.0 Enterprise Edition, so I am seeking assistance and advice from the community administrators on the best approach to increasing the number of requests per minute. I would like to understand the process for applying to OpenAI for a rate-limit increase, and whether there are any recommended, timely solutions for handling the expected peak request volume.
If anyone has suggestions, experience, or knowledge about this, I sincerely request your guidance. Our goal is to optimize the performance of the virtual chatbot and provide an excellent user experience to our large user base.
Thank you very much for your support and help! I look forward to discussing this with all of you.
There’s no such thing as ChatGPT Enterprise Edition.
What you’re describing would likely be on the order of at least $200 / minute in API use: 10,000 requests per minute at 400 tokens per request, at roughly $0.05 per 1,000 tokens (if it’s 15,000 requests of 800 tokens each, that’s $600 / minute).
The standard usage limit is $120 / month; that goes up to $500 / month with a good billing history, a submitted request, and a wait of several weeks. The largest usage limit I’ve seen is on the order of a few thousand dollars per month.
My point here is that even if you got them to lift your API rate limit by 50x–75x its current value, you would blow through the standard monthly billing limit in about 36 seconds (or as little as 12 seconds at the higher estimate).
By my rough estimation, you’re looking at a minimum of $400,000–$500,000 / month in usage (assuming 60 minutes / day at peak), but easily up to $1.5M (and possibly much, much more once off-peak use is included).
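If you want to sanity-check that arithmetic, here’s a rough sketch (the per-1,000-token price and the 60-minutes-of-peak-per-day figure are the assumptions above, not official pricing):

```python
# Back-of-the-envelope cost estimate for the traffic described above.
PRICE_PER_1K_TOKENS = 0.05   # assumed blended price, not official pricing
PEAK_MINUTES_PER_DAY = 60    # assumption from the estimate above
DAYS_PER_MONTH = 30

def cost_per_minute(requests_per_minute: int, tokens_per_request: int) -> float:
    """Dollar cost of one minute of traffic at the assumed token price."""
    tokens = requests_per_minute * tokens_per_request
    return tokens / 1000 * PRICE_PER_1K_TOKENS

for rpm, tokens in [(10_000, 400), (15_000, 800)]:
    per_min = cost_per_minute(rpm, tokens)
    per_month = per_min * PEAK_MINUTES_PER_DAY * DAYS_PER_MONTH
    print(f"{rpm:,} req/min x {tokens} tokens: ${per_min:,.0f}/min, ${per_month:,.0f}/month")

# 10,000 req/min x 400 tokens: $200/min, $360,000/month
# 15,000 req/min x 800 tokens: $600/min, $1,080,000/month
```

The exact monthly figures move with the traffic assumptions, but they land in the same range as the estimate above.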
Microsoft Azure is where you need to go if you want to deploy essentially unlimited GPT-4.
With that much volume, check into the dedicated instances:
Dedicated instances
We are also now offering dedicated instances for users who want deeper control over the specific model version and system performance. By default, requests are run on compute infrastructure shared with other users, who pay per request. Our API runs on Azure, and with dedicated instances, developers will pay by time period for an allocation of compute infrastructure that’s reserved for serving their requests.
Developers get full control over the instance’s load (higher load improves throughput but makes each request slower), the option to enable features such as longer context limits, and the ability to pin the model snapshot.
Dedicated instances can make economic sense for developers running beyond ~450M tokens per day. Additionally, it enables directly optimizing a developer’s workload against hardware performance, which can dramatically reduce costs relative to shared infrastructure. For dedicated instance inquiries, contact us.
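For scale: at the load described in the original post, that ~450M-token threshold is crossed quickly. A rough check (the hours-at-peak figure is an assumption):

```python
# Rough check of daily token volume against the ~450M tokens/day
# threshold quoted above for dedicated instances.
DEDICATED_THRESHOLD = 450_000_000  # tokens/day, from the quote above

def daily_tokens(requests_per_minute: int, tokens_per_request: int,
                 peak_hours_per_day: float) -> int:
    return int(requests_per_minute * tokens_per_request * 60 * peak_hours_per_day)

# Even the low end of the stated load exceeds the threshold after
# roughly two hours of peak traffic per day:
volume = daily_tokens(10_000, 400, peak_hours_per_day=2)
print(f"{volume:,} tokens/day, above threshold: {volume > DEDICATED_THRESHOLD}")
# 480,000,000 tokens/day, above threshold: True
```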
May I suggest starting conversations with GPT-4 and then switching to GPT-3.5?
That may actually work. Assuming most of the requests are not new conversations, you may be able to leverage the conversation history for more accurate output.
You should be able to leverage prompt engineering as well.
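Something like this, for instance (a minimal sketch using the pre-v1 openai Python SDK; is_new_conversation and the message list are stand-ins for however you track sessions in your app):

```python
# Route the first turn of a conversation to GPT-4, then serve
# follow-up turns with the cheaper GPT-3.5, relying on the
# accumulated conversation history for context.
import openai

# openai.api_key = "sk-..."  # set via environment or config

def chat_reply(messages: list[dict], is_new_conversation: bool) -> str:
    model = "gpt-4" if is_new_conversation else "gpt-3.5-turbo"
    response = openai.ChatCompletion.create(model=model, messages=messages)
    return response.choices[0].message.content
```

How much this saves depends on what fraction of your requests are first turns.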
On top of this, have you considered something like Stable Beluga 2?
Thank you for your assistance.
Will the cost come down after deploying on Microsoft Azure, as in the solution you described? I have recalculated carefully: we expect around 300,000 users per day, and with load balancing, peak requests per second can be kept under 15. Is there a better plan for controlling costs? Thank you very much!