I am developing a web app that will be used by a large number of users. The web app will be making calls to OpenAI for each user’s interaction. How do I avoid hitting rate limits? Is there a way to increase them, or is this a case where I should move to using Azure OpenAI?
If you anticipate bursts of more requests or more tokens than your limit allows, sustained for more than a minute or two, your only option is to request a higher limit.
Within a fixed limit, all your software can do is track its own request rate, queue requests as it approaches the limit, and report “too busy” once the queue gets too deep. It also needs to handle the API errors you can still get when running near the limit.
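To make that concrete, here is a minimal single-threaded sketch of the idea, not a definitive implementation. It tracks requests in a sliding window, waits briefly when the window is full (a crude stand-in for a real queue), reports “too busy” when the wait would be too long, and retries with exponential backoff when the API still returns a rate-limit error. Every name in it (`SlidingWindowLimiter`, `TooBusy`, `RateLimitError`, `call_with_backoff`, `send`) is made up for illustration; `RateLimitError` is a placeholder for whatever exception your API client actually raises on HTTP 429, and the limits are numbers you would take from your own account.

```python
import random
import time
from collections import deque


class TooBusy(Exception):
    """Raised when a caller would have to wait longer than we allow."""


class RateLimitError(Exception):
    """Hypothetical stand-in for the API client's rate-limit (HTTP 429) error."""


class SlidingWindowLimiter:
    """Tracks request timestamps over the last `window` seconds.

    Not thread-safe; a multi-threaded app would need a lock around acquire().
    """

    def __init__(self, max_requests, window=60.0, max_wait=10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self.max_wait = max_wait
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests still inside the window

    def acquire(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            # Time until the oldest request leaves the window.
            wait = self.window - (now - self.sent[0])
            if wait > self.max_wait:
                raise TooBusy(f"retry in about {wait:.0f}s")
            self.sleep(wait)
            self.sent.popleft()
        self.sent.append(self.clock())


def call_with_backoff(limiter, send, max_retries=5, base_delay=1.0):
    """Call send() under the limiter, backing off if the API still says no."""
    for attempt in range(max_retries + 1):
        limiter.acquire()
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # Exponential backoff plus a little jitter to avoid retry bursts.
            limiter.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

In a web handler you would wrap the actual API call in `send`, catch `TooBusy`, and turn it into a “please try again shortly” response (for example HTTP 503). This only counts requests; if your token limit is the one you hit first, the same window would need to track token totals instead.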
For actual production you should switch to the OpenAI services offered via Microsoft Azure.
As far as I can tell, OpenAI is not trying to be more than the provider of the models powering your app. Instead, Microsoft handles the scale and OpenAI develops and maintains the models.