For context, we’re a startup that uses GPT-4 for some initial data workflow configuration and these calls have to be synchronous because they are part of the setup. We’ve tried to just use gpt-3.5-turbo, but the results aren’t nearly as good and reliable as with GPT-4.
During traffic spikes after a newsletter or other promotion, the current rate limits lead to a very disappointing user experience when people want to try out our solution concurrently. So it’s not a constant load, but rather just during peaks.
I’m aware of the rate limit documentation but given that GPT-4 has been out there for quite a while now, I was wondering if the sentence “In its current state, the model is intended for experimentation and prototyping, not high volume production use cases.” is still true.
Is there really no way to get a moderate rate limit increase for critical business processes like this?