Review your own rate limits.
Look for deployed models that show nonsensical “default” values, compared to what should be provisioned according to the model documentation.
The first thing you would have to do, as in the case of this tier-5 organization, is tell OpenAI: “hey, 0.25 million is not 30 million!”
Then do the calculation. (You don’t indicate which model or modality you are considering.)
Make some test API calls to see the token consumption of an interactive session.
The rate limiter can only kick in when it has a chance to inspect the tokens of a new request. I’m not sure whether each “create” trigger within a realtime session is inspected, and possibly blocked, in the same way. The realtime audio models have a smaller context window (32k or 16k), so you can have relatively high and recurring token consumption, but the cost per model response is bounded rather than unlimited.
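To make the rate-limit side concrete, here is a minimal sketch of a tokens-per-minute check. It assumes a simple one-minute fixed window, which is only a model: OpenAI does not document its limiter this way, and the class and function names are hypothetical. It also computes how many full-context requests fit into a per-minute token budget.

```python
from dataclasses import dataclass


@dataclass
class FixedWindowTokenLimiter:
    """Hypothetical, simplified tokens-per-minute limiter (assumed fixed 60 s window).

    A request is admitted only if its token count still fits in the current
    window's budget; this is an illustration, not OpenAI's actual algorithm.
    """
    tokens_per_minute: int
    window_start: float = 0.0
    used: int = 0

    def try_admit(self, request_tokens: int, now: float) -> bool:
        # Begin a fresh window once 60 seconds have passed since the current one started.
        if now - self.window_start >= 60.0:
            self.window_start = now
            self.used = 0
        # Reject if this request would push the window over its token budget.
        if self.used + request_tokens > self.tokens_per_minute:
            return False
        self.used += request_tokens
        return True


def full_context_requests_per_minute(tokens_per_minute: int, context_tokens: int) -> int:
    # Number of maximal (full-context) requests that fit in one minute's budget.
    return tokens_per_minute // context_tokens


if __name__ == "__main__":
    print(full_context_requests_per_minute(250_000, 32_000))
    print(full_context_requests_per_minute(30_000_000, 32_000))
```

With a 32k context, a 0.25 million TPM limit admits only 7 full-context requests per minute, while a 30 million limit admits 937. That gap is why a wrong default matters before any cost estimate.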
Consider the worst case: someone saying “hello” over and over into a session whose context is already full, so each response consumes about 32k tokens of input, many times per minute. At $0.032 per 1K tokens for gpt-realtime, that is about a dollar per response.
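The worst-case arithmetic can be written out as a small calculation. This sketch assumes every response re-reads a full 32k context as input at the quoted $0.032 per 1K tokens, and it ignores output tokens and any cached-input discount; the function names and the responses-per-minute figure are hypothetical.

```python
def cost_per_response(context_tokens: int, usd_per_1k_tokens: float) -> float:
    # Worst case: the whole context window is billed as input on every response.
    return context_tokens / 1000 * usd_per_1k_tokens


def worst_case_cost(context_tokens: int, usd_per_1k_tokens: float,
                    responses_per_minute: int, minutes: int) -> float:
    # Total spend if full-context responses keep coming at a steady rate.
    per_response = cost_per_response(context_tokens, usd_per_1k_tokens)
    return per_response * responses_per_minute * minutes


if __name__ == "__main__":
    print(f"${cost_per_response(32_000, 0.032):.3f} per response")
    print(f"${worst_case_cost(32_000, 0.032, 10, 60):.2f} per hour at 10 responses/minute")
```

Under these assumptions one full-context response costs about $1.02, and ten of them per minute for an hour comes to roughly $614, which is the kind of figure to compare against the rate limits you actually have provisioned.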


