I’m looking at the rate limits for the Realtime API models, specifically the new transcription models:
https://platform.openai.com/docs/models/gpt-4o-mini-transcribe
https://platform.openai.com/docs/models/gpt-4o-transcribe
They list the Tier 5 limit as 10,000 RPM.
What counts as a request for the Realtime API? Is it the number of WebSocket connections, or the number of messages sent over those WebSocket connections?
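For context, the streaming pattern I have in mind looks roughly like this. It’s a minimal sketch using the `websocket-client` package; the endpoint URL with `intent=transcription`, the headers, the event shape, and the `capture_300ms_chunks()` helper are my assumptions for illustration, not something I’ve confirmed against the docs:

```python
import base64
import json

import websocket  # pip install websocket-client

# Assumption: one long-lived WebSocket per user session.
# The URL and "intent" query parameter are my guess for the transcription flavour.
ws = websocket.create_connection(
    "wss://api.openai.com/v1/realtime?intent=transcription",
    header=[
        "Authorization: Bearer YOUR_API_KEY",
        "OpenAI-Beta: realtime=v1",
    ],
)

def send_chunk(pcm_bytes: bytes) -> None:
    """Append one ~300ms chunk of audio to the server-side input buffer."""
    ws.send(json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    }))

# Stream chunks as they arrive from the capture pipeline.
for chunk in capture_300ms_chunks():  # hypothetical audio source
    send_chunk(chunk)
```

So per user there is one connection but many appended chunks, which is why the interpretation of “request” matters so much here.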
Assuming we send audio in 300ms chunks, that’s about 3.3 chunks per second, or roughly 200 RPM per user, so a 10,000 RPM limit would cap us at around 50 concurrent users.
Or is the limit 10,000 concurrent users?
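To make the back-of-envelope math explicit, here’s a quick sketch of how I’m arriving at 50, assuming every appended audio chunk counts as one request (which is exactly the interpretation I’m asking about):

```python
# Back-of-envelope: concurrent users under 10,000 RPM
# if every appended 300ms audio chunk counts as one request.
RPM_LIMIT = 10_000   # Tier 5 limit from the model page
CHUNK_MS = 300       # audio chunk size we plan to send

chunks_per_minute_per_user = 60_000 / CHUNK_MS                       # = 200 "requests"/user/minute
max_users_if_chunks_count = RPM_LIMIT / chunks_per_minute_per_user   # = 50 concurrent users

# If instead only opening the WebSocket counts as a request,
# 10,000 RPM reads more like 10,000 new sessions per minute,
# not a 50-user ceiling.
print(chunks_per_minute_per_user, max_users_if_chunks_count)
```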