I’m looking at the rate limits for the Realtime API models, specifically the new transcription models:
https://platform.openai.com/docs/models/gpt-4o-mini-transcribe
https://platform.openai.com/docs/models/gpt-4o-transcribe
They list the Tier 5 limit as 10,000 RPM.
What counts as a request for the Realtime API? Is it the number of WebSocket connections, or the number of messages sent over those WebSocket connections?
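For context, the streaming pattern I have in mind looks roughly like this. It’s a minimal sketch using the `websocket-client` package; the endpoint URL with `intent=transcription`, the headers, the event shape, and the `capture_300ms_chunks()` helper are my assumptions for illustration, not something I’ve confirmed against the docs:

```python
import base64
import json

import websocket  # pip install websocket-client

# Assumption: one long-lived WebSocket per user session.
# The URL and "intent" query parameter are my guess for the transcription flavour.
ws = websocket.create_connection(
    "wss://api.openai.com/v1/realtime?intent=transcription",
    header=[
        "Authorization: Bearer YOUR_API_KEY",
        "OpenAI-Beta: realtime=v1",
    ],
)

def send_chunk(pcm_bytes: bytes) -> None:
    """Append one ~300ms chunk of audio to the server-side input buffer."""
    ws.send(json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    }))

# Stream chunks as they arrive from the capture pipeline.
for chunk in capture_300ms_chunks():  # hypothetical audio source
    send_chunk(chunk)
```

So per user there is one connection but many appended chunks, which is why the interpretation of “request” matters so much here.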
Assuming we send audio in 300ms chunks, that’s about 3.3 chunks per second, or roughly 200 RPM per user, so a 10,000 RPM limit would cap us at around 50 concurrent users.
Or is the limit 10,000 concurrent users?
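To make the back-of-envelope math explicit, here’s a quick sketch of how I’m arriving at 50, assuming every appended audio chunk counts as one request (which is exactly the interpretation I’m asking about):

```python
# Back-of-envelope: concurrent users under 10,000 RPM
# if every appended 300ms audio chunk counts as one request.
RPM_LIMIT = 10_000   # Tier 5 limit from the model page
CHUNK_MS = 300       # audio chunk size we plan to send

chunks_per_minute_per_user = 60_000 / CHUNK_MS                       # = 200 "requests"/user/minute
max_users_if_chunks_count = RPM_LIMIT / chunks_per_minute_per_user   # = 50 concurrent users

# If instead only opening the WebSocket counts as a request,
# 10,000 RPM reads more like 10,000 new sessions per minute,
# not a 50-user ceiling.
print(chunks_per_minute_per_user, max_users_if_chunks_count)
```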