The Realtime WebSocket API documentation does not explain how to establish a transcription-only session. Connecting to the WebSocket endpoint with a transcription model (whisper-1, gpt-4o-transcribe-latest, etc.), for example wss://api.openai.com/v1/realtime?model=whisper-1, fails with this error:
Error: Model “whisper-1” is not supported in realtime mode.
The official docs for WebSocket connections only show an example like this: wss://api.openai.com/v1/realtime?model=gpt-realtime
Nothing in them explains how to connect in transcription mode.
Developers have discovered that ?intent=transcription works, but this parameter is not documented.
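To make the difference concrete, here is a minimal sketch that only builds the connection URLs discussed above; it opens no socket, so the result can be passed to whichever WebSocket client you use. The helper names are hypothetical, and the header helper assumes bearer-token auth (the beta interface also expected an "OpenAI-Beta: realtime=v1" header, which may or may not apply to your API version).

```python
from urllib.parse import urlencode

# Base Realtime WebSocket endpoint, as quoted in this report.
REALTIME_URL = "wss://api.openai.com/v1/realtime"


def realtime_url(**params: str) -> str:
    """Hypothetical helper: append query parameters to the Realtime endpoint."""
    return f"{REALTIME_URL}?{urlencode(params)}"


def auth_headers(api_key: str) -> dict:
    """Hypothetical helper: bearer-token auth headers (assumption, see lead-in)."""
    return {"Authorization": f"Bearer {api_key}"}


# Documented form: a conversation session bound to a realtime model.
conversation_url = realtime_url(model="gpt-realtime")

# Undocumented form reported to work: a transcription-only session.
transcription_url = realtime_url(intent="transcription")

# Form reported to fail with: Model "whisper-1" is not supported in realtime mode.
rejected_url = realtime_url(model="whisper-1")

if __name__ == "__main__":
    for url in (conversation_url, transcription_url, rejected_url):
        print(url)
```

The only change between the working and failing transcription attempts is the query string: intent=transcription instead of model=<transcription model>.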
Sending a session.update event to change the session type also fails, with this error:
Passing a transcription session update event to a realtime session is not allowed
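For reference, this is a sketch of the kind of session.update event involved, sent after connecting with ?model=gpt-realtime. The field layout (session.type set to "transcription", with the model under audio.input.transcription) is an assumption based on the GA Realtime session shape, not something taken from this report, and the function name is hypothetical.

```python
import json


def transcription_session_update(model: str = "whisper-1") -> str:
    """Serialize a session.update event that asks for a transcription session.

    Assumption: field names follow the GA Realtime session shape; sent to a
    session opened with ?model=gpt-realtime, an event like this is the kind
    reported above to be rejected.
    """
    event = {
        "type": "session.update",
        "session": {
            "type": "transcription",
            "audio": {"input": {"transcription": {"model": model}}},
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    print(transcription_session_update())
```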
It appears that without intent=transcription there is no way to establish a realtime transcription session over WebSockets. The documentation should clearly explain how to start a transcription session via WebSocket.