Missing Documentation for WebSocket Realtime Transcription Mode

The Realtime WebSocket API documentation does not explain how to establish a transcription‑only session. Attempting to use a transcription model (whisper-1, gpt-4o-transcribe-latest, etc.) with the WebSocket endpoint, like this: wss://api.openai.com/v1/realtime?model=whisper-1 results in an error:

Error: Model “whisper-1” is not supported in realtime mode.

The official docs for WebSocket connections only show an example like this: wss://api.openai.com/v1/realtime?model=gpt-realtime

There is no mention of how to connect in transcription mode.

Developers have discovered that ?intent=transcription works, but this parameter is not documented.
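For reference, here is a minimal sketch of the two URL forms being compared. It only builds the connection strings and does not open a socket. The helper name build_realtime_url is hypothetical, and since intent=transcription is undocumented, its behavior is an assumption based on what developers have reported and may change.

```python
from urllib.parse import urlencode

# Endpoint as shown in the official WebSocket example.
REALTIME_ENDPOINT = "wss://api.openai.com/v1/realtime"


def build_realtime_url(model=None, intent=None):
    """Build a Realtime WebSocket URL (hypothetical helper, not an official API).

    model  -- value for the documented ?model= parameter
    intent -- value for the undocumented ?intent= parameter
    """
    params = {}
    if model is not None:
        params["model"] = model
    if intent is not None:
        params["intent"] = intent
    query = urlencode(params)
    return f"{REALTIME_ENDPOINT}?{query}" if query else REALTIME_ENDPOINT


if __name__ == "__main__":
    # Documented form: a regular realtime session.
    print(build_realtime_url(model="gpt-realtime"))
    # Undocumented form reported to start a transcription session.
    print(build_realtime_url(intent="transcription"))
```

The resulting string would be passed to whatever WebSocket client is in use, along with the usual Authorization: Bearer header.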

Sending a session.update event to change the session type after connecting also fails, with this error:

Passing a transcription session update event to a realtime session is not allowed

It seems that without intent=transcription it is impossible to establish a realtime transcription session via WebSocket. The documentation should clearly explain how to start one.
