I thought real-time STT was “broken,” but here’s what’s really happening for me:
- **Model matters**
  - `whisper-1` is not a streaming model. It sends at most one `delta`, then `completed`.
  - Use `gpt-4o-mini-transcribe` or `gpt-4o-transcribe` if you want live, rolling `delta` updates.
- **Commits trigger decoding**
  - The server only starts transcribing after the input-audio buffer is committed.
  - If you turn on `server_vad`, the commit happens automatically after a pause (≥ `silence_duration_ms`).
  - If you talk non-stop: no pause ⇒ no commit ⇒ no deltas until you finally go quiet.
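To make the "no pause ⇒ no commit" behavior concrete, here is a toy, pure-Python model of the `server_vad` commit rule (the real server-side implementation is not public; the frame size and loudness scale here are illustrative assumptions):

```python
def vad_commits(frame_levels, frame_ms=10, threshold=0.05, silence_duration_ms=50):
    """Return indices of frames where a commit would fire.

    frame_levels: per-frame loudness in [0, 1]; a frame below `threshold`
    counts as quiet. A commit fires only after `silence_duration_ms` of
    consecutive quiet following some buffered speech.
    """
    commits = []
    quiet_ms = 0
    buffered = False  # uncommitted speech sitting in the buffer?
    for i, level in enumerate(frame_levels):
        if level >= threshold:
            buffered = True
            quiet_ms = 0          # any speech resets the silence timer
        else:
            quiet_ms += frame_ms
            if buffered and quiet_ms >= silence_duration_ms:
                commits.append(i)  # pause long enough -> commit
                buffered = False
                quiet_ms = 0
    return commits
```

Feeding it 100 frames of continuous speech yields no commits at all, while 20 loud frames followed by quiet commits as soon as the silence window fills.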
- **How to get true real-time while speaking continuously**
  - Keep the transcribe model, but either:
    - send `input_audio_buffer.commit` yourself every 300-500 ms, or
    - turn off VAD (`turn_detection.type = "none"`) and decide when to commit / end the turn in the client.
  - Smaller audio chunks (0.25-0.5 s) plus periodic commits give deltas within ~0.5 s, with only a tiny accuracy hit.
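A sketch of the periodic-commit pacing, assuming the Realtime API event names `input_audio_buffer.append` and `input_audio_buffer.commit`. Instead of opening a real websocket, this builds the outgoing message list so the pacing logic is easy to inspect; the 250 ms chunk size and 400 ms commit interval are just the values from the range above:

```python
import base64

def build_messages(pcm_chunks, chunk_ms=250, commit_every_ms=400):
    """Interleave append events with a commit every ~commit_every_ms of audio."""
    msgs, since_commit = [], 0
    for chunk in pcm_chunks:
        msgs.append({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode(),  # API expects base64 PCM
        })
        since_commit += chunk_ms
        if since_commit >= commit_every_ms:
            msgs.append({"type": "input_audio_buffer.commit"})
            since_commit = 0
    return msgs
```

In a live client you would `await ws.send(json.dumps(msg))` for each entry as the audio arrives; the point is only that commits ride along on a timer rather than waiting for VAD.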
- **Deltas are stable**
  - Each `delta` just adds new tokens; the `completed` event is only a final "done" marker. Nothing gets overwritten.

So now I see that: after VAD detects a pause, the server sends a `started` event, then streams `delta` messages, and finally emits a single `completed` event.
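Because deltas are purely additive, the client-side reducer is trivial. A minimal sketch (event types shortened for illustration; the real API uses longer names like `conversation.item.input_audio_transcription.delta`):

```python
def fold_transcript(events):
    """Accumulate delta events into a transcript; `completed` only finalizes."""
    text, done = "", False
    for ev in events:
        if ev["type"].endswith(".delta"):
            text += ev["delta"]   # deltas only append, never rewrite
        elif ev["type"].endswith(".completed"):
            done = True           # marker only; text is already final
    return text, done
```

No diffing or replacement logic is needed, which is exactly why streaming UIs can render each delta immediately.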
```jsonc
{
  "input_audio_format": "pcm16",
  "input_audio_transcription": {
    "model": "gpt-4o-transcribe",      // or "gpt-4o-mini-transcribe"
    "language": "en",
    "prompt": "Transcribe the incoming audio in real time."
  },
  "turn_detection": {
    "type": "server_vad",
    "threshold": 0.05,                 // RMS gate (0 = silence … 1 = loud)
    "prefix_padding_ms": 300,          // keeps first syllable from being clipped
    "silence_duration_ms": 50          // VAD fires after 50 ms of quiet
  },
  "client_chunk_size": 32000           // bytes per append (~1 sec at 16-kHz PCM)
}
```
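For completeness, the config above gets applied by wrapping it in a `session.update` event and sending that over the websocket. A minimal sketch (check the Realtime API docs for the exact event type your endpoint expects; transcription-only endpoints may use a different one):

```python
import json

def session_update(config: dict) -> str:
    """Serialize a session config into a session.update event payload."""
    return json.dumps({"type": "session.update", "session": config})
```

You would send this once right after the websocket opens, before appending any audio.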