Use new model for realtime audio transcription

Hi Fabrizio, I think our error is that we aren’t nesting the configuration inside the session object.
If you look at the documentation, the transcription_session.update event is structured as follows:

{
  "type": "transcription_session.update",
  "session": {
    "input_audio_format": "pcm16",
    "input_audio_transcription": {
      "model": "gpt-4o-transcribe",
      "prompt": "",
      "language": ""
    },
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500,
      "create_response": true,
    },
    "input_audio_noise_reduction": {
      "type": "near_field"
    },
    "include": [
      "item.input_audio_transcription.logprobs",
    ]
  }
}
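For reference, here's a minimal sketch of sending that event over a raw WebSocket in Python. It assumes the wss://api.openai.com/v1/realtime?intent=transcription endpoint, the OpenAI-Beta: realtime=v1 header, and the third-party websockets package; adapt it to whatever client you're already using:

```python
import asyncio
import json
import os

import websockets  # pip install websockets

# Assumed transcription endpoint; check the docs for your account/beta.
TRANSCRIPTION_URL = "wss://api.openai.com/v1/realtime?intent=transcription"

async def main() -> None:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # On websockets < 14 the keyword is extra_headers, not additional_headers.
    async with websockets.connect(TRANSCRIPTION_URL, additional_headers=headers) as ws:
        # Everything is nested under "session"; sending these keys at the
        # top level is what triggered the error discussed above.
        await ws.send(json.dumps({
            "type": "transcription_session.update",
            "session": {
                "input_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-transcribe"},
                "turn_detection": {"type": "server_vad"},
            },
        }))
        # Print incoming event types (transcription_session.updated,
        # speech started/stopped, transcription deltas, etc.).
        async for message in ws:
            print(json.loads(message).get("type"))

asyncio.run(main())
```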

Note that all the configuration options are nested within the session object, and also we don't have to send the session key in every request.
These changes fixed the same error on my end, and audio is now being sent to the API, but the API stays completely silent apart from the speech-start event. That's likely unrelated to this issue, though.
Cheers!
