Use new model for realtime audio transcription

Hi Fabrizio, I think our error is that we aren’t nesting the configuration inside the session object.
If you look at the documentation, the transcription_session.update event is structured as follows:

{
  "type": "transcription_session.update",
  "session": {
    "input_audio_format": "pcm16",
    "input_audio_transcription": {
      "model": "gpt-4o-transcribe",
      "prompt": "",
      "language": ""
    },
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500,
      "create_response": true,
    },
    "input_audio_noise_reduction": {
      "type": "near_field"
    },
    "include": [
      "item.input_audio_transcription.logprobs",
    ]
  }
}
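For reference, here's a minimal sketch of sending that event over a raw WebSocket in Python. It assumes the wss://api.openai.com/v1/realtime?intent=transcription endpoint, the OpenAI-Beta: realtime=v1 header, and the third-party websockets package; adapt it to whatever client you're already using:

```python
import asyncio
import json
import os

import websockets  # pip install websockets

# Assumed transcription endpoint; check the docs for your account/beta.
TRANSCRIPTION_URL = "wss://api.openai.com/v1/realtime?intent=transcription"

async def main() -> None:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # On websockets < 14 the keyword is extra_headers, not additional_headers.
    async with websockets.connect(TRANSCRIPTION_URL, additional_headers=headers) as ws:
        # Everything is nested under "session"; sending these keys at the
        # top level is what triggered the error discussed above.
        await ws.send(json.dumps({
            "type": "transcription_session.update",
            "session": {
                "input_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-transcribe"},
                "turn_detection": {"type": "server_vad"},
            },
        }))
        # Print incoming event types (transcription_session.updated,
        # speech started/stopped, transcription deltas, etc.).
        async for message in ws:
            print(json.loads(message).get("type"))

asyncio.run(main())
```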

Note that all the configuration options are nested within the session object, and also we don't have to send the session key in every request.
These changes fixed the same error on my end, and audio is now being sent to the API, but the API stays completely silent apart from the speech-start event. That's likely unrelated to this issue, though.
Cheers!
