Can't get the user transcription in the Realtime API

Hi everyone,

I just worked through this error and want to share the fix in case you’re getting a null transcription as well! :smiley: Long story short, I was doing the initial POST handshake for the ephemeral token with an empty body, then trying to set the configuration over the WebSocket and all kinds of other brute-force attempts, until I discovered the answer. The request body needs to look like this:

{
  "input_audio_format": "pcm16",
  "input_audio_transcription": {
    "model": "gpt-4o-transcribe",
    "prompt": "",
    "language": "en"
  },
  "turn_detection": {
    "type": "server_vad",
    "threshold": 0.5,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500
  },
  "input_audio_noise_reduction": {
    "type": "near_field"
  },
  "include": [
    "item.input_audio_transcription.logprobs"
  ]
}

Before you start the realtime transcription session, when you send the initial POST request that returns your ephemeral token, you need to set the request body to the session configuration (which transcription model to use, etc.), as in the JSON above.

If you send an empty body with the POST, passing your API key just to get the ephemeral token, you will hit this exact issue of receiving “null” for the transcription, and potentially lose a day or two, or give up if you don’t have the will of the Highlander.
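For anyone who wants something to copy, here is a rough Python sketch of that initial POST with the configuration in the body. Treat it as a sketch under assumptions, not the official way: the endpoint URL in `SESSION_URL` is my assumption (it is not stated above, so check it against the current API reference), and `build_session_request` is just a helper name made up for this example. It only builds the request, so you can inspect the body before sending anything.

```python
import json
import urllib.request

# Assumption: this endpoint path is not given in the post above.
# Verify it against the current API reference before relying on it.
SESSION_URL = "https://api.openai.com/v1/realtime/transcription_sessions"

# Same configuration as the JSON shown earlier in this post.
SESSION_CONFIG = {
    "input_audio_format": "pcm16",
    "input_audio_transcription": {
        "model": "gpt-4o-transcribe",
        "prompt": "",
        "language": "en",
    },
    "turn_detection": {
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500,
    },
    "input_audio_noise_reduction": {"type": "near_field"},
    "include": ["item.input_audio_transcription.logprobs"],
}


def build_session_request(api_key: str, url: str = SESSION_URL) -> urllib.request.Request:
    """Build the ephemeral-token POST with a non-empty JSON body (hypothetical helper)."""
    body = json.dumps(SESSION_CONFIG).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,  # the important part: do NOT leave this empty
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# To actually send it (needs a real API key and network access):
#   with urllib.request.urlopen(build_session_request(my_key)) as resp:
#       session = json.load(resp)
```

The only point of the sketch is that `data` carries the transcription config instead of being empty; how you read the ephemeral token out of the response depends on the response shape, which I have left out on purpose.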

The API documentation is a total disaster btw

I’ve found the solution and moved on; may others reach the same blessing by reading this. Aloha.
