Input_audio_transcription not working in Real-Time — related to g711_ulaw?

I’m trying to access the transcription of my audio input, so I’m setting it in session.update the way the documentation describes:

"input_audio_transcription": {
            "enabled": true,

It throws an error asking "did you mean 'True'?", but when I change it to "True" it doesn’t work either. I’ve read in other posts that leaving out the "enabled" parameter works for some people, but that isn’t the case for me.

Here’s my question: I noticed that the audio input format in the OpenAI example is set to "pcm16", but because I’m using a Twilio integration, it only works when I use "g711_ulaw". Could it be a bug that only allows transcription for certain audio formats?

Thanks for your help!


I’m not sure if I’m understanding correctly, but you have to pass:

        "input_audio_transcription": {
            "model": "whisper-1"
        },

For me it also does not work with:

        "input_audio_transcription": {
            "model": "whisper-1"
        },

in the session.update.

I only get the following conversation object after the input is done:
        {
          "id": "item_AKsJIqYvkxPyzm6acewEU",
          "object": "realtime.item",
          "type": "message",
          "status": "completed",
          "role": "user",
          "content": [{"type": "input_audio", "transcript": null}]
        }

(note the input audio "transcript": null)

But I get a correct confirmation of the settings from session.update:
        {
          "id": "sess_AKsJIj6s6hJHJqa88YSWy",
          "object": "realtime.session",
          "model": "gpt-4o-realtime-preview-2024-10-01",
          "expires_at": 1729539592,
          "modalities": ["text", "audio"],
          "instructions": "Help the user",
          "voice": "echo",
          "turn_detection": null,
          "input_audio_format": "pcm16",
          "output_audio_format": "pcm16",
          "input_audio_transcription": {"model": "whisper-1"},
          "tool_choice": "auto",
          "temperature": 0.6,
          "max_response_output_tokens": "inf",
          "tools": []
        }

Output and output transcription are also working. The only thing I cannot get to work is input transcription.


Same here, everything seems to work except that I receive no transcript of the audio.
I also noticed I don’t get any
conversation.item.input_audio_transcription.failed or
conversation.item.input_audio_transcription.completed events.
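Those two events arrive asynchronously, after the user item itself is created (which is why the item initially shows transcript: null). A minimal dispatcher sketch for raw JSON strings coming off the websocket; the "transcript" field name on the completed event is taken from the API reference and should be treated as an assumption:

```python
import json

def handle_event(raw: str):
    """Route Realtime API server events related to input transcription.

    Returns a (kind, value) tuple for the two transcription events,
    or None for everything else.
    """
    event = json.loads(raw)
    etype = event.get("type")
    if etype == "conversation.item.input_audio_transcription.completed":
        # the completed event carries the text in a "transcript" field
        return ("transcript", event["transcript"])
    if etype == "conversation.item.input_audio_transcription.failed":
        return ("error", event.get("error"))
    return None
```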


Yes, I wrote the OG post back when the documentation still had a bug telling you to write "enabled": true.

I now use the code that you shared, and it still doesn’t provide a transcription of the user’s audio.

Do you get coherent responses from the AI when you ask it something via audio?
If not, your audio may not be being submitted correctly: if no audio arrives, or it is converted wrong and isn’t understood, you would get no transcription either.

Alternatively, you could write the audio to a file and listen to it afterwards. Maybe you’ll be able to hear an issue with the audio.
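For the g711_ulaw case, a sketch of that debugging step: decode the μ-law bytes you are sending to the API into 16-bit PCM and dump them to a playable WAV. The decode follows the standard G.711 μ-law expansion, and the 8 kHz rate is what Twilio media streams use; adjust if your source differs:

```python
import struct
import wave

def ulaw_to_pcm16(data: bytes) -> bytes:
    """Decode G.711 u-law bytes to little-endian 16-bit linear PCM."""
    out = bytearray()
    for byte in data:
        byte = ~byte & 0xFF                  # u-law stores samples inverted
        sign = byte & 0x80
        exponent = (byte >> 4) & 0x07
        mantissa = byte & 0x0F
        sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
        out += struct.pack("<h", -sample if sign else sample)
    return bytes(out)

def dump_ulaw_to_wav(ulaw_bytes: bytes, path: str, rate: int = 8000) -> None:
    """Write the decoded audio as a mono WAV you can play back and inspect."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)   # Twilio media streams are 8 kHz
        wav.writeframes(ulaw_to_pcm16(ulaw_bytes))
```

If the resulting file sounds like noise or silence, the audio is being mangled before it ever reaches the API, which would also explain the missing transcription.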

Good luck!