Input_audio_transcription not working in Real-Time — related to g711_ulaw?

I’m trying to access the transcription of my audio input, so I’m setting it in session.update the way the documentation describes:

"input_audio_transcription": {
            "enabled": true,

It throws an error asking "did you mean 'True'?", but when I change it to "True" it doesn’t work either. I’ve read in other posts that leaving out the "enabled" parameter works for some people, but that isn’t the case for me.

Here’s my question: I noticed that the audio input format in the OpenAI example is set to "pcm16", but because I’m using a Twilio integration, it only works when I use "g711_ulaw". Could it be a bug that only allows transcription for certain audio formats?

Thanks for your help!


I’m not sure if I’m understanding correctly, but you have to pass:

        "input_audio_transcription": {
            "model": "whisper-1"
        },

For me it also does not work with:

        "input_audio_transcription": {
            "model": "whisper-1"
        },

in the session.update.

I only get the following conversation object after the input is done:
        {
          "id": "item_AKsJIqYvkxPyzm6acewEU",
          "object": "realtime.item",
          "type": "message",
          "status": "completed",
          "role": "user",
          "content": [{"type": "input_audio", "transcript": null}]
        }

(note the input audio "transcript": null)

But I get a correct confirmation of the settings from session.update:
        {
          "id": "sess_AKsJIj6s6hJHJqa88YSWy",
          "object": "realtime.session",
          "model": "gpt-4o-realtime-preview-2024-10-01",
          "expires_at": 1729539592,
          "modalities": ["text", "audio"],
          "instructions": "Help the user",
          "voice": "echo",
          "turn_detection": null,
          "input_audio_format": "pcm16",
          "output_audio_format": "pcm16",
          "input_audio_transcription": {"model": "whisper-1"},
          "tool_choice": "auto",
          "temperature": 0.6,
          "max_response_output_tokens": "inf",
          "tools": []
        }

Output and output transcription are also working. The only thing I cannot get to work is input transcription.


Same here, everything seems to work except that I receive no transcript of the audio.
I also noticed I don’t get any
conversation.item.input_audio_transcription.failed or
conversation.item.input_audio_transcription.completed events.
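Those two events arrive asynchronously, after the user item itself is created (which is why the item initially shows transcript: null). A minimal dispatcher sketch for raw JSON strings coming off the websocket; the "transcript" field name on the completed event is taken from the API reference and should be treated as an assumption:

```python
import json

def handle_event(raw: str):
    """Route Realtime API server events related to input transcription.

    Returns a (kind, value) tuple for the two transcription events,
    or None for everything else.
    """
    event = json.loads(raw)
    etype = event.get("type")
    if etype == "conversation.item.input_audio_transcription.completed":
        # the completed event carries the text in a "transcript" field
        return ("transcript", event["transcript"])
    if etype == "conversation.item.input_audio_transcription.failed":
        return ("error", event.get("error"))
    return None
```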


Yes, I wrote the OG post back when the documentation still had a bug telling you to write "enabled": true.

I now use the code that you shared, and it still doesn’t provide a transcription of the user’s audio.

Do you get coherent responses from the AI when you ask it something via audio?
If not, your audio may not be being submitted correctly: if no audio arrives, or it is converted wrong and isn’t understood, you would get no transcription either.

Alternatively, you could write the audio to a file and listen to it afterwards. Maybe you’ll be able to hear an issue with the audio.
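For the g711_ulaw case, a sketch of that debugging step: decode the μ-law bytes you are sending to the API into 16-bit PCM and dump them to a playable WAV. The decode follows the standard G.711 μ-law expansion, and the 8 kHz rate is what Twilio media streams use; adjust if your source differs:

```python
import struct
import wave

def ulaw_to_pcm16(data: bytes) -> bytes:
    """Decode G.711 u-law bytes to little-endian 16-bit linear PCM."""
    out = bytearray()
    for byte in data:
        byte = ~byte & 0xFF                  # u-law stores samples inverted
        sign = byte & 0x80
        exponent = (byte >> 4) & 0x07
        mantissa = byte & 0x0F
        sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
        out += struct.pack("<h", -sample if sign else sample)
    return bytes(out)

def dump_ulaw_to_wav(ulaw_bytes: bytes, path: str, rate: int = 8000) -> None:
    """Write the decoded audio as a mono WAV you can play back and inspect."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)   # Twilio media streams are 8 kHz
        wav.writeframes(ulaw_to_pcm16(ulaw_bytes))
```

If the resulting file sounds like noise or silence, the audio is being mangled before it ever reaches the API, which would also explain the missing transcription.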

Good luck!