I’m trying to get the transcription of my audio input, so I’m setting this in session.update as the documentation says:
"input_audio_transcription": {
"enabled": true,
It raises an error: “did you mean ‘True’?” But when I change it to “True”, it still doesn’t work. I’ve read in other posts that leaving out the “enabled” parameter works for some people, but that isn’t the case for me.
Here’s my question: I noticed that the sound input in the OpenAI example is set to “pcm16”, but because I’m using a Twilio integration, it only works when I use “g711_ulaw”. Could it be a bug that only allows transcription for certain sound formats?
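For anyone hitting the same “did you mean ‘True’?” message: that looks like Python's own NameError hint, since the JSON literal true is not a Python name. The Python boolean is True, and json.dumps writes it out as true. Below is a minimal sketch of building the session.update payload as a dict, using only field names that appear in this thread. The helper name build_session_update is hypothetical, and leaving out "enabled" is an assumption based on the posts mentioned above, not something confirmed here.

```python
import json


def build_session_update(audio_format="g711_ulaw", transcription_model="whisper-1"):
    """Return a session.update event serialized as a JSON string.

    Hypothetical helper for illustration; the field names are the ones
    quoted in this thread.
    """
    event = {
        "type": "session.update",
        "session": {
            # Twilio integration in this thread only works with g711_ulaw.
            "input_audio_format": audio_format,
            # Assumption: only "model" is sent, with no "enabled" key.
            "input_audio_transcription": {"model": transcription_model},
        },
    }
    # json.dumps converts Python True/False/None to JSON true/false/null.
    return json.dumps(event)
```

The returned string can be sent over the websocket as is. Whether transcription then works with g711_ulaw is exactly the open question in this post.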
For me it also does not work with:
"input_audio_transcription": {
"model": "whisper-1"
},
in the session.update
I only get the following conversation object after the input is done:
{id: item_AKsJIqYvkxPyzm6acewEU, object: realtime.item, type: message, status: completed, role: user, content: [{type: input_audio, transcript: null}]}
(input_audio_transcript: null)
But I get a correct confirmation of the settings from session.update:
{id: sess_AKsJIj6s6hJHJqa88YSWy, object: realtime.session, model: gpt-4o-realtime-preview-2024-10-01, expires_at: 1729539592, modalities: [text, audio], instructions: Help the user, voice: echo, turn_detection: null, input_audio_format: pcm16, output_audio_format: pcm16, input_audio_transcription: {model: whisper-1}, tool_choice: auto, temperature: 0.6, max_response_output_tokens: inf, tools: }
Output and output transcription are also working. The only thing I cannot get to work is input transcription.
Same here: everything seems to work, except that I receive no transcript of the audio.
I also noticed that I don’t get any
conversation.item.input_audio_transcription.failed or
conversation.item.input_audio_transcription.completed
events.
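To confirm whether those two events ever arrive, it can help to log them explicitly instead of only inspecting the conversation item. Here is a rough sketch of such a check. The function name handle_event and the log parameter are hypothetical, and the "transcript", "item_id" and "error" fields are my assumption about what these events carry, so adjust them to whatever your server actually sends.

```python
import json


def handle_event(raw_message, log=print):
    """Log input transcription events; return the transcript if one arrived.

    Hypothetical helper: call it with every raw JSON message received
    from the websocket.
    """
    event = json.loads(raw_message)
    event_type = event.get("type", "")

    if event_type == "conversation.item.input_audio_transcription.completed":
        # Assumed fields: "item_id" and "transcript".
        transcript = event.get("transcript")
        log(f"transcript for {event.get('item_id')}: {transcript!r}")
        return transcript

    if event_type == "conversation.item.input_audio_transcription.failed":
        # Assumed field: "error" describing why transcription failed.
        log(f"transcription failed: {event.get('error')}")

    # Any other event type is ignored by this check.
    return None
```

If neither branch ever logs anything, the transcription step is probably not being triggered at all, which is different from it running and failing.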
Do you get coherent responses from the AI when you ask it something via audio?
If not, your audio may not be getting submitted correctly. If no audio arrives, or it is converted incorrectly and can’t be understood, you would get no transcription either.
Alternatively, you could try writing the audio to a file and listening to it afterwards. Maybe you’ll be able to hear an issue with the audio.
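A minimal sketch of that file check for the Twilio case, assuming the audio chunks are base64-encoded G.711 mu-law at 8000 Hz mono (adjust if your stream differs). The function names are hypothetical. The mu-law decoding is done by hand with the standard G.711 expansion, so it only needs the standard library.

```python
import base64
import struct
import wave


def ulaw_byte_to_pcm16(value):
    """Expand one G.711 mu-law byte to a signed 16-bit PCM sample."""
    value = ~value & 0xFF            # mu-law bytes are stored bit-inverted
    sign = value & 0x80
    exponent = (value >> 4) & 0x07
    mantissa = value & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def write_ulaw_chunks_to_wav(base64_chunks, path, sample_rate=8000):
    """Decode base64 mu-law chunks and write them to a 16-bit mono WAV file.

    Returns the number of samples written. The 8000 Hz default is an
    assumption about the Twilio stream.
    """
    ulaw = b"".join(base64.b64decode(chunk) for chunk in base64_chunks)
    samples = [ulaw_byte_to_pcm16(b) for b in ulaw]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return len(samples)
```

Collect the same chunks you append to the input audio buffer, pass them to write_ulaw_chunks_to_wav, and play the resulting file. If it sounds like noise or plays at the wrong speed, the format or sample rate being sent is likely not what the session expects.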