As mentioned here, you can leave out the "enabled" key, which resolved it for some users. For me, however, this didn't work, but maybe you will have more luck. I really need the transcription as well.
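For reference, a rough sketch of that change (the surrounding session config object is an assumption; only the input_audio_transcription field matters here):

```ts
// Sketch: the transcription block with the "enabled" key left out,
// as suggested above. Other session fields are assumptions.
const sessionConfig = {
  // Reported to cause empty transcripts for some users:
  // input_audio_transcription: { enabled: true, model: "whisper-1" },

  // Suggested fix: omit "enabled" entirely.
  input_audio_transcription: {
    model: "whisper-1",
  },
};
```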
Same problem: "transcript" is empty (tested with WebRTC on 14/02/2025).

I have put this in the session creation:

```json
"input_audio_transcription": {
  "model": "whisper-1",
  "language": "fr"
},
```

In the "conversation.item.created" message that comes back, "transcript" is empty.
Click on the demo there, check "Transcribe User Audio", and talk; you'll see events come back with transcriptions.
A couple of things I have noticed along the way:
Make sure, when you make the session request to get the ephemeral token, that you include the input_audio_transcription field in it. If you do not set it there, you have to send a separate session.update client event to turn transcriptions on. This follow-up session.update event does work; in fact, I use it in the example. A sketch of both paths is below.
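Roughly like this (a sketch, not verbatim from my example; the model name and the `dc` data channel variable are assumptions from my setup):

```ts
// Path 1 (sketch, server-side): include input_audio_transcription when
// requesting the ephemeral token. The model name is an assumption.
const resp = await fetch("https://api.openai.com/v1/realtime/sessions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-realtime-preview",
    input_audio_transcription: { model: "whisper-1" },
  }),
});
const session = await resp.json();

// Path 2 (sketch, browser-side): enable it afterwards with a
// session.update client event over the WebRTC data channel.
// "dc" is assumed to be your RTCDataChannel for Realtime events.
declare const dc: RTCDataChannel;

dc.send(
  JSON.stringify({
    type: "session.update",
    session: {
      input_audio_transcription: { model: "whisper-1" },
    },
  })
);
```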
Be careful about background noise. Sometimes the Realtime API will respond to speech, but the transcriptions are wrong or blank because the Whisper-1 model used for transcriptions doesn't interpret the speech the same way that the Realtime API model does.
I've only really tried this in English, although the transcriptions are really just from Whisper, so anything that works with Whisper should work for transcriptions with the Realtime API too.
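One more note for those seeing an empty transcript: the user transcript is delivered asynchronously, in a separate conversation.item.input_audio_transcription.completed server event, not in conversation.item.created, so make sure you are listening for that event. A sketch, assuming a WebRTC data channel named `dc`:

```ts
// Sketch: listen for the async transcription event on the Realtime
// data channel. "dc" is assumed to be your RTCDataChannel.
declare const dc: RTCDataChannel;

dc.addEventListener("message", (e: MessageEvent) => {
  const event = JSON.parse(e.data);
  // conversation.item.created arrives first, with no transcript yet;
  // the user transcript shows up in this later event:
  if (event.type === "conversation.item.input_audio_transcription.completed") {
    console.log("User transcript:", event.transcript);
  }
});
```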