How to get input_audio_transcription when using the OpenAI Realtime API

I’m using OpenAI’s Realtime API for voice conversations and have written Node.js code based on the documentation.

https://platform.openai.com/docs/guides/realtime-conversations

I’m able to receive the generated audio and text from OpenAI through the response.audio.delta and response.audio_transcript.delta events.

However, I now want to get the transcription of my own input audio, but I’m not sure how to do that.

I tried listening for the conversation.item.input_audio_transcription.delta event, but my code never receives it.

https://platform.openai.com/docs/api-reference/realtime-server-events/conversation/item/input_audio_transcription/delta

Here’s the key part of my code — I added the input_audio_transcription parameter, but it doesn’t seem to have any effect.

const openaiSession: SessionUpdateEvent.Session = {
    voice: 'shimmer',
    modalities: ['text', 'audio'],
    instructions: this.genInstructions(),
    model: config.openai.model,
    turn_detection: {
        type: 'semantic_vad',
        eagerness: 'high',
        create_response: true,
        interrupt_response: true,
    },
    input_audio_transcription: {
        model: 'gpt-4o-transcribe',
        language: 'en',
        prompt: this.genTranscriptionPrompt(),
    },
};
this.openaiWS.send({
    type: 'session.update',
    session: openaiSession,
});
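
For reference, this is roughly how the transcription events could be consumed once they arrive. It is a minimal sketch, not the SDK's own API: the InputTranscriptCollector class and its method names are hypothetical, and the event names and fields (item_id, delta, transcript, error.message) are assumed to match the server events reference linked above. It also records the .failed event, since a silent transcription failure would look the same as no events at all.

```typescript
// Only the fields used below are declared; real server events carry more.
interface ServerEvent {
  type?: string;
  item_id?: string;
  delta?: string;
  transcript?: string;
  error?: { message?: string };
}

// Hypothetical helper (not part of any SDK): collects the user's
// input-audio transcripts, keyed by conversation item id.
class InputTranscriptCollector {
  private partial = new Map<string, string>();
  private completed = new Map<string, string>();
  readonly failures: string[] = [];

  // Feed every raw server message (a JSON string) through this method.
  handle(raw: string): void {
    const event = JSON.parse(raw) as ServerEvent;
    const id = event.item_id ?? '';
    switch (event.type) {
      case 'conversation.item.input_audio_transcription.delta':
        // Append the incremental text to whatever has arrived so far.
        this.partial.set(id, (this.partial.get(id) ?? '') + (event.delta ?? ''));
        break;
      case 'conversation.item.input_audio_transcription.completed':
        // Prefer the final transcript; fall back to the accumulated deltas.
        this.completed.set(id, event.transcript ?? this.partial.get(id) ?? '');
        this.partial.delete(id);
        break;
      case 'conversation.item.input_audio_transcription.failed':
        this.failures.push(event.error?.message ?? 'unknown transcription error');
        break;
      default:
        break; // Every other event type is ignored here.
    }
  }

  // Returns the final transcript if available, otherwise the partial text.
  textFor(itemId: string): string | undefined {
    return this.completed.get(itemId) ?? this.partial.get(itemId);
  }
}
```

How raw messages reach handle() depends on the WebSocket wrapper in use; with a plain ws-style socket it would typically be called from the message handler with the payload converted to a string.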

Check this issue: “cant-get-the-user-transcription-in-realtime-api”. I haven’t tried the solutions there yet; if you try them, let me know how it goes.

Can you provide a reference for getting the text from the OpenAI audio response?