I’m using OpenAI’s Realtime API for voice conversations and have written Node.js code based on the documentation.
https://platform.openai.com/docs/guides/realtime-conversations
I’m able to receive the generated audio and text from OpenAI through the `response.audio.delta` and `response.audio_transcript.delta` events.
However, I now want to get the transcription of my own input audio, but I’m not sure how to do that.
I tried listening for the `conversation.item.input_audio_transcription.delta` event, but I’m not receiving it in my code.
Here’s the key part of my code. I added the `input_audio_transcription` parameter, but it doesn’t seem to have any effect.
```typescript
const openaiSession: SessionUpdateEvent.Session = {
  voice: 'shimmer',
  modalities: ['text', 'audio'],
  instructions: this.genInstructions(),
  model: config.openai.model,
  turn_detection: {
    type: 'semantic_vad',
    eagerness: 'high',
    create_response: true,
    interrupt_response: true,
  },
  input_audio_transcription: {
    model: 'gpt-4o-transcribe',
    language: 'en',
    prompt: this.genTranscriptionPrompt(),
  },
};

this.openaiWS.send({
  type: 'session.update',
  session: openaiSession,
});
```