Hi everyone, I am implementing the OpenAI Realtime API and have configured the session to include audio transcription using the following configuration:
input_audio_transcription: {
  model: "whisper-1"
}
However, the audio input provided by the user does not generate a transcript. Instead, the transcript field always returns null. Below is the response received from the API:
{
  "type": "conversation.item.created",
  "event_id": "event_AkR2BLE7l9oMUumIva3Ku",
  "previous_item_id": null,
  "item": {
    "id": "item_AkR29UqpepukIR4ioIUYO",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": null
      }
    ]
  }
}
So how can I get the user transcript from the Realtime API? Can someone please help?
Have you solved this yet?
You need to add it to your session.update event to retrieve the transcript; by default, it isn't included. Here's an example:
function configureData() {
  const event = {
    type: 'session.update',
    session: {
      modalities: ['text', 'audio'],
      tools: [
        { type: 'function', name: 'functionOne', description: 'Function one description' },
        { type: 'function', name: 'functionTwo', description: 'Function two description' },
        { type: 'function', name: 'functionThree', description: 'Function three description' },
        { type: 'function', name: 'functionFour', description: 'Function four description' },
        { type: 'function', name: 'functionFive', description: 'Handles text from AI response' },
      ],
      input_audio_transcription: {
        model: 'whisper-1',
      },
    },
  };
  if (dataChannel && dataChannel.readyState === 'open') {
    dataChannel.send(JSON.stringify(event));
    console.log('Session update sent.');
  }
}
NOTE: You don't need the functions; this just shows how you would include them.
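Once that session.update is applied, the transcript still shows up as null on conversation.item.created (transcription runs asynchronously), but it should arrive shortly afterwards in a conversation.item.input_audio_transcription.completed event on the same data channel. A minimal sketch of catching it (handleUserTranscript is a hypothetical placeholder for your own UI code):

// Listen on the same WebRTC data channel used to send session.update.
dataChannel.addEventListener('message', (e) => {
  const msg = JSON.parse(e.data);
  // The user transcript arrives in its own event once transcription finishes,
  // separate from the conversation.item.created event shown above.
  if (msg.type === 'conversation.item.input_audio_transcription.completed') {
    console.log('User said:', msg.transcript);
    handleUserTranscript(msg.item_id, msg.transcript); // hypothetical UI helper
  }
});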
Also, you need to pull the assistant and user audio/text out of the events you receive and display them in your UI if you want them visibly logged for the user.
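For the assistant side, one way to do that is to listen for the audio transcript events that stream alongside the model's speech. A rough sketch, assuming the same data channel (appendAssistantText and finalizeAssistantText are hypothetical UI helpers, not part of the API):

dataChannel.addEventListener('message', (e) => {
  const msg = JSON.parse(e.data);
  if (msg.type === 'response.audio_transcript.delta') {
    // Partial transcript of the assistant's speech as it streams in.
    appendAssistantText(msg.item_id, msg.delta);
  } else if (msg.type === 'response.audio_transcript.done') {
    // Final transcript once the assistant has finished speaking.
    finalizeAssistantText(msg.item_id, msg.transcript);
  }
});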