Can't get the user transcription in the Realtime API

Hi everyone, I am implementing the OpenAI Realtime API and have configured the session to include audio transcription using the following configuration:

input_audio_transcription: {
    model: "whisper-1"
}

However, the audio input provided by the user does not generate a transcript. Instead, the transcript field always returns null. Below is the response received from the API:

{
  "type": "conversation.item.created",
  "event_id": "event_AkR2BLE7l9oMUumIva3Ku",
  "previous_item_id": null,
  "item": {
    "id": "item_AkR29UqpepukIR4ioIUYO",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": null
      }
    ]
  }
}

So how can I get the user transcript from the Realtime API?

Can someone please help?


Have you solved this yet?

You need to add it to your session.update event to retrieve the transcription; it isn't included by default. Here's an example:

/*****************************************
 *   CONFIGURE DATA FOR DATA CHANNEL     *
 *****************************************/
function configureData() {
  const event = {
    type: 'session.update',
    session: {
      modalities: ['text', 'audio'],
      tools: [
        { type: 'function', name: 'functionOne', description: 'Function one description' },
        { type: 'function', name: 'functionTwo', description: 'Function two description' },
        { type: 'function', name: 'functionThree', description: 'Function three description' },
        { type: 'function', name: 'functionFour', description: 'Function four description' },
        { type: 'function', name: 'functionFive', description: 'Handles text from AI response' },
      ],
      input_audio_transcription: {
        model: 'whisper-1',
      },
    },
  };

  if (dataChannel && dataChannel.readyState === 'open') {
    dataChannel.send(JSON.stringify(event));
    console.log('Session update sent.');
  }
}

NOTE: You don't need the functions; however, this shows how you would include them.

Also, you need to pull the Assistant and User audio/text from the logs and display them in your UI if you want them visually logged for the user.
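
For example, here is a minimal sketch of that, assuming a WebRTC dataChannel like the one above and a hypothetical appendLine(role, text) helper that writes into your UI; the event types are the Realtime API server events for the assistant and user transcripts:

// Sketch: pull assistant and user transcripts off the data channel
// and hand them to the UI. appendLine is a hypothetical UI helper.
dataChannel.addEventListener('message', (e) => {
  const event = JSON.parse(e.data);

  // Assistant's spoken text, available once its audio transcript is done.
  if (event.type === 'response.audio_transcript.done') {
    appendLine('assistant', event.transcript);
  }

  // User's spoken text, available once input audio transcription completes.
  if (event.type === 'conversation.item.input_audio_transcription.completed') {
    appendLine('user', event.transcript);
  }
});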

I don't understand. I have created the session with the right message.
After that I receive the "type": "session.created" message with:

"input_audio_transcription": {
  "model": "whisper-1",
  "language": "fr",
  "prompt": null
},

But in the "type": "conversation.item.created" message I have:

"role": "user",
"content": [
  {
    "type": "input_audio",
    "transcript": null
  }
]
The right event handler to get the user transcription is this:

conversation.item.input_audio_transcription.completed
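
For example, a minimal sketch, assuming a Node.js client connected over a WebSocket with the ws package (the variable names are mine, not from the docs):

// Log the user's transcript once Whisper finishes transcribing the input.
// Assumes `ws` is an already-open WebSocket to the Realtime API.
ws.on('message', (raw) => {
  const event = JSON.parse(raw);
  if (event.type === 'conversation.item.input_audio_transcription.completed') {
    console.log('User said:', event.transcript);
  }
});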

Also, this topic could be relevant:

Hi everyone,

I just got through this error, and I want to share how to fix it in case you're getting null transcriptions as well. Long story short, I was doing the initial POST handshake for the ephemeral token with an empty body, and also trying to set the configuration over WebSockets and all kinds of brute force, until I discovered the answer.

{
  "input_audio_format": "pcm16",
  "input_audio_transcription": {
    "model": "gpt-4o-transcribe",
    "prompt": "",
    "language": "en"
  },
  "turn_detection": {
    "type": "server_vad",
    "threshold": 0.5,
    "prefix_padding_ms": 300,
    "silence_duration_ms": 500
  },
  "input_audio_noise_reduction": {
    "type": "near_field"
  },
  "include": [
    "item.input_audio_transcription.logprobs"
  ]
}

Before you initiate the realtime transcription session, when you trigger the initial POST request to get your ephemeral token, you need to set the request body to the configuration of which model to use, etc., as sketched below.

If you send an empty body with that POST, handing over the API key just to get the ephemeral token, you will get this exact issue of receiving "null" transcriptions, and potentially lose a day or two and give up if you don't have the will of the Highlander.
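
Here is a rough sketch of what that POST can look like, using the configuration above. The transcription_sessions endpoint name and the client_secret.value field are assumptions based on my setup, so check the current docs for your account:

// Mint an ephemeral token for a transcription session, passing the
// transcription configuration in the POST body instead of an empty body.
async function createTranscriptionSession(apiKey) {
  const response = await fetch('https://api.openai.com/v1/realtime/transcription_sessions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input_audio_format: 'pcm16',
      input_audio_transcription: {
        model: 'gpt-4o-transcribe',
        prompt: '',
        language: 'en',
      },
      turn_detection: {
        type: 'server_vad',
        threshold: 0.5,
        prefix_padding_ms: 300,
        silence_duration_ms: 500,
      },
      input_audio_noise_reduction: { type: 'near_field' },
      include: ['item.input_audio_transcription.logprobs'],
    }),
  });
  const session = await response.json();
  // The ephemeral token is expected under client_secret.value (assumption).
  return session.client_secret.value;
}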

The API documentation is a total disaster btw

I’ve moved on and found the solution, may others also reach the same blessing by reading this. Aloha.


You need to ensure that input_audio_transcription is properly configured in the session settings and that the audio input is valid. Try this configuration:

{
  "input_audio_transcription": {
    "enabled": true,
    "model": "whisper-1"
  }
}

Also, ensure your request includes valid audio data in a format the Realtime API accepts (e.g. pcm16, g711_ulaw, or g711_alaw).

To receive real-time transcriptions using conversation.item.input_audio_transcription.delta, ensure your OpenAI session is updated.
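
If you want the streaming (partial) transcript rather than waiting for the completed event, a small sketch of accumulating the deltas could look like this, assuming your event handler already parses each server event into an object:

// Accumulate streaming user-transcript deltas per conversation item.
const partialTranscripts = {};

function handleTranscriptionEvent(event) {
  if (event.type === 'conversation.item.input_audio_transcription.delta') {
    partialTranscripts[event.item_id] =
      (partialTranscripts[event.item_id] || '') + event.delta;
    console.log('partial user transcript:', partialTranscripts[event.item_id]);
  }
  if (event.type === 'conversation.item.input_audio_transcription.completed') {
    // The completed event carries the full transcript.
    console.log('final user transcript:', event.transcript);
    delete partialTranscripts[event.item_id];
  }
}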

I am also really struggling to get user input audio transcription.

As suggested in the forums, I have updated my session to configure input_audio_transcription. I am fairly sure that my audio input is valid (any ideas how to confirm that?) because I am getting valid audio responses back.

However, I still see the transcript in the conversation.item.created.content as null and I do not receive any conversation.item.input_audio_transcription.completed message from the server.

Please could somebody help? Been working on this for 5 hours and am completely stuck!

Below is the session.updated message I receive:

Session Updated: {
  type: 'session.updated',
  event_id: 'event_BWNN4Ka6Hsm9UCmnulpNb',
  session: {
    id: 'sess_BWNN4fcr1ZIsimrOsQM81',
    object: 'realtime.session',
    expires_at: 1747057834,
    input_audio_noise_reduction: null,
    turn_detection: {
      type: 'server_vad',
      threshold: 0.5,
      prefix_padding_ms: 300,
      silence_duration_ms: 1000,
      create_response: true,
      interrupt_response: true
    },
    input_audio_format: 'pcm16',
    input_audio_transcription: { model: 'whisper-1', language: 'en', prompt: null },
    client_secret: null,
    include: null,
    model: 'gpt-4o-realtime-preview-2024-12-17',
    modalities: [ 'text', 'audio' ],
    instructions: 'prompt',
    voice: 'alloy',
    output_audio_format: 'pcm16',
    tool_choice: 'auto',
    temperature: 0.8,
    max_response_output_tokens: 4000,
    tools: []
  }
}