Can't get the user transcription in the Realtime API

Hi everyone, I am implementing the OpenAI Realtime API and have configured the session to include audio transcription using the following configuration:

input_audio_transcription: {
    model: "whisper-1"
}

However, the user's audio input does not produce a transcript; the transcript field is always null. Below is the event received from the API:

{
  "type": "conversation.item.created",
  "event_id": "event_AkR2BLE7l9oMUumIva3Ku",
  "previous_item_id": null,
  "item": {
    "id": "item_AkR29UqpepukIR4ioIUYO",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": null
      }
    ]
  }
}

So how can I get the user transcript from the Realtime API?

Can someone please help?


Have you solved this yet?

You need to add input_audio_transcription to your session.update event to get the user transcript back. It isn't included by default. Here's an example:



function configureData() {
  const event = {
    type: 'session.update',
    session: {
      modalities: ['text', 'audio'],
      // Tools are optional; they are listed only to show where they go.
      tools: [
        { type: 'function', name: 'functionOne', description: 'Function one description' },
        { type: 'function', name: 'functionTwo', description: 'Function two description' },
        { type: 'function', name: 'functionThree', description: 'Function three description' },
        {
          type: 'function',
          name: 'functionFour',
          description: 'Function four description',
        },
        {
          type: 'function',
          name: 'functionFive',
          description: 'Handles text from AI response',
        },
      ],
      // Turns on transcription of the user's input audio.
      input_audio_transcription: {
        model: 'whisper-1',
      },
    },
  };

  // Only send once the WebRTC data channel is open.
  if (dataChannel && dataChannel.readyState === 'open') {
    dataChannel.send(JSON.stringify(event));
    console.log('Session update sent.');
  }
}


**NOTE:** You don't need the functions; they are only there to show how you would include them.

Also, if you want the conversation shown to the user, you need to pull the assistant and user audio/text out of the logged events yourself and display them in your UI.
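
Here is a minimal sketch of that last step, in case it helps. As far as I know, transcription runs asynchronously, which would explain why conversation.item.created still shows "transcript": null: the text arrives later in its own event. The two event type names below are my assumptions from the Realtime API's server event list (the assistant one in particular may differ between API versions), and extractTranscript, attachTranscriptListener and onTranscript are hypothetical names I made up for illustration.

```javascript
// Assumed event types; verify them against the API version you are using.
const USER_TRANSCRIPT_EVENT = 'conversation.item.input_audio_transcription.completed';
const ASSISTANT_TRANSCRIPT_EVENT = 'response.audio_transcript.done';

// Hypothetical helper: turns a raw data-channel message into a transcript
// entry, or returns null for events that carry no finished transcript.
function extractTranscript(rawMessage) {
  let event;
  try {
    event = JSON.parse(rawMessage);
  } catch (err) {
    return null; // Ignore anything that is not valid JSON.
  }
  if (!event || typeof event.transcript !== 'string') {
    return null;
  }
  if (event.type === USER_TRANSCRIPT_EVENT) {
    return { role: 'user', itemId: event.item_id, text: event.transcript };
  }
  if (event.type === ASSISTANT_TRANSCRIPT_EVENT) {
    return { role: 'assistant', itemId: event.item_id, text: event.transcript };
  }
  return null;
}

// Hypothetical wiring: `dataChannel` is the same RTCDataChannel used to send
// session.update, and `onTranscript` is whatever renders a line in your UI.
function attachTranscriptListener(dataChannel, onTranscript) {
  dataChannel.addEventListener('message', (messageEvent) => {
    const entry = extractTranscript(messageEvent.data);
    if (entry) {
      onTranscript(entry);
    }
  });
}
```

You would call something like attachTranscriptListener(dataChannel, (entry) => console.log(entry.role, entry.text)) once the channel exists, and swap the console.log for your own UI update.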