Content transcript is null with Retrieve event

lenduya · May 5, 2025, 1:41am

I’m trying to have a conversation with the Realtime API. I record a message and send it, and I also want my audio message to appear in the conversation as a message I sent. In other words, I want my recording transcribed.

What I’ve done so far:

I establish a WebSocket connection and receive this response:

{
    "type": "session.created",
    "event_id": "event_BTepLqYzeQG3H50OX80NX",
    "session": {
        "id": "sess_BTepLv2jIuFPxvBlhamPW",
        "object": "realtime.session",
        "expires_at": 1746409951,
        "input_audio_noise_reduction": null,
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 200,
            "create_response": true,
            "interrupt_response": true
        },
        "input_audio_format": "pcm16",
        "input_audio_transcription": null,
        "client_secret": null,
        "include": null,
        "model": "gpt-4o-realtime-preview",
        "modalities": [
            "text",
            "audio"
        ],
        "instructions": "[redacted]",
        "voice": "alloy",
        "output_audio_format": "pcm16",
        "tool_choice": "auto",
        "temperature": 0.8,
        "max_response_output_tokens": "inf",
        "tools": []
    }
}

input_audio_transcription is null and I want to fix that, so I update the session using this:

{
  "type" : "session.update",
  "session" : {
    "input_audio_transcription" : {
      "language" : "en",
      "model" : "whisper-1",
      "prompt" : "Use a British accent."
    }
  }
}
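For anyone reproducing this step, here is a minimal Python sketch that builds the same session.update payload. The helper name build_session_update is hypothetical; the field names and values are copied from the event above, and nothing here is taken from an official client library.

```python
import json


def build_session_update(model: str, language: str, prompt: str) -> str:
    """Serialize a session.update event that sets input_audio_transcription."""
    event = {
        "type": "session.update",
        "session": {
            "input_audio_transcription": {
                "model": model,
                "language": language,
                "prompt": prompt,
            }
        },
    }
    return json.dumps(event)


# Same values as in the event above; send the string over the open WebSocket.
payload = build_session_update("whisper-1", "en", "Use a British accent.")
```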

I receive a response back:

{
    "type": "session.updated",
    "event_id": "event_BTepQRdq8siJYUlUsSyL8",
    "session": {
        "id": "sess_BTepLv2jIuFPxvBlhamPW",
        "object": "realtime.session",
        "expires_at": 1746409951,
        "input_audio_noise_reduction": null,
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 200,
            "create_response": true,
            "interrupt_response": true
        },
        "input_audio_format": "pcm16",
        "input_audio_transcription": {
            "model": "whisper-1",
            "language": "en",
            "prompt": "Use a British accent."
        },
        "client_secret": null,
        "include": null,
        "model": "gpt-4o-realtime-preview",
        "modalities": [
            "text",
            "audio"
        ],
        "instructions": "[redacted]",
        "voice": "alloy",
        "output_audio_format": "pcm16",
        "tool_choice": "auto",
        "temperature": 0.8,
        "max_response_output_tokens": "inf",
        "tools": []
    }
}

I send the audio using this event:

{
  "type" : "conversation.item.create",
  "event_id" : "1746408181",
  "item" : {
    "type" : "message",
    "role" : "user",
    "content" : [ {
      "type" : "input_audio",
      "audio" : "UklGRjy+AgBXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YRi+AgD3//j..."
   } ]
  }
}
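One thing worth checking, offered as a possibility rather than a confirmed diagnosis: the base64 payload above begins with "UklGR", which decodes to the bytes "RIFF", so the audio appears to be a WAV file with its container header, while the session declares input_audio_format as pcm16. Whether that mismatch affects transcription is an assumption. The sketch below only detects the header; the helper name looks_like_wav is hypothetical.

```python
import base64


def looks_like_wav(audio_b64: str) -> bool:
    """Return True if the base64 payload starts with a RIFF/WAVE container header."""
    if len(audio_b64) < 16:
        return False
    # 16 base64 characters decode to exactly 12 bytes:
    # b"RIFF", a 4-byte chunk size, then b"WAVE".
    head = base64.b64decode(audio_b64[:16])
    return head[0:4] == b"RIFF" and head[8:12] == b"WAVE"


# First characters of the payload shown in the event above.
sample_prefix = "UklGRjy+AgBXQVZFZm10IBAA"
print(looks_like_wav(sample_prefix))
```

For the prefix in this post the check returns True, so the payload carries a WAV header rather than bare samples.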

The item is created:

{
    "type": "conversation.item.created",
    "event_id": "event_BTepwDIFQNG6ouL9weJro",
    "previous_item_id": null,
    "item": {
        "id": "item_BTepwDRLv35FqFspbIa27",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "transcript": null
            }
        ]
    }
}

I want a transcript, so I send conversation.item.retrieve with the item id from the previous response and receive this:

{
    "type": "conversation.item.retrieved",
    "event_id": "event_BTercp18HeG8nWHDcYeiF",
    "item": {
        "id": "item_BTepwDRLv35FqFspbIa27",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "transcript": null,
                "audio": "UklGRjy+AgBXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YRi+AgD3...",
                "format": "pcm16"
            }
        ]
    }
}

The transcript field is null in both the conversation.item.created and conversation.item.retrieved events. I expected it to contain the transcription of the audio I sent.
Am I misunderstanding something? How do I retrieve the audio transcript?
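A related point, hedged: the Realtime API documents a separate server event, conversation.item.input_audio_transcription.completed, that carries item_id, content_index, and transcript. The documentation describes it for audio written to the input audio buffer, so whether it also fires for audio sent through conversation.item.create is an assumption here. The sketch below filters received frames for that event; collect_transcripts and the sample frames are hypothetical and are not output from this session.

```python
import json

TRANSCRIPT_EVENT = "conversation.item.input_audio_transcription.completed"


def collect_transcripts(raw_messages):
    """Map item_id -> transcript from already-received WebSocket text frames."""
    transcripts = {}
    for raw in raw_messages:
        event = json.loads(raw)
        if event.get("type") == TRANSCRIPT_EVENT:
            transcripts[event["item_id"]] = event.get("transcript")
    return transcripts


# Hypothetical frames for illustration only.
frames = [
    json.dumps({"type": "conversation.item.created", "item": {"id": "item_example"}}),
    json.dumps({
        "type": TRANSCRIPT_EVENT,
        "item_id": "item_example",
        "content_index": 0,
        "transcript": "hello",
    }),
]
found = collect_transcripts(frames)
```

If no such event ever arrives for the item, that would suggest transcription was never run on it, which narrows the problem to how the audio was submitted.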
