[Realtime API] Input audio transcription is not showing

Hi,

After WebSocket initialization I update the session, and I get this response:

{
    "type": "session.updated",
    "event_id": "xxx",
    "session": {
        "id": "xxx",
        "object": "realtime.session",
        "model": "gpt-4o-realtime-preview-2024-10-01",
        "expires_at": 1728374700,
        "modalities": [
            "text",
            "audio"
        ],
        "instructions": "...",
        "voice": "shimmer",
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500
        },
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": {
            "model": "whisper-1"
        },
        "tool_choice": "auto",
        "temperature": 0.8,
        "max_response_output_tokens": "inf",
        "tools": []
    }
}
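For context, here's roughly how I connect and send the session.update (a minimal sketch using the websockets package; note the header kwarg is extra_headers in older versions of the library and additional_headers in newer ones):

import asyncio
import json
import os

import websockets  # sketch assumes the `websockets` package

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Request input audio transcription; the object takes only a model name.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "input_audio_transcription": {"model": "whisper-1"},
            },
        }))
        print(json.loads(await ws.recv()))  # expect session.created, then session.updated

asyncio.run(main())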

Then I send my audio:

{
    'type': 'conversation.item.create',
    'item': {
        'type': 'message',
        'role': 'user',
        'content': [
            {
                'type': 'input_audio',
                'audio': audio_64
            }
        ]
    }
}
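One thing I'm unsure about: the docs describe input audio transcription in the context of the input audio buffer, so it may matter that I inject the audio as a complete item. An alternative I could try (continuing the sketch above) is streaming the same base64 PCM16 payload through the buffer and letting server VAD commit it:

# Stream the audio through the input buffer instead of conversation.item.create.
# With turn_detection set to server_vad the server commits the buffer on
# detected silence; without VAD, send input_audio_buffer.commit explicitly.
await ws.send(json.dumps({
    "type": "input_audio_buffer.append",
    "audio": audio_64,  # same base64-encoded pcm16 payload as above
}))
await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))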

In the responses I get every event except the transcription event:

conversation.item.input_audio_transcription.completed
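As far as I can tell from the API reference, that event arrives asynchronously (it can land after the response events) and should look roughly like this (values abridged):

{
    "type": "conversation.item.input_audio_transcription.completed",
    "event_id": "xxx",
    "item_id": "xxx",
    "content_index": 0,
    "transcript": "..."
}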

Can someone please help?

Same here, I'm not seeing the audio transcription either.

Not only am I not seeing the transcription, but I also get an error event from OpenAI saying something along the lines of:

Invalid parameter: 'session.input_audio_transcription.enabled' parameter doesn't exist.

I've both typed it in by hand and copied and pasted it directly from the docs, but no luck…

As mentioned here, you can leave out the "enabled" key, which resolved the issue for some users. For me, however, this didn't work, but maybe you will have more luck. I really need the transcription as well.
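Concretely, the difference between the payload the docs show and the one the API accepts is just that one key (sketch):

# Rejected: includes the "enabled" flag from the older docs example.
{
    "type": "session.update",
    "session": {
        "input_audio_transcription": {"enabled": True, "model": "whisper-1"}
    }
}

# Accepted: drop the "enabled" key and pass only the model.
{
    "type": "session.update",
    "session": {
        "input_audio_transcription": {"model": "whisper-1"}
    }
}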