GPT-4o-Transcribe: Why Does the Final Output Sometimes Exactly Replicate the Configured Prompt?

I’m encountering an issue with GPT-4o-Transcribe where, in some cases, the final transcript is exactly the same as the prompt set in the session configuration. I’m unsure why this happens, and I’d like to understand whether this is a bug in the API.

I’ve noticed this behavior occurs more frequently with Spanish text. Is there a known limitation or condition that causes the model to return the unmodified prompt as the transcription result?

Here’s a summary of what I’m seeing:

  • The final output is identical to the prompt.
  • This happens intermittently.
  • It severely affects the real-time transcription experience and makes it unsuitable for production use.
  • For this test, I used a HyperX QuadCast microphone.

Below I’m including some API event logs that show this behavior, along with my session configuration for reference.

Let me know if there’s a workaround or if this is something the team is already aware of. I’d really appreciate any guidance on how to mitigate or avoid this issue.


Logs:

Received message: {'type': 'input_audio_buffer.speech_started', 'event_id': 'event_BK4ZqecytxeVEAGTNsVMa', 'audio_start_ms': 3796, 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1'}
Received message: {'type': 'input_audio_buffer.speech_stopped', 'event_id': 'event_BK4ZrpiBrdUp2x8NQL0DX', 'audio_end_ms': 4960, 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1'}
Received message: {'type': 'input_audio_buffer.committed', 'event_id': 'event_BK4ZrAfYTpbGAOYL0G2L3', 'previous_item_id': 'item_BK4ZnE1K1NP0YgOYLMJ35', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1'}
Received message: {'type': 'conversation.item.created', 'event_id': 'event_BK4ZrzhIZZBgxsefJPeJR', 'previous_item_id': 'item_BK4ZnE1K1NP0YgOYLMJ35', 'item': {'id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'object': 'realtime.item', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zs1QyLEGUIY8HlWvak', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': 'Esta'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsuVpHgvR60LP5Isvn', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' es'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsDNjA8jfxZlzTCG0C', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' una'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zswq1EW5fwM1DsIqw1', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' prueba'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsCTmm8YMZa3Oz27eb', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' para'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsSHnHgYY1M2Gw8cRX', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' mostrar'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsRiXuPJnE9V7NFA2p', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' el'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsUd2inA5KSKjNOJuW', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' bug'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsEEV9g764nTk5FJYr', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' de'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zsmk7Pz4n1r3ju19rq', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' la'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsPugz89oEIS1amWCz', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' trans'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zs9TbogfkvvqYQd2WL', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': 'cripción'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zs0RNnmfrbRq27nfG4', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': '.'}
Received message: {'type': 'conversation.item.input_audio_transcription.completed', 'event_id': 'event_BK4Zs3IjOSOmW2CiDFGOq', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'transcript': 'Esta es una prueba para mostrar el bug de la transcripción.'}
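
One way to catch this on the client, sketched under assumptions: compare the length of the completed transcript with the duration of the VAD segment it came from. In the log above, speech runs from `audio_start_ms` 3796 to `audio_end_ms` 4960, about 1.16 s, yet the transcript has 59 characters; even adding the 300 ms `prefix_padding_ms`, that is roughly 40 characters per second, which seems too fast for real speech. `SegmentRateTracker` and the 25 chars/s ceiling are hypothetical choices, not part of the API.

```python
from typing import Dict, Optional

# Assumed ceiling on characters per second of speech. This is a guess,
# not a documented limit; tune it per language.
MAX_CHARS_PER_S = 25.0


class SegmentRateTracker:
    """Hypothetical helper: pairs server-VAD timestamps with the final transcript.

    It reads event dicts shaped like the ones in the log and, when a
    transcription.completed event arrives, returns the transcript's
    characters per second of detected speech.
    """

    def __init__(self) -> None:
        self.start_ms: Dict[str, int] = {}
        self.end_ms: Dict[str, int] = {}

    def handle(self, event: dict) -> Optional[float]:
        etype = event.get("type")
        item_id = event.get("item_id")
        if etype == "input_audio_buffer.speech_started":
            self.start_ms[item_id] = event["audio_start_ms"]
        elif etype == "input_audio_buffer.speech_stopped":
            self.end_ms[item_id] = event["audio_end_ms"]
        elif etype == "conversation.item.input_audio_transcription.completed":
            start = self.start_ms.pop(item_id, None)
            end = self.end_ms.pop(item_id, None)
            if start is None or end is None:
                return None  # missing VAD timestamps, cannot judge
            duration_s = max(end - start, 1) / 1000.0  # guard against zero length
            return len(event["transcript"]) / duration_s
        return None


# Replay the three relevant events from the log above.
item = "item_BK4Zqm01V3DvAZsu6hCt1"
events = [
    {"type": "input_audio_buffer.speech_started", "audio_start_ms": 3796, "item_id": item},
    {"type": "input_audio_buffer.speech_stopped", "audio_end_ms": 4960, "item_id": item},
    {
        "type": "conversation.item.input_audio_transcription.completed",
        "item_id": item,
        "transcript": "Esta es una prueba para mostrar el bug de la transcripción.",
    },
]

tracker = SegmentRateTracker()
for ev in events:
    rate = tracker.handle(ev)
    if rate is not None:
        print(f"{rate:.1f} chars/s, suspicious={rate > MAX_CHARS_PER_S}")
```

For the logged segment this works out to about 50.7 characters per second and is flagged. On very short segments a rate check cannot reliably separate a fast talker from an echo, so a flag is a hint rather than proof.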

Session configuration:

    session_config = {
        "type": "transcription_session.update",
        "session": {
            "input_audio_format": "pcm16",
            "input_audio_transcription": {
                "model": "gpt-4o-transcribe",
                "language": "es",
                "prompt": "Esta es una prueba para mostrar el bug de la transcripción.",
            },
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,
                "prefix_padding_ms": 300,
                "silence_duration_ms": 300,
            },
            "input_audio_noise_reduction": {"type": "near_field"},
        },
    }
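
A more direct guard compares the completed transcript with the `prompt` from this session config after light normalization (Unicode NFKC, case folding, punctuation removed), so accents and a trailing period do not defeat the match. `normalize` and `is_prompt_echo` are hypothetical names used for illustration. The trade-off: a user who genuinely speaks the exact prompt text would also be discarded.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Unify Unicode forms, casefold, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", "", text)  # \w is Unicode-aware, so "ó" survives
    return " ".join(text.split())


def is_prompt_echo(transcript: str, session_config: dict) -> bool:
    """Hypothetical guard: True if the transcript just repeats the configured prompt."""
    transcription = session_config["session"]["input_audio_transcription"]
    prompt = transcription.get("prompt")
    if not prompt:
        return False  # no prompt configured, nothing to echo
    return normalize(transcript) == normalize(prompt)


# Reduced copy of the session config above (only the fields the guard reads).
session_config = {
    "type": "transcription_session.update",
    "session": {
        "input_audio_transcription": {
            "model": "gpt-4o-transcribe",
            "language": "es",
            "prompt": "Esta es una prueba para mostrar el bug de la transcripción.",
        },
    },
}

transcript = "Esta es una prueba para mostrar el bug de la transcripción."
if is_prompt_echo(transcript, session_config):
    print("Discarding transcript: it matches the prompt")
```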

I have a similar issue when transcribing Japanese.
I am using an audiobook from the Kokoro-Speech-Dataset.

I use the book chapter text as the prompt.
I run the audio file through VAD, which cuts it into smaller segments.
The opening part of the audio contains statements or information that are not in the prompt, and this is where gpt-4o-transcribe outputs the entire prompt content.
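
With a whole chapter as the prompt, an echo might not match character for character, so an exact comparison could miss it. A minimal sketch using Python's standard `difflib.SequenceMatcher` scores how similar each segment's transcript is to the prompt. The 0.9 threshold, the function names, and the placeholder Japanese text are assumptions, not values from the post or the dataset.

```python
from difflib import SequenceMatcher

# Assumed threshold: a segment transcript at least this similar to the whole
# prompt is treated as an echo. 0.9 is a guess; tune it on your own data.
ECHO_SIMILARITY = 0.9


def prompt_similarity(transcript: str, prompt: str) -> float:
    """Similarity ratio in [0, 1] between one segment's transcript and the prompt."""
    if not transcript or not prompt:
        return 0.0
    # autojunk=False: the default heuristic treats frequent items in a long
    # second sequence (the chapter text) as junk, which can distort the ratio.
    return SequenceMatcher(None, transcript, prompt, autojunk=False).ratio()


def is_near_verbatim_echo(transcript: str, prompt: str) -> bool:
    return prompt_similarity(transcript, prompt) >= ECHO_SIMILARITY


# Placeholder text standing in for a chapter prompt (not from the dataset).
chapter_prompt = "これはテスト用の章のテキストです。ここに本文が続きます。"
for segment_text in [chapter_prompt, "第一章。"]:
    print(segment_text, is_near_verbatim_echo(segment_text, chapter_prompt))
```

`SequenceMatcher`'s cost grows with both string lengths, so for very long chapters its `quick_ratio()` method, an upper bound on `ratio()`, could serve as a cheaper first pass.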


I’m experiencing the same issue. It seems to occur mostly when the audio contains no speech, even if it contains other noise. I’m not sure how we can easily filter this out before transcribing.
OpenAI Support: what can we do in the API call to work around this behaviour?
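
Filtering before transcription is harder here, because the echo reportedly happens on noise without speech. A crude energy gate can at least drop near-silent segments; this sketch assumes `pcm16` audio is 16-bit little-endian mono samples, and the -45 dBFS floor and function names are hypothetical. It will not reject loud non-speech noise; that would need an actual speech detector. One knob already in the posted config is `turn_detection.threshold`: raising it above 0.5 makes server VAD require louder audio before it starts a segment, though whether that avoids the echo is untested.

```python
import array
import math
import sys

# Assumed floor: segments quieter than this are treated as "no speech".
# -45 dBFS is a placeholder; measure your own room and microphone.
MIN_DBFS = -45.0


def rms_dbfs(pcm16: bytes) -> float:
    """RMS level of 16-bit little-endian mono PCM, in dBFS (0 = full scale)."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(pcm16[: len(pcm16) - len(pcm16) % 2])  # ignore a stray odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # the audio is little-endian
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / 32768.0**2)


def worth_transcribing(pcm16: bytes, min_dbfs: float = MIN_DBFS) -> bool:
    """Hypothetical pre-filter: skip segments that are essentially silent."""
    return rms_dbfs(pcm16) > min_dbfs


silence = bytes(4800)  # 2400 zero-valued samples
print(worth_transcribing(silence))
```

With server VAD the server commits segments on its own, so a gate like this fits more naturally with turn detection disabled (if your setup allows it), where the client decides per buffered segment whether to send `input_audio_buffer.commit` or `input_audio_buffer.clear`.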


Same issue here. GPT-4o-Transcribe sends transcription events that repeat the prompt verbatim. With some prompts it happens every time, with others only occasionally. Maybe this happens with non-English languages? I’m using Italian.
I didn’t find a workaround other than removing the context prompt entirely, which is a shame.
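
The workaround of dropping the prompt is easiest to toggle if the session update is built by a function that only adds `prompt` when one is supplied. `build_session_update` is a hypothetical helper that reproduces the session settings from the first post; the language code `it` matches the Italian case.

```python
import json
from typing import Optional


def build_session_update(language: str, prompt: Optional[str] = None) -> dict:
    """Hypothetical builder for the transcription_session.update event.

    The "prompt" key is included only when a non-empty prompt is given,
    so the same code path can run with or without context.
    """
    transcription = {"model": "gpt-4o-transcribe", "language": language}
    if prompt:
        transcription["prompt"] = prompt
    return {
        "type": "transcription_session.update",
        "session": {
            "input_audio_format": "pcm16",
            "input_audio_transcription": transcription,
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,
                "prefix_padding_ms": 300,
                "silence_duration_ms": 300,
            },
            "input_audio_noise_reduction": {"type": "near_field"},
        },
    }


# Workaround from the post: run without a context prompt.
message = json.dumps(build_session_update("it"), ensure_ascii=False)
print(message)
```

Keeping a single code path makes it straightforward to compare how often the echo appears with and without a prompt on the same audio.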