GPT-4o-Transcribe: Why Does the Final Output Sometimes Exactly Replicate the Configured Prompt?

biohazerimperion · April 8, 2025, 3:00pm

I’m encountering an issue with GPT-4o-Transcribe where, in some cases, the system returns a final transcript that is identical to the prompt set in the session configuration. I’m unsure why this happens, and I’d like to know whether this is a bug in the API.

I’ve noticed this behavior occurs more frequently with Spanish text. Is there a known limitation or condition that causes the model to return the unmodified prompt as the transcription result?

Here’s a summary of what I’m seeing:

- The final output is identical to the prompt.
- This happens intermittently.
- It severely affects the real-time transcription experience and makes it unsuitable for production use.
- For this test, I used a HyperX QuadCast microphone.

Below I’m including some API event logs that show this behavior, along with my session configuration for reference.

Let me know if there’s a workaround or if this is something the team is already aware of. I’d really appreciate any guidance on how to mitigate or avoid this issue.

Logs:

Received message: {'type': 'input_audio_buffer.speech_started', 'event_id': 'event_BK4ZqecytxeVEAGTNsVMa', 'audio_start_ms': 3796, 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1'}
Received message: {'type': 'input_audio_buffer.speech_stopped', 'event_id': 'event_BK4ZrpiBrdUp2x8NQL0DX', 'audio_end_ms': 4960, 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1'}
Received message: {'type': 'input_audio_buffer.committed', 'event_id': 'event_BK4ZrAfYTpbGAOYL0G2L3', 'previous_item_id': 'item_BK4ZnE1K1NP0YgOYLMJ35', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1'}
Received message: {'type': 'conversation.item.created', 'event_id': 'event_BK4ZrzhIZZBgxsefJPeJR', 'previous_item_id': 'item_BK4ZnE1K1NP0YgOYLMJ35', 'item': {'id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'object': 'realtime.item', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zs1QyLEGUIY8HlWvak', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': 'Esta'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsuVpHgvR60LP5Isvn', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' es'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsDNjA8jfxZlzTCG0C', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' una'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zswq1EW5fwM1DsIqw1', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' prueba'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsCTmm8YMZa3Oz27eb', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' para'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsSHnHgYY1M2Gw8cRX', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' mostrar'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsRiXuPJnE9V7NFA2p', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' el'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsUd2inA5KSKjNOJuW', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' bug'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsEEV9g764nTk5FJYr', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' de'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zsmk7Pz4n1r3ju19rq', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' la'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsPugz89oEIS1amWCz', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' trans'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zs9TbogfkvvqYQd2WL', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': 'cripción'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zs0RNnmfrbRq27nfG4', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': '.'}
Received message: {'type': 'conversation.item.input_audio_transcription.completed', 'event_id': 'event_BK4Zs3IjOSOmW2CiDFGOq', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'transcript': 'Esta es una prueba para mostrar el bug de la transcripción.'}
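The logs show that the echoed prompt is not only in the `completed` event. It is also streamed word by word through the `delta` events, so any text displayed live from the deltas already contains the echo before the final event arrives. Here is a minimal sketch for replaying such logs offline. It assumes the events are available as Python dicts like the ones printed above. `replay_transcripts` is a hypothetical helper, not part of the OpenAI SDK.

```python
from collections import defaultdict


def replay_transcripts(events):
    """Rebuild each item's transcript from delta events and pair it with the
    transcript reported by the matching 'completed' event."""
    partial = defaultdict(str)  # item_id -> concatenated delta text so far
    results = {}  # item_id -> (streamed_text, completed_text)
    for event in events:
        kind = event.get("type")
        if kind == "conversation.item.input_audio_transcription.delta":
            partial[event["item_id"]] += event["delta"]
        elif kind == "conversation.item.input_audio_transcription.completed":
            item_id = event["item_id"]
            results[item_id] = (partial.pop(item_id, ""), event["transcript"])
    return results


# A trimmed copy of the events from the log above (event_id fields omitted).
item = "item_BK4Zqm01V3DvAZsu6hCt1"
deltas = ["Esta", " es", " una", " prueba", " para", " mostrar", " el",
          " bug", " de", " la", " trans", "cripción", "."]
events = [{"type": "conversation.item.input_audio_transcription.delta",
           "item_id": item, "delta": d} for d in deltas]
events.append({"type": "conversation.item.input_audio_transcription.completed",
               "item_id": item,
               "transcript": "Esta es una prueba para mostrar el bug de la transcripción."})

streamed, final = replay_transcripts(events)[item]
print(streamed == final)  # True
```

Comparing `streamed` and `final` against the configured prompt makes it easy to spot which items were echoes when reviewing a session after the fact.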

Session configuration:

    session_config = {
        "type": "transcription_session.update",
        "session": {
            "input_audio_format": "pcm16",
            "input_audio_transcription": {
                "model": "gpt-4o-transcribe",
                "language": "es",
                "prompt": "Esta es una prueba para mostrar el bug de la transcripción.",
            },
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,
                "prefix_padding_ms": 300,
                "silence_duration_ms": 300,
            },
            "input_audio_noise_reduction": {"type": "near_field"},
        },
    }
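Until the cause is known, one client-side mitigation is to treat a completed transcript that matches the configured prompt as a suspected echo and discard it. The sketch below assumes that a comparison ignoring case, punctuation and extra whitespace is close enough. `is_prompt_echo` is a hypothetical helper. It will also drop a legitimate utterance that repeats the prompt word for word, which may or may not matter for your use case.

```python
import unicodedata


def _normalize(text):
    # Unicode-normalize, drop punctuation characters, collapse whitespace, ignore case.
    text = unicodedata.normalize("NFC", text)
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split()).casefold()


def is_prompt_echo(transcript, prompt):
    """Return True when the transcript equals the configured prompt after normalization."""
    if not prompt:
        return False  # no prompt configured, so nothing can be echoed
    return _normalize(transcript) == _normalize(prompt)


prompt = "Esta es una prueba para mostrar el bug de la transcripción."
print(is_prompt_echo("Esta es una prueba para mostrar el bug de la transcripción.", prompt))  # True
print(is_prompt_echo("Hola, ¿me escuchas?", prompt))  # False
```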

jefry · April 10, 2025, 7:50am

I have a similar issue transcribing Japanese.
I am using an audiobook from the Kokoro-Speech-Dataset:

I use the book chapter text as the prompt.
I run the audio file through VAD, which cuts it into smaller segments.
The opening part of the audio contains statements or information that are not in the prompt, and this is where gpt-4o-transcribe outputs the entire prompt content.
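When the echoed prompt is long, as with a book chapter used as the prompt for a short VAD segment, the transcript is far longer than the audio could plausibly contain. The `speech_started` and `speech_stopped` events in the first post carry `audio_start_ms` and `audio_end_ms` for the same `item_id`, which makes a rough characters-per-second check possible. This sketch assumes that a fixed ceiling, the hypothetical `MAX_PLAUSIBLE_CPS = 30.0`, separates real speech from echoes. The right value depends on the language and speaker and would need calibrating.

```python
def chars_per_second(transcript, audio_start_ms, audio_end_ms):
    """Characters of transcript per second of detected speech."""
    duration_s = (audio_end_ms - audio_start_ms) / 1000
    if duration_s <= 0:
        return float("inf")  # zero-length segment: any text is implausible
    return len(transcript) / duration_s


# Hypothetical ceiling (assumption); tune it for your language and speakers.
MAX_PLAUSIBLE_CPS = 30.0


def looks_implausibly_long(transcript, audio_start_ms, audio_end_ms,
                           max_cps=MAX_PLAUSIBLE_CPS):
    return chars_per_second(transcript, audio_start_ms, audio_end_ms) > max_cps


# Values taken from the speech_started / speech_stopped / completed events in the first post.
text = "Esta es una prueba para mostrar el bug de la transcripción."
rate = chars_per_second(text, 3796, 4960)
print(round(rate, 1))  # 50.7
print(looks_implausibly_long(text, 3796, 4960))  # True
```

In the first post's example, 59 characters over 1.164 s of detected speech works out to about 50.7 characters per second, which is above the assumed ceiling. Unlike an exact-match check, this test also flags transcripts that contain the prompt plus some real speech.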

adim · April 28, 2025, 11:04am

Experiencing the same issue. Seems to occur mostly when the audio has no speech, even if it has other noise. Not sure how we can easily filter this out before transcribing.
OpenAI Support - what can we do in the API call to work around this behaviour?
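For audio that is essentially silent, a client can skip sending the chunk entirely. Here is a minimal sketch of an RMS energy gate. It assumes the audio is `pcm16` as in the session configuration above, meaning 16-bit little-endian mono samples. The `-50` dBFS threshold is a hypothetical starting point. An energy gate cannot tell speech from background noise, so noisy non-speech audio will still pass. Raising the `server_vad` `threshold` (0.5 in the configuration above) is another setting to try, though it is not confirmed to prevent the echo.

```python
import math
import struct


def rms_dbfs(pcm16_bytes):
    """Root-mean-square level of 16-bit little-endian mono PCM, in dBFS."""
    count = len(pcm16_bytes) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack(f"<{count}h", pcm16_bytes[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(math.sqrt(mean_square) / 32768)


# Hypothetical threshold (assumption); calibrate against your microphone and room.
SILENCE_DBFS = -50.0


def has_enough_energy(pcm16_bytes, threshold_dbfs=SILENCE_DBFS):
    """True if the chunk is louder than the threshold and worth sending."""
    return rms_dbfs(pcm16_bytes) > threshold_dbfs


# 0.1 s of digital silence vs. a 440 Hz tone, both at an assumed 24 kHz sample rate.
silence = bytes(4800)
values = [int(8000 * math.sin(2 * math.pi * 440 * n / 24000)) for n in range(2400)]
tone = struct.pack(f"<{len(values)}h", *values)
print(has_enough_energy(silence))  # False
print(has_enough_energy(tone))     # True
```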

Dan4 · April 29, 2025, 4:06pm

Same issue here. GPT-4o-Transcribe sends transcription events that reproduce the prompt exactly. With some prompts it happens every time, with others only occasionally. Maybe this happens with non-English languages? I’m using Italian.
The only workaround I found was removing the context prompt entirely, which is a shame.
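The workaround of dropping the prompt can be implemented as a switch rather than a code change. This sketch reuses the field values from the first post's session configuration and adds `prompt` only when one is supplied. `build_session_update` is a hypothetical helper, and sending the JSON over the Realtime WebSocket is not shown.

```python
import json


def build_session_update(prompt=None, language="es"):
    """transcription_session.update payload; the prompt key is included only when given."""
    transcription = {"model": "gpt-4o-transcribe", "language": language}
    if prompt:
        transcription["prompt"] = prompt
    return {
        "type": "transcription_session.update",
        "session": {
            "input_audio_format": "pcm16",
            "input_audio_transcription": transcription,
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,
                "prefix_padding_ms": 300,
                "silence_duration_ms": 300,
            },
            "input_audio_noise_reduction": {"type": "near_field"},
        },
    }


# No prompt: the workaround described in the post above.
message = json.dumps(build_session_update())
```

Making the prompt optional allows A/B testing with and without it on the same audio, which is useful for checking whether a given prompt triggers the echo.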
