I’m encountering an issue with gpt-4o-transcribe where, in some cases, the final transcript is exactly the same as the prompt provided in the session configuration. I’m unsure why this happens, and I’d like to know whether this is a bug in the API.
I’ve noticed this behavior occurs more frequently with Spanish text. Is there a known limitation or condition that causes the model to return the unmodified prompt as the transcription result?
Here’s a summary of what I’m seeing:
- The final output is identical to the prompt.
- This happens intermittently.
- It severely affects the real-time transcription experience and makes it unsuitable for production use.
- For this test, I used a HyperX QuadCast microphone.
Below I’m including some API event logs that show this behavior, along with my session configuration for reference.
Let me know if there’s a workaround or if this is something the team is already aware of. I’d really appreciate any guidance on how to mitigate or avoid this issue.
Logs:
Received message: {'type': 'input_audio_buffer.speech_started', 'event_id': 'event_BK4ZqecytxeVEAGTNsVMa', 'audio_start_ms': 3796, 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1'}
Received message: {'type': 'input_audio_buffer.speech_stopped', 'event_id': 'event_BK4ZrpiBrdUp2x8NQL0DX', 'audio_end_ms': 4960, 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1'}
Received message: {'type': 'input_audio_buffer.committed', 'event_id': 'event_BK4ZrAfYTpbGAOYL0G2L3', 'previous_item_id': 'item_BK4ZnE1K1NP0YgOYLMJ35', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1'}
Received message: {'type': 'conversation.item.created', 'event_id': 'event_BK4ZrzhIZZBgxsefJPeJR', 'previous_item_id': 'item_BK4ZnE1K1NP0YgOYLMJ35', 'item': {'id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'object': 'realtime.item', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zs1QyLEGUIY8HlWvak', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': 'Esta'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsuVpHgvR60LP5Isvn', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' es'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsDNjA8jfxZlzTCG0C', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' una'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zswq1EW5fwM1DsIqw1', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' prueba'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsCTmm8YMZa3Oz27eb', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' para'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsSHnHgYY1M2Gw8cRX', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' mostrar'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsRiXuPJnE9V7NFA2p', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' el'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsUd2inA5KSKjNOJuW', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' bug'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsEEV9g764nTk5FJYr', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' de'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zsmk7Pz4n1r3ju19rq', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' la'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4ZsPugz89oEIS1amWCz', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': ' trans'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zs9TbogfkvvqYQd2WL', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': 'cripción'}
Received message: {'type': 'conversation.item.input_audio_transcription.delta', 'event_id': 'event_BK4Zs0RNnmfrbRq27nfG4', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'delta': '.'}
Received message: {'type': 'conversation.item.input_audio_transcription.completed', 'event_id': 'event_BK4Zs3IjOSOmW2CiDFGOq', 'item_id': 'item_BK4Zqm01V3DvAZsu6hCt1', 'content_index': 0, 'transcript': 'Esta es una prueba para mostrar el bug de la transcripción.'}
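For anyone trying to reproduce this, here is a minimal sketch of how the events above could be regrouped per item, so the text built from the delta events can be compared with the final transcript. It assumes each received message has already been parsed into a Python dict shaped like the logs; the function name assemble_transcripts is hypothetical and not part of any SDK.

```python
from collections import defaultdict


def assemble_transcripts(events):
    """Join delta events per item_id and pair them with the completed transcript.

    `events` is an iterable of dicts shaped like the logged messages above.
    Returns {item_id: {"from_deltas": str, "final": str}}.
    """
    partial = defaultdict(str)  # text accumulated from delta events, per item
    results = {}
    for event in events:
        kind = event.get("type", "")
        if kind == "conversation.item.input_audio_transcription.delta":
            partial[event["item_id"]] += event["delta"]
        elif kind == "conversation.item.input_audio_transcription.completed":
            item_id = event["item_id"]
            results[item_id] = {
                "from_deltas": partial.pop(item_id, ""),
                "final": event["transcript"],
            }
    return results
```

In the log above, the deltas join into the same sentence as the completed transcript, so the prompt text is already present in the streamed deltas and not only in the final event.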
Session config:
session_config = {
    "type": "transcription_session.update",
    "session": {
        "input_audio_format": "pcm16",
        "input_audio_transcription": {
            "model": "gpt-4o-transcribe",
            "language": "es",
            "prompt": "Esta es una prueba para mostrar el bug de la transcripción.",
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 300,
        },
        "input_audio_noise_reduction": {"type": "near_field"},
    },
}
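
Until the cause is known, one possible client-side stopgap is to drop any completed transcript that matches the configured prompt. This is only a sketch under my own assumptions: the helper names is_prompt_echo and filter_completed are hypothetical, the comparison ignores case and extra whitespace, and it does nothing about the underlying behavior.

```python
import unicodedata

COMPLETED = "conversation.item.input_audio_transcription.completed"


def _normalize(text):
    """Unicode-normalize, case-fold, and collapse whitespace for comparison."""
    text = unicodedata.normalize("NFC", text or "")
    return " ".join(text.casefold().split())


def is_prompt_echo(transcript, prompt):
    """True when the transcript is the configured prompt (assumed heuristic)."""
    if not prompt:
        return False
    return _normalize(transcript) == _normalize(prompt)


def filter_completed(event, prompt):
    """Return the transcript of a completed event, or None if it should be dropped.

    Non-completed events and transcripts that echo the prompt both yield None.
    """
    if event.get("type") != COMPLETED:
        return None
    transcript = event.get("transcript") or ""
    if is_prompt_echo(transcript, prompt):
        return None
    return transcript
```

The obvious trade-off is that a speaker who really does say the prompt sentence would also be filtered out, so this fits best when the prompt is a vocabulary hint rather than something likely to be spoken verbatim.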