Transcribing a live call (stream started on a Twilio call): OpenAI transcription starts to degrade after 1-4 min

lkuznecoff · May 16, 2025, 3:13pm

Hello. We are experiencing significant performance degradation when using the OpenAI real-time transcription API for call transcriptions.

The issue consistently occurs after 1 to 4 minutes of continuous streaming:

- Transcriptions start arriving with noticeable delays relative to the real-time audio.
- The latency between spoken sentences and transcription output increases gradually over time.
- In some cases, the system returns transcriptions for entire segments with a large delay (several seconds after the actual speech).
- The transcription quality also seems to degrade, with incomplete or contextually incorrect transcriptions appearing after extended usage.

This happens with g711_ulaw input audio (8000 Hz sample rate) and despite turn_detection being correctly configured (using semantic_vad).
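
For reference, the configuration described above might be sent as something like the following. This is a sketch only: the `transcription_session.update` event name and the layout of the `session` object are assumed from the beta Realtime transcription API and may differ in other API versions, and the model name is a placeholder.

```python
import json


def build_transcription_session_update(model: str) -> str:
    """Build a session-update event for a transcription session (assumed shape)."""
    event = {
        # Assumption: beta-era event name; check the reference for your API version.
        "type": "transcription_session.update",
        "session": {
            # Twilio Media Streams deliver 8 kHz mu-law audio.
            "input_audio_format": "g711_ulaw",
            "input_audio_transcription": {"model": model},
            "turn_detection": {"type": "semantic_vad"},
        },
    }
    return json.dumps(event)


# "YOUR_TRANSCRIPTION_MODEL" is a placeholder, not a value from this post.
payload = build_transcription_session_update("YOUR_TRANSCRIPTION_MODEL")
```

The resulting JSON string would be sent as a text frame on the Realtime WebSocket before any audio is appended.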

We’ve confirmed that our WebSocket streaming pipeline maintains real-time audio delivery without backlog, and network conditions remain stable throughout the sessions.
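
One way to check the "no backlog" claim is to compare how much audio time has been forwarded with how much wall-clock time has passed. The sketch below relies only on the fact that g711_ulaw at 8000 Hz is one byte per sample, so 8000 bytes per second; the `BacklogMonitor` class and its method names are hypothetical, not part of any SDK.

```python
import time
from typing import Callable, Optional

ULAW_BYTES_PER_SECOND = 8000  # g711_ulaw: 1 byte per sample at 8000 Hz


class BacklogMonitor:
    """Compare audio time forwarded so far with wall-clock time elapsed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self._bytes_sent = 0

    def record_sent(self, num_bytes: int) -> None:
        """Call after each raw (base64-decoded) audio chunk is forwarded."""
        if self._start is None:
            self._start = self._clock()
        self._bytes_sent += num_bytes

    def backlog_seconds(self) -> float:
        """Seconds by which forwarded audio trails the wall clock (> 0 = behind)."""
        if self._start is None:
            return 0.0
        elapsed = self._clock() - self._start
        audio_sent = self._bytes_sent / ULAW_BYTES_PER_SECOND
        return elapsed - audio_sent
```

A value that stays near zero for the whole call suggests the sender keeps pace with the audio; a value that grows steadily would point to a backlog on the sending side rather than in the API.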

Expected Behavior:

Real-time transcription output should remain responsive and consistent throughout the entire session, without noticeable latency increase or degradation in accuracy after several minutes of continuous streaming.
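
To make "noticeable latency increase" measurable, one option is to record, for each utterance, when the speech ended and when its transcript arrived, and then fit a straight line to the lag over time. The sketch below assumes both timestamps are in seconds on the same clock; how they are obtained depends on the pipeline, and the `lag_trend` helper is hypothetical.

```python
from typing import List, Tuple


def lag_trend(samples: List[Tuple[float, float]]) -> Tuple[List[float], float]:
    """Return per-utterance lags and the least-squares slope of lag over time.

    Each sample is (speech_end_s, transcript_arrival_s), both measured in
    seconds on the same clock since the start of the stream.
    A slope near 0 means stable latency; a positive slope means lag is growing.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a trend")
    xs = [end for end, _ in samples]
    lags = [arrival - end for end, arrival in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(lags) / len(lags)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("speech end times must not all be identical")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, lags))
    return lags, cov / var_x
```

The slope is in seconds of extra lag per second of call time, so multiplying it by 60 gives the drift per minute, which is easy to compare across sessions.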

Please advise if there are any session duration limitations, internal buffering mechanisms, or recommended practices (such as session renewal strategies) that could mitigate this issue.
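
As an illustration of what a session renewal strategy could look like, here is a minimal sketch of a rotation policy. Whether renewing sessions actually avoids the degradation is unconfirmed; the `RotationPolicy` class and both thresholds are hypothetical and are not documented limits.

```python
from dataclasses import dataclass


@dataclass
class RotationPolicy:
    """Decide when to replace a transcription session with a fresh one.

    All thresholds are illustrative assumptions, not documented limits.
    """

    soft_max_age_s: float = 60.0  # prefer rotating after this age, at a pause
    hard_max_age_s: float = 90.0  # rotate regardless once this age is reached

    def should_rotate(self, session_age_s: float, in_speech: bool) -> bool:
        if session_age_s >= self.hard_max_age_s:
            return True
        # Between the soft and hard limits, wait for a pause so that no
        # utterance is split across two sessions.
        return session_age_s >= self.soft_max_age_s and not in_speech
```

In practice the replacement session would probably need to be opened and configured before the old one is closed, so that no audio is dropped during the switch.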
