I am using the Realtime API to navigate IVR systems. Generally it performs pretty well, but there is one specific case that produces odd results.
I am using the following turn-detection settings:
```python
'turn_detection': {
    'type': 'server_vad',
    'silence_duration_ms': 1000,  # default is 500 ms
},
```
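For context, this is roughly how that setting gets applied via a `session.update` event (a minimal sketch with the websocket plumbing omitted; only the `turn_detection` portion matters for this issue):

```python
import json

def build_session_update(silence_ms: int = 1000) -> str:
    """Build the session.update event that carries the server VAD settings."""
    event = {
        'type': 'session.update',
        'session': {
            'turn_detection': {
                'type': 'server_vad',
                'silence_duration_ms': silence_ms,  # default is 500 ms
            },
        },
    }
    return json.dumps(event)

# ws.send(build_session_update())  # sent over the Realtime API websocket
```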
Originally, with the default silence_duration_ms of 500 ms, the Realtime API performed poorly: it tended to create conversation items that captured only a small portion of a complete thought (e.g., "What is your?"), and it would often jump in with an incorrect response. Bumping this up to 1000 ms has reduced the problem.
However, I still see issues when a pause comes close to the 1000 ms threshold. For example:
IVR: Now tell me the member's date of birth including the year (~500 ms pause ... about to say "For example")
OpenAI: Jan 1, 2025
IVR: For example, (interrupted)
OpenAI: Jan 1, 2025
IVR: That's January (interrupted)
OpenAI: Yes
When this is triggered, it seems like silence_duration_ms is no longer respected and the Realtime API is quick to interrupt. This is also where I see the most hallucinations; sometimes they do not even appear in the transcript (similar to Creepy bug of Realtime API + Function Calling: Extra Audio Not in Transcription - #8 by tsar).
Increasing silence_duration_ms beyond 1000 ms triggers its own errors when navigating IVRs, because the response latency becomes too slow. However, as mentioned above, silence_duration_ms appears to be ignored once this scenario begins.
I’ve simulated this with vanilla gpt-4o chat completions and cannot reproduce the behavior, so it feels like something related to the turn detection.
I’m curious whether others have come up with ways to prevent the Realtime API from responding so aggressively, and/or whether you have run into this as well.
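One workaround I've been considering (not yet validated) is disabling server VAD entirely (`'turn_detection': None` in session.update) and doing end-of-turn detection client-side, so the 1000 ms threshold is enforced by my own code rather than the server. A rough sketch, assuming a naive energy-based silence detector (the frame size and energy threshold below are illustrative values, not tuned ones):

```python
# Client-side end-of-turn detection: with server VAD disabled, the client
# decides when to send input_audio_buffer.commit / response.create.

FRAME_MS = 20             # duration of each audio frame fed to the detector
ENERGY_THRESHOLD = 500.0  # frames below this energy count as silence (illustrative)

class TurnDetector:
    def __init__(self, silence_duration_ms: int = 1000):
        self.silence_duration_ms = silence_duration_ms
        self.heard_speech = False
        self.silent_ms = 0

    def feed(self, frame_energy: float) -> bool:
        """Feed one frame's energy; return True when the caller's turn has ended."""
        if frame_energy >= ENERGY_THRESHOLD:
            self.heard_speech = True
            self.silent_ms = 0
        elif self.heard_speech:
            self.silent_ms += FRAME_MS
            if self.silent_ms >= self.silence_duration_ms:
                # Here the client would send input_audio_buffer.commit followed
                # by response.create, then reset for the next turn.
                self.heard_speech = False
                self.silent_ms = 0
                return True
        return False
```

The appeal is that a mid-sentence 500 ms pause can never trigger a response, and the threshold is applied consistently no matter what the server's VAD decides; the cost is doing your own audio buffering and losing the server's smarter speech detection.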