Realtime API Server turn detection limitations (Suggestion & Help Request)

NewSoftware · October 9, 2024, 10:57am

Here’s a potential solution.

Disable server VAD
Use browser client-side VAD — https://www.vad.ricky0123.com/
Implement your own turn-detection logic on top of the client-side VAD, with a minimum and a maximum silence duration
For the likelihood detection, you would probably need to run a separate STT service (e.g. Whisper) and pass its transcript to a smaller, faster model like gpt-4o-mini to do the detection (you could even go as far as training or fine-tuning a much smaller model for that)
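
To make the steps above a bit more concrete, here is a rough sketch of how the pieces could fit together. It is not a tested integration. The event names (`session.update`, `input_audio_buffer.commit`, `response.create`) and the `turn_detection` field are how I understand the Realtime API, so check them against the current docs. `TurnEndpointer`, its parameters, and the `looks_complete` callback are hypothetical names made up for this example. The callback stands in for the STT + small-model check, and the per-frame speech probabilities are assumed to come from whatever client-side VAD you use.

```python
import json
from typing import Callable, List


def disable_server_vad_event() -> str:
    """session.update payload that switches off server-side turn detection."""
    return json.dumps({"type": "session.update", "session": {"turn_detection": None}})


def end_of_turn_events() -> List[str]:
    """Events the client sends itself once it decides the user's turn is over."""
    return [
        json.dumps({"type": "input_audio_buffer.commit"}),
        json.dumps({"type": "response.create"}),
    ]


class TurnEndpointer:
    """Hypothetical helper: decides end of turn from per-frame VAD probabilities."""

    def __init__(
        self,
        min_silence_ms: int = 500,
        max_silence_ms: int = 2000,
        frame_ms: int = 30,
        speech_threshold: float = 0.5,
        looks_complete: Callable[[], bool] = lambda: True,
    ) -> None:
        self.min_silence_ms = min_silence_ms
        self.max_silence_ms = max_silence_ms
        self.frame_ms = frame_ms
        self.speech_threshold = speech_threshold
        # Placeholder for "transcribe, then ask a small model if the thought is finished".
        self.looks_complete = looks_complete
        self._reset()

    def _reset(self) -> None:
        self._heard_speech = False
        self._silence_ms = 0
        self._checked = False

    def push(self, speech_prob: float) -> bool:
        """Feed one VAD frame; return True when the turn should end."""
        if speech_prob >= self.speech_threshold:
            # Speech resumed: restart the silence timer and allow a fresh check.
            self._heard_speech = True
            self._silence_ms = 0
            self._checked = False
            return False
        if not self._heard_speech:
            return False  # leading silence, there is no turn to end yet
        self._silence_ms += self.frame_ms
        done = False
        if self._silence_ms >= self.max_silence_ms:
            done = True  # hard cap: end the turn no matter what
        elif self._silence_ms >= self.min_silence_ms and not self._checked:
            # Run the (potentially slow) completeness check once per pause.
            self._checked = True
            done = self.looks_complete()
        if done:
            self._reset()
        return done
```

When `push` returns `True`, you would send the two events from `end_of_turn_events()` over your Realtime connection. The completeness check runs only once per pause, so a slow STT or LLM call is not repeated on every frame; if it says "not finished", the maximum silence duration still ends the turn eventually.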
