Realtime semantic VAD issues

I love the concept behind the semantic VAD mode, but in practice, it seems to cause the agent to stop talking mid-sentence a lot, even when there is no background noise. Anyone else seeing this?

Hey there and welcome to the community!

How sensitive is your mic? You may not think you hear noise, but if the mic's input volume is really high, or if it's close to something emitting noise, the model or VAD may still pick it up and interpret it as speech. There are also filters you could apply programmatically to clean up the audio before you send it to the API.
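To illustrate the "filter before you send" idea, here's a minimal sketch of an energy-based noise gate: frames below an RMS threshold are replaced with silence so the upstream VAD never sees low-level room noise. The function names and the threshold value are mine, not from any SDK, and a real setup would tune the threshold to the mic and room.

```python
import math
import struct

def rms(frame: bytes) -> float:
    """Root-mean-square level of a 16-bit little-endian mono PCM frame."""
    samples = struct.unpack("<%dh" % (len(frame) // 2), frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def gate(frame: bytes, threshold: float = 500.0) -> bytes:
    """Pass the frame through if it is loud enough; otherwise emit
    digital silence of the same length, so frame timing is preserved."""
    return frame if rms(frame) >= threshold else b"\x00" * len(frame)
```

You'd run each captured frame through `gate()` before appending it to the input audio buffer; start with a low threshold and raise it until background noise stops triggering interruptions.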

I’m using OSS models right now, but I have noticed that my microphone is sometimes too good, or my filter is too aggressive, so things are either missed or too much is picked up. So far it is literally just a game of dialing knobs and experimenting to see what works best, both in the environment and with the input device being used. It’s going to be different for everyone.

Yeah, fair point. It DOES act like it’s hearing a noise. I guess that didn’t seem like the explanation though, because I’ve been using the non-semantic version for a while now, and it’s much less sensitive to interruptions. Plus, in semantic mode, there aren’t any settings for sensitivity, are there?
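For what it's worth, the Realtime session docs I've seen do list one knob for semantic_vad, an `eagerness` setting, while the numeric sensitivity controls (threshold, padding, silence duration) appear to be server_vad-only. A sketch of the two `session.update` payloads, with parameter names as I understand them (double-check against the current API reference before relying on this):

```python
# Hedged sketch: field names are taken from the Realtime API session docs
# as I understand them; verify against the current reference.
server_vad = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.7,            # higher = less sensitive to noise
            "prefix_padding_ms": 300,    # audio kept before detected speech
            "silence_duration_ms": 500,  # silence needed to end the turn
        }
    },
}

semantic_vad = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "semantic_vad",
            "eagerness": "low",  # "low" waits longer before ending a turn
        }
    },
}
```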


It is a cool concept, but I also fell back to server_vad and added more local control to manage interruptions. Feels like an area to watch; I’ll switch back once the behavior aligns with my use case (an elder-care smart speaker).


@mcfinley I might have to switch back to server_vad as well. Do you have any specific tips that worked for you?

@tleyden the short version is local DSP for VAD… check out OpenWakeWord on GitHub… you can see my implementation in open source on chattyfriend dot com.
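To sketch what "local DSP for VAD" looks like in practice (note OpenWakeWord itself is a wake-word detector; the VAD step is usually a separate local model such as webrtcvad or Silero): the client runs its own speech detector and only forwards audio upstream while speech is active, with a short hangover so word endings aren't clipped. This sketch uses a pluggable `is_speech` callback so it isn't tied to any particular VAD library; the names and the hangover length are my own.

```python
from typing import Callable, Iterable, Iterator

def forward_speech(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    hangover: int = 10,
) -> Iterator[bytes]:
    """Yield only frames during (or just after) detected speech.

    `hangover` is how many trailing frames to keep forwarding after the
    last voiced frame, so the tail of an utterance isn't cut off.
    """
    remaining = 0
    for frame in frames:
        if is_speech(frame):
            remaining = hangover  # refresh the hangover on every voiced frame
        if remaining > 0:
            remaining -= 1
            yield frame
```

In a real pipeline, `is_speech` would wrap whatever local VAD you picked, and everything this generator yields is what gets sent to the Realtime API, so the server-side VAD only ever hears candidate speech.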

Oh wow, in your earlier message you said server_vad, but if I’m understanding correctly you actually had to abandon that and go to client-side VAD? Will check out those links, thanks!