Realtime API feedback loop

I just tried out the samples at /github.com/Azure-Samples/aoai-realtime-audio-sdk/tree/main/dotnet/samples and am really impressed by the quality. However, I have a significant feedback problem when using server-based turn detection: the microphone picks up some of what is played through the speaker, so the API ends up reacting to its own output. It works well with my headset, but not with the two fairly high-quality conference speakers I tried (both of which claim to do echo cancellation).

Has anybody else run into this problem? I experimented with the turn detection threshold, but that did not give a clear solution. Is anybody already using an echo-cancellation library?
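
For reference, the crudest workaround short of real echo cancellation is half-duplex gating: drop microphone frames while assistant audio is playing, plus a short tail for room reverb. The sketch below is not from the sample and is written in Python only for brevity. The class name, method names, and the `tail_seconds` value are all hypothetical, and how the frames reach the API is left out.

```python
import time


class HalfDuplexGate:
    """Drops microphone frames while assistant audio is playing, plus a short tail.

    Hypothetical helper for illustration; not part of any SDK.
    """

    def __init__(self, tail_seconds=0.3, clock=time.monotonic):
        self.tail_seconds = tail_seconds      # extra mute time after playback (assumed value)
        self.clock = clock                    # injectable clock, handy for testing
        self.playback_ends_at = float("-inf") # nothing has been played yet

    def on_playback(self, duration_seconds):
        """Call whenever a chunk of assistant audio is queued to the speaker."""
        # Chunks queue back to back, so extend from the later of "now" and the current end.
        start = max(self.clock(), self.playback_ends_at)
        self.playback_ends_at = start + duration_seconds

    def filter_mic(self, frame):
        """Return the frame if it is safe to send upstream, else None."""
        if self.clock() < self.playback_ends_at + self.tail_seconds:
            return None
        return frame
```

The obvious cost is that the user can no longer interrupt the assistant while it is speaking, which is why a proper echo-cancellation stage would be preferable.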