Realtime API echo cancellation doesn’t work for the first ~10 seconds of a session

I'm seeing a strange bug using the Realtime API with WebRTC on iOS. In almost every session, for the first few seconds the AI cuts itself off after a few words and then replies to itself. Oddly, this stops after one or two responses and doesn't happen again for the rest of the session. Has anyone else experienced this? And for anyone else using WebRTC on iOS: are you manually setting the media constraints to enable echo cancellation and noise suppression, or are they on by default?
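For anyone wanting to check the constraints question directly, here is a minimal sketch. It assumes a browser-style WebRTC stack (Safari or a WKWebView) where `navigator.mediaDevices.getUserMedia` is available; a native iOS WebRTC SDK would configure this differently. The helper names `buildAudioConstraints` and `describeAppliedProcessing` are made up for illustration.

```javascript
// Hypothetical helper: request audio with processing explicitly enabled,
// instead of relying on whatever the platform defaults are.
function buildAudioConstraints() {
  return {
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
    },
    video: false,
  };
}

// Hypothetical helper: report what the browser says it actually applied.
// `track` is a MediaStreamTrack; getSettings() may omit a flag on some
// platforms, in which case null is reported for it.
function describeAppliedProcessing(track) {
  const settings = track.getSettings();
  return {
    echoCancellation: settings.echoCancellation ?? null,
    noiseSuppression: settings.noiseSuppression ?? null,
    autoGainControl: settings.autoGainControl ?? null,
  };
}

// Usage in a browser context (not run here):
// const stream = await navigator.mediaDevices.getUserMedia(buildAudioConstraints());
// console.log(describeAppliedProcessing(stream.getAudioTracks()[0]));
```

Logging the applied settings at the start of a session would at least show whether echo cancellation is reported as active before the first response plays.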

On a related note, I have been trying to use semantic VAD, and it causes my WebRTC connection to fail, even though the same setup works fine with server VAD. Has anyone got it working?
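One thing that may be worth ruling out when switching VAD modes: as far as I understand the API, `semantic_vad` takes an `eagerness` setting rather than the `threshold`, `prefix_padding_ms`, and `silence_duration_ms` fields used by `server_vad`. Whether leftover fields are what breaks the connection here is only a guess. The sketch below uses a hypothetical helper, `toSemanticVad`, to strip them when converting a config.

```javascript
// Fields assumed to apply only to server_vad (an assumption, not confirmed
// in this thread).
const SERVER_VAD_ONLY = ["threshold", "prefix_padding_ms", "silence_duration_ms"];

// Hypothetical helper: convert a server_vad turn_detection object into a
// semantic_vad one, keeping shared flags such as create_response and
// interrupt_response and dropping the server_vad-only tuning fields.
function toSemanticVad(turnDetection, eagerness = "auto") {
  const out = { ...turnDetection, type: "semantic_vad", eagerness };
  for (const key of SERVER_VAD_ONLY) {
    delete out[key];
  }
  return out;
}

// Example input mirroring a typical server_vad config.
const serverVad = {
  type: "server_vad",
  threshold: 0.85,
  prefix_padding_ms: 300,
  silence_duration_ms: 500,
  create_response: false,
  interrupt_response: false,
};

const semanticVad = toSemanticVad(serverVad, "low");
```

The original object is left untouched, so it is easy to fall back to `server_vad` if the semantic variant still fails.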

Also, I am setting input_audio_noise_reduction to either near_field or far_field, but the session.created event I get back always has this field set to null.
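To compare what is sent with what comes back, something like the following sketch might help. It assumes the setting is passed as an object of the form `{ type: "near_field" }` inside a `session.update` client event, and that the server echoes the session in `session.created` / `session.updated` events; both helper names are hypothetical.

```javascript
// Hypothetical helper: build a session.update event that sets only the
// noise reduction field. The { type: ... } object shape is an assumption.
function buildNoiseReductionUpdate(fieldType) {
  if (fieldType !== "near_field" && fieldType !== "far_field") {
    throw new Error("fieldType must be 'near_field' or 'far_field'");
  }
  return {
    type: "session.update",
    session: { input_audio_noise_reduction: { type: fieldType } },
  };
}

// Hypothetical helper: pull the noise reduction setting out of a server
// event. Returns undefined for unrelated events and null when the session
// reports no noise reduction.
function noiseReductionFrom(serverEvent) {
  const isSessionEvent =
    serverEvent.type === "session.created" ||
    serverEvent.type === "session.updated";
  if (!isSessionEvent) return undefined;
  return serverEvent.session?.input_audio_noise_reduction ?? null;
}

// Usage over a WebRTC data channel (not run here):
// dataChannel.send(JSON.stringify(buildNoiseReductionUpdate("near_field")));
// dataChannel.onmessage = (e) => console.log(noiseReductionFrom(JSON.parse(e.data)));
```

If the value is still null in a later `session.updated` event, that would suggest the field is being dropped rather than just missing from the initial `session.created` snapshot.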


Hi, here's something that fixed my particular issue, in case it's useful: the echo/interrupt problem only occurred on the initial greeting. I'd start to hear the greeting, then the model would think it was being interrupted and would jump, and so on. So I changed the code to ensure the greeting can't be interrupted, and that completely fixed the issue. I kept everything else the same.

session: {
  modalities: ["text", "audio"],
  instructions: buildInstructions(ctx),
  voice: normalizeVoice(ctx.voice),
  input_audio_format: "g711_ulaw",
  output_audio_format: "g711_ulaw",
  turn_detection: {
    type: "server_vad",
    threshold: 0.85,
    prefix_padding_ms: 300,
    silence_duration_ms: 500,
    create_response: false,
    interrupt_response: false,
  },