I think it is impossible to keep the AI from being interrupted when the caller is on speakerphone. The intro starts when a caller phones in, and inevitably everyone tests with their speakerphone: the AI picks up its own audio as echo, treats it as an interruption, and starts cutting out.
Has anyone figured this out? My use case is handling incoming service phone calls, so you can't control the caller's device at all.
I'm using Twilio streams, Node, and the Realtime API.
You can change the VAD settings, including switching to semantic VAD.
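For anyone who wants to try that suggestion, here is a minimal sketch of a `session.update` event that switches turn detection to semantic VAD. The helper name `buildSemanticVadUpdate` and the default eagerness of `"low"` are assumptions made for illustration, not something from this thread, and nothing here has been shown to fix the speakerphone echo.

```javascript
// Hypothetical helper: builds a session.update event that switches the
// Realtime API session to semantic VAD. The default eagerness is an assumption.
function buildSemanticVadUpdate(eagerness = "low") {
  const allowed = ["low", "medium", "high", "auto"];
  if (!allowed.includes(eagerness)) {
    throw new Error(`Unsupported eagerness: ${eagerness}`);
  }
  return {
    type: "session.update",
    session: {
      turn_detection: {
        type: "semantic_vad",
        eagerness, // lower eagerness waits longer before deciding a turn has ended
        create_response: true,
        interrupt_response: true,
      },
    },
  };
}

// The event is sent as JSON over the Realtime WebSocket, for example:
// openAiSocket.send(JSON.stringify(buildSemanticVadUpdate("low")));
```

Here `openAiSocket` stands in for whatever WebSocket connection your Node server already holds to the Realtime API.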
Hi, thanks. Yes, I've played around with various VAD settings, but to no avail. I did find another mention of my exact issue in the forum; that poster gave up and went with ElevenLabs. That is probably what I'm going to do as well: they don't have the issue, and the voices are much better. It's a drag, having got this far, but I think the fix needs to happen on OpenAI's end, at least for phone calls where you don't control the caller's phone.
We are trying Pipecat. Will report the results once we are done.
Hi, something I discovered that fixed my particular issue and may be useful: the echo/interrupt problem only occurred on the initial greeting. I'd start to hear the greeting, then it would think it was being interrupted and would jump, etc. So I changed the code to ensure the greeting can't be interrupted, and that totally fixed the issue. I kept everything else the same.
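session: {
  modalities: ["text", "audio"],
  instructions: buildInstructions(ctx),
  voice: normalizeVoice(ctx.voice),
  // Twilio streams 8 kHz mu-law audio, so use it in both directions.
  input_audio_format: "g711_ulaw",
  output_audio_format: "g711_ulaw",
  turn_detection: {
    type: "server_vad",
    threshold: 0.85, // high threshold, so quiet echo is less likely to count as speech
    prefix_padding_ms: 300,
    silence_duration_ms: 500,
    create_response: false, // do not auto-create a response when a turn ends
    interrupt_response: false, // detected speech does not cut off the response in progress
  },
},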
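If you only want the greeting protected and normal barge-in afterwards, one possible variation is sketched below. This is not the poster's code: `createGreetingGuard` and its `send` callback are hypothetical names, and re-enabling `create_response` and `interrupt_response` once the greeting finishes is an assumption about what you might want. It relies on the Realtime API's `response.created` and `response.done` server events to tell when the greeting response has finished generating.

```javascript
// Hypothetical helper: keeps the greeting uninterruptible, then restores
// normal turn detection. `send` is assumed to forward an event object to
// the Realtime API WebSocket (for example by JSON-stringifying it).
function createGreetingGuard(send) {
  let greetingResponseId = null;
  let greetingDone = false;

  // Same server_vad values as the config above; only the two flags change.
  const turnDetection = (interruptible) => ({
    type: "server_vad",
    threshold: 0.85,
    prefix_padding_ms: 300,
    silence_duration_ms: 500,
    create_response: interruptible,
    interrupt_response: interruptible,
  });

  return {
    // Call once when the call connects: lock interruptions, then greet.
    start() {
      send({ type: "session.update", session: { turn_detection: turnDetection(false) } });
      send({ type: "response.create" });
    },
    // Call for every server event received from the Realtime API.
    onServerEvent(event) {
      if (event.type === "response.created" && greetingResponseId === null) {
        greetingResponseId = event.response.id; // first response is the greeting
      }
      if (
        event.type === "response.done" &&
        !greetingDone &&
        event.response.id === greetingResponseId
      ) {
        greetingDone = true;
        send({ type: "session.update", session: { turn_detection: turnDetection(true) } });
      }
    },
  };
}
```

Note that `response.done` means the model has finished generating, not that the caller has finished hearing the audio. If echo still clips the tail of the greeting, a Twilio `mark` message sent after the last audio chunk may be a better signal for when to re-enable interruptions.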