I’m building a phone bot using Twilio and OpenAI’s Realtime API that allows users to call and speak with an AI assistant about store information. I’m running into some challenges with audio processing and would appreciate any guidance from the community.
Current Implementation
Using Twilio for phone integration
OpenAI Realtime API with GPT-4
Server-side VAD for turn detection, with a configurable threshold
g711_ulaw audio format
Issues
Background Noise Sensitivity: The system is picking up ambient voices and background noise, even with high VAD thresholds
Self-feedback Loop: The assistant sometimes picks up its own voice output and responds to it, particularly when users are on speakerphone
Speakerphone Compatibility: These issues are especially pronounced during speakerphone usage, which is a crucial use case for our implementation
What We’ve Tried
Increased VAD threshold settings (up to 0.9) in the session configuration:
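What we send looks roughly like this (paraphrased from memory; `openai_ws` is our websockets connection to the Realtime API):

```python
import json

async def raise_vad_threshold(openai_ws) -> None:
    """Send session.update with a stricter server-side VAD threshold."""
    await openai_ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.9,            # raised from the 0.5 default
                "prefix_padding_ms": 300,
                "silence_duration_ms": 500,
            },
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
        },
    }))
```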
You have to implement your own Acoustic Echo Cancellation (AEC) and noise suppression (ANS) before streaming the audio to the OpenAI Realtime API.
Hey, thanks for replying. Do you have any resources or examples on how to implement AEC and ANS? I tried using the Python noisereduce library on each audio delta, but I don't think I'm doing this right.
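For reference, here's roughly what my current attempt looks like (decode the g711_ulaw delta to PCM, denoise, re-encode). I suspect denoising each tiny chunk in isolation is part of the problem, since noisereduce works much better on longer stretches of audio:

```python
import base64

import audioop      # stdlib µ-law codec; deprecated since Python 3.11
import numpy as np
import noisereduce as nr

SAMPLE_RATE = 8000  # Twilio media streams are 8 kHz mono g711_ulaw

def denoise_ulaw_delta(b64_payload: str) -> str:
    """Denoise one base64-encoded g711_ulaw chunk from Twilio."""
    ulaw = base64.b64decode(b64_payload)
    pcm16 = audioop.ulaw2lin(ulaw, 2)                    # µ-law -> 16-bit PCM
    samples = np.frombuffer(pcm16, np.int16).astype(np.float32) / 32768.0
    # n_fft shrunk so the STFT fits a ~160-sample chunk; buffering
    # 0.5-1 s of audio before denoising gives far better results.
    cleaned = nr.reduce_noise(y=samples, sr=SAMPLE_RATE, n_fft=128)
    pcm_out = (np.clip(cleaned, -1.0, 1.0) * 32767).astype(np.int16).tobytes()
    return base64.b64encode(audioop.lin2ulaw(pcm_out, 2)).decode()
```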
In Python, there are probably a ton of resources on how to do this. As I mainly use Java, I won't be of much help, but you can always search the internet or ask ChatGPT!
Well, I use a SIP SDK/library to route the OpenAI Realtime API audio over SIP to the user on an actual phone. It's called JVoIP, though I doubt that's what you're looking for.
JVoIP has AEC and noise suppression built in.
There are a bunch of SDKs out there, though.
Hey shaumikm, here’s a quick-and-dirty recipe to quiet down those unwanted background noises and that pesky self-feedback:
Mic Mute While Speaking: When your AI is dishing out responses, have the mic temporarily mute so it doesn’t catch its own voice. It’s like giving your bot a moment of silence—no echo party here!
Wake Word Detection: Instead of always listening, set it to only actively process audio when a specific wake word is detected. This way, ambient noise is less likely to trigger false activations (rough sketch after this list).
Preprocess with AEC/ANS: Before streaming to the API, run the audio through some filters. Use something like WebRTC’s built-in Acoustic Echo Cancellation (AEC) and noise suppression, or try libraries like RNNoise or SpeexDSP. These tools can significantly clean up both echo and background noise.
Fine-Tune Your VAD: Adjust your Voice Activity Detection thresholds to ignore brief bursts of background chatter. A few tweaks here and there can mean the difference between capturing clear speech and mistaking a random cough for input.
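For the wake-word idea, Picovoice's Porcupine (`pvporcupine`) is one off-the-shelf option. A rough, untested sketch, assuming Twilio's 8 kHz g711_ulaw input (Porcupine wants 16 kHz mono int16 frames, so the audio is resampled first; the access key and keyword are placeholders):

```python
import struct

import audioop       # stdlib µ-law codec + resampler (deprecated in 3.11+)
import pvporcupine

porcupine = pvporcupine.create(
    access_key="YOUR_PICOVOICE_KEY",   # placeholder; free tier available
    keywords=["computer"],             # one of the built-in keywords
)

listening = False      # forward audio to the Realtime API only while True
ratecv_state = None    # resampler state carried across chunks
frame_buf = b""

def on_twilio_media(ulaw_bytes: bytes) -> bool:
    """Feed one media chunk; return whether to forward audio upstream."""
    global listening, ratecv_state, frame_buf
    pcm8k = audioop.ulaw2lin(ulaw_bytes, 2)              # µ-law -> PCM16
    pcm16k, ratecv_state = audioop.ratecv(pcm8k, 2, 1, 8000, 16000, ratecv_state)
    frame_buf += pcm16k
    frame_bytes = porcupine.frame_length * 2             # bytes per frame
    while len(frame_buf) >= frame_bytes:
        frame = struct.unpack(f"<{porcupine.frame_length}h", frame_buf[:frame_bytes])
        frame_buf = frame_buf[frame_bytes:]
        if porcupine.process(frame) >= 0:                # wake word heard
            listening = True
    return listening
```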
I love your first option, mic mute while the bot is speaking, but how can we implement it on a regular phone call? Remember, the user has no software installed; they're speaking with the bot over a regular phone call. How can we force the mic to mute while the bot is speaking? I know how to do it in our app, but not when the Realtime API is used over the phone. Thanks!
On the AI serving side, you can have it stop taking inputs while it chats, assuming you control that side, haha. If you're running VAD etc. on the back end, you can simply keep an is_speaking flag and drop inputs until the response finishes (see the sketch below).
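A minimal sketch of that server-side gate, assuming a websockets bridge between Twilio Media Streams and the Realtime API (the event names are the documented ones; the handler wiring is up to you):

```python
import json

is_speaking = False

async def handle_openai_event(event: dict) -> None:
    """Track when the assistant starts and stops producing audio."""
    global is_speaking
    if event["type"] == "response.audio.delta":
        is_speaking = True
    elif event["type"] == "response.done":
        is_speaking = False

async def handle_twilio_message(message: str, openai_ws) -> None:
    """Forward caller audio only while the assistant is silent."""
    msg = json.loads(message)
    if msg.get("event") == "media" and not is_speaking:
        await openai_ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": msg["media"]["payload"],   # base64 g711_ulaw from Twilio
        }))
```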