How can I improve noise suppression with the OpenAI Realtime API over WebRTC?

I use the OpenAI Realtime API over WebRTC, set up with these options:

  • Turn Detection:
    turn_detection: {
        type: "semantic_vad",
        eagerness: "high",
        create_response: true, // Only in conversation mode
        interrupt_response: true // Only in conversation mode
    }
    
  • Audio Settings:
    We use WebRTC’s built-in noise suppression and echo cancellation (a capture sketch follows this list):
    audio: {
        noiseSuppression: true,
        echoCancellation: true
    }
    

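For reference, this is roughly how those constraints are applied when capturing the microphone (a minimal sketch; pc stands for our RTCPeerConnection, and autoGainControl is an extra constraint worth testing, not something we currently set):

    const stream = await navigator.mediaDevices.getUserMedia({
        audio: {
            noiseSuppression: true,
            echoCancellation: true,
            autoGainControl: true // extra constraint worth testing in noisy rooms
        }
    });
    // Feed the processed track into the peer connection
    pc.addTrack(stream.getAudioTracks()[0], stream);
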
Even with these settings, the system still picks up other people’s voices in noisy places. To fix this, we added input_audio_noise_reduction and tried both near_field and far_field types.
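
Concretely, we send it in a session.update event (a rough sketch of what we do; dataChannel is the “oai-events” data channel from our WebRTC setup):

    dataChannel.send(JSON.stringify({
        type: "session.update",
        session: {
            input_audio_noise_reduction: { type: "near_field" } // or "far_field"
        }
    }));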

But when we added this, the bot sometimes freezes for a while, which is really annoying.

Do you have any ideas on how to improve noise suppression or stop the freezing issue? How can we better handle noise in these situations?

A picture can tell a thousand words, they say. In this case, a thousand words can deliver a picture…

What you describe is a need for your own noise rejection: a noise gate that passes only first-person speech levels. It would also need some heuristics so that it keeps recording through the speaker’s thinking pauses instead of cutting them off.

I would present the tuning for such a gate in the UX, letting users calibrate it for their own environment, perhaps as a wizard they must complete before first use.
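
As a rough sketch of the gate itself (Web Audio API; micStream, openThreshold, and hangMs are placeholders for your captured stream and the values such a tuning wizard would collect):

    const ctx = new AudioContext();
    const source = ctx.createMediaStreamSource(micStream);
    const analyser = ctx.createAnalyser();
    const gate = ctx.createGain(); // 1 = pass, 0 = mute
    const out = ctx.createMediaStreamDestination();

    source.connect(analyser);
    source.connect(gate);
    gate.connect(out); // send out.stream onward instead of the raw mic

    const samples = new Float32Array(analyser.fftSize);
    let lastLoud = 0;

    function tick(now) {
        analyser.getFloatTimeDomainData(samples);
        let sum = 0;
        for (const s of samples) sum += s * s;
        const rms = Math.sqrt(sum / samples.length);

        if (rms > openThreshold) lastLoud = now;
        // Hold the gate open for hangMs after the last loud frame,
        // so thinking pauses don't chop the speaker's recording.
        gate.gain.value = now - lastLoud < hangMs ? 1 : 0;
        requestAnimationFrame(tick);
    }
    requestAnimationFrame(tick);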

You can then move from streaming everything directly to the endpoint over WebRTC, completely out of your control, to triggering “response.create” yourself, based on your own VAD and noise-gate audio-buffer algorithm running in a backend.
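
A minimal sketch of that backend side, assuming a WebSocket connection ws to the Realtime API, with your own VAD deciding the turn boundaries and supplying base64Chunk buffers:

    // Disable server-side turn detection; you do the turn-taking.
    ws.send(JSON.stringify({
        type: "session.update",
        session: { turn_detection: null }
    }));

    // While your gate/VAD says "speaking", forward audio:
    ws.send(JSON.stringify({
        type: "input_audio_buffer.append",
        audio: base64Chunk // base64-encoded PCM16 from your buffer
    }));

    // When your VAD decides the turn has ended:
    ws.send(JSON.stringify({ type: "input_audio_buffer.commit" }));
    ws.send(JSON.stringify({ type: "response.create" }));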

On the “bot freezes”: this is more about the implementation, or about providing “busy” indicators so users can see the UX isn’t actually frozen.
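
For example, the server events you already receive can drive a simple indicator (busyIndicator is a placeholder for whatever element you use):

    dataChannel.addEventListener("message", (e) => {
        const event = JSON.parse(e.data);
        if (event.type === "response.created") {
            busyIndicator.hidden = false; // model is working
        } else if (event.type === "response.done") {
            busyIndicator.hidden = true; // safe to speak again
        }
    });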