Realtime API starts to answer itself with mic+speaker setup

tlaukkanen · October 13, 2024, 7:40pm

I’m using the Realtime API with Python on a Raspberry Pi 4. When I say something and the API responds, it picks up its own voice as new input and goes into a chaotic loop of babbling with itself.

Are there any options to reduce this? I have tried changing the turn_detection settings like:

turn_detection=ServerVAD(type="server_vad", threshold=0.5, prefix_padding_ms=200, silence_duration_ms=200),

and

turn_detection=ServerVAD(type="server_vad", threshold=0.8, prefix_padding_ms=1000, silence_duration_ms=2000),

But these options don’t seem to have much effect.
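For reference, here is a minimal sketch of how the same settings might look as a raw session.update event, assuming the client sends JSON over the Realtime WebSocket directly instead of going through a wrapper class like ServerVAD. The helper name build_turn_detection_update is hypothetical; the default values are just the first set tried above.

```python
import json


def build_turn_detection_update(threshold=0.5, prefix_padding_ms=200,
                                silence_duration_ms=200):
    """Build a session.update event that sets the server VAD parameters."""
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "prefix_padding_ms": prefix_padding_ms,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }


# The second, stricter set of values from the post
event = build_turn_detection_update(threshold=0.8, prefix_padding_ms=1000,
                                    silence_duration_ms=2000)
payload = json.dumps(event)  # string to send over the WebSocket connection
```

These parameters only change how the server decides that speech started or stopped. They cannot tell the speaker's echo apart from a real voice, which may be why tuning them alone has little effect.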

Could it be that the Raspberry Pi is somewhat slow and delays playback enough that the API doesn’t recognize the audio as its own speech?

_j · October 13, 2024, 7:43pm

Your search terms:

“pulseaudio echo cancellation module Raspberry Pi”

FrankFu · October 16, 2024, 2:21pm

I had a similar issue on iOS. There are a few libraries we could use on iOS, and I found that one of them works well.

You can find my code on GitHub at fuwei007/OpenAIIOSRealtimeAPIDemo

tlaukkanen · October 16, 2024, 5:36pm

Echo cancellation seems to be the way to go, but it is hard to enable.

I have not been able to get it to cancel the speaker sound properly yet. Does anyone else have a Raspberry Pi happily using the Realtime API with mic+speakers, and could you share your configs?

Something I’ve tried for activating the echo cancellation module,
in /etc/pulse/default.pa:

load-module module-echo-cancel rate=11025 aec_method=webrtc source_name=aec_source source_properties=device.description=aec_source sink_name=aec_sink sink_properties=device.description=aec_sink
set-default-source aec_source
set-default-sink aec_sink

I also set the default sample rate in /etc/pulse/daemon.conf to 11025, as my sound card seems to work better with this; 24000 gave invalid sample rate errors.

default-sample-rate = 11025

I resample the Realtime API input and output audio 11.025 kHz -> 24 kHz -> 11.025 kHz with scipy.signal.resample.

So far no luck with these settings, though. Realtime just starts to answer itself.
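As a small sketch of the arithmetic behind that conversion: the two rates reduce to an exact 320/147 ratio, and the target sample count for each chunk follows from it. The helper name resampled_length is hypothetical; its result is the value one would pass as the num argument of scipy.signal.resample, while up and down are the factors scipy.signal.resample_poly expects.

```python
from fractions import Fraction

DEVICE_RATE = 11025  # sound card rate used in this post
API_RATE = 24000     # rate the audio is converted to for the Realtime API

# Exact rational ratio between the two rates (reduces to 320/147)
ratio = Fraction(API_RATE, DEVICE_RATE)
up, down = ratio.numerator, ratio.denominator


def resampled_length(n_samples: int, src_rate: int, dst_rate: int) -> int:
    """Number of samples a chunk has after converting src_rate to dst_rate."""
    return round(n_samples * dst_rate / src_rate)


# A 40 ms mic chunk at 11025 Hz (441 samples) becomes 960 samples at 24 kHz
mic_chunk_out = resampled_length(441, DEVICE_RATE, API_RATE)
```

Chunk sizes that are multiples of 147 samples on the device side (or 320 on the API side) convert without rounding.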

_j · October 17, 2024, 9:21pm

Perhaps you can mute the mic or substitute a 7FFFh (7777h if doing bytes crudely) sample stream while the audio is being played, if you want to discard the interruption ability and hear what you paid for?
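A minimal sketch of the "substitute a sample stream" idea, assuming the mic audio is signed 16-bit PCM and that some other part of the program knows whether the AI is currently playing. The function name gate_mic_chunk and the ai_is_playing flag are hypothetical. This sketch uses zero-valued samples, which is digital silence for signed 16-bit PCM.

```python
def gate_mic_chunk(chunk: bytes, ai_is_playing: bool) -> bytes:
    """Return the mic chunk unchanged, or same-length silence during playback."""
    if ai_is_playing:
        # Zero-filled buffer of the same size, so stream timing is preserved
        return bytes(len(chunk))
    return chunk


# Example: a 4-byte chunk (two 16-bit samples) is silenced during playback
silenced = gate_mic_chunk(b"\x10\x27\xf0\xd8", ai_is_playing=True)
passed = gate_mic_chunk(b"\x10\x27\xf0\xd8", ai_is_playing=False)
```

Unlike dropping chunks entirely, this keeps a continuous input stream flowing to the server, at the cost of losing the ability to interrupt.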

Trying to interrupt through OpenAI’s VAD input often triggers multiple response creates: you are confused about why it doesn’t pause, so you pause yourself.

For responsiveness, you could run your own local “interruption” VAD with the audio bits shifted down 6 or 12 dB. A little voice-activity indicator light updating at 60 fps is kind of cool to watch. As part of setup, the user can play a prerecorded AI voice loop and adjust the input level control until the indicator is almost off, or some auto-learning of that level would be possible.
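A rough sketch of such a local level-based VAD, assuming signed 16-bit PCM chunks in the machine's native byte order (little-endian on a Raspberry Pi). The function names and the threshold of 500 are hypothetical and would need calibrating against the real speaker echo. Shifting each sample right by one bit halves its amplitude, which is about 6 dB; two bits is about 12 dB.

```python
import array
import math


def attenuate_pcm16(chunk: bytes, shift_bits: int = 1) -> bytes:
    """Attenuate 16-bit PCM by shifting each sample right (about 6 dB per bit)."""
    samples = array.array("h")
    samples.frombytes(chunk)
    return array.array("h", (s >> shift_bits for s in samples)).tobytes()


def rms_pcm16(chunk: bytes) -> float:
    """Root-mean-square level of a 16-bit PCM chunk."""
    samples = array.array("h")
    samples.frombytes(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_voice(chunk: bytes, threshold: float = 500.0, shift_bits: int = 1) -> bool:
    """True if the attenuated chunk is still louder than the threshold."""
    return rms_pcm16(attenuate_pcm16(chunk, shift_bits)) >= threshold
```

The same is_voice result could drive the indicator light: play a recorded AI voice loop through the speaker and lower the input gain, or raise the threshold, until it stays mostly off.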

(Amazon hardware echo cancellation is pretty amazing, BTW, hearing “Alexa” over music blasting from non-device speakers)

tlaukkanen · October 18, 2024, 8:31am

Yes, that was what I ended up doing to get a proper conversation to work with the RPi. I’m just muting the input mic stream if there is audio in the output stream. I did it roughly like this:

import base64
import time

import pyaudio

# "Global" timestamp of when the AI last spoke
ai_last_talk_time = 0

...

# Output audio handling
def play_audio(output_stream: pyaudio.Stream):
    global ai_last_talk_time
    while True:
        audio_data = audio_output_queue.get()

        # Mark that the AI is talking so listen_audio() drops mic input and avoids echo
        ai_last_talk_time = time.time()
        
        output_stream.write(audio_data)

# Input mic audio handling
def listen_audio(input_stream: pyaudio.Stream):
    global ai_last_talk_time
    while True:
        audio_data = input_stream.read(INPUT_CHUNK_SIZE, exception_on_overflow=False)
        if audio_data is None:
            continue

        # Skip mic input if the AI talked less than 1 second ago, to avoid echo
        if time.time() - ai_last_talk_time < 1:
            print("AI is talking, skipping input audio")
            continue

        base64_audio = base64.b64encode(audio_data).decode("utf-8")
        audio_input_queue.put(base64_audio)

...
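A variation on the same idea, as a sketch only: instead of a fixed 1-second window, estimate how long each output chunk takes to play and keep the mic muted until that audio plus a short tail has finished. This assumes 24 kHz, 16-bit mono output; the class name EchoGate, its methods, and the 0.5 s tail are hypothetical.

```python
import threading
import time


class EchoGate:
    """Tracks when queued AI audio should finish playing, plus a safety tail."""

    def __init__(self, sample_rate=24000, bytes_per_sample=2, tail_s=0.5):
        self._lock = threading.Lock()
        self._bytes_per_second = sample_rate * bytes_per_sample
        self._tail_s = tail_s
        self._muted_until = 0.0

    def mark_playback(self, n_bytes, now=None):
        """Call for every output chunk handed to the speaker."""
        now = time.monotonic() if now is None else now
        duration = n_bytes / self._bytes_per_second
        with self._lock:
            # If earlier audio is still playing, this chunk starts after it
            start = max(now, self._muted_until - self._tail_s)
            self._muted_until = start + duration + self._tail_s

    def is_muted(self, now=None):
        """True while mic input should be dropped."""
        now = time.monotonic() if now is None else now
        with self._lock:
            return now < self._muted_until
```

In the code above, play_audio would call gate.mark_playback(len(audio_data)) before writing, and listen_audio would skip the chunk whenever gate.is_muted() returns True.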

braden.sanders · November 22, 2024, 6:48pm

Hi Tommi, I have come across this thread multiple times while I attempt to implement AEC for my realtime API application on my RP5. I was wondering if you ever found an implementation for pulseaudio AEC that works for this API? I am having the same issues as you were trying to get things working, and I would LOVE to be able to use my voice assistant without headphones. Let me know!
