I’m using the Realtime API with Python on a Raspberry Pi 4. When I speak, the API’s spoken response gets picked up by the mic as new input, and it goes into a chaotic loop of babbling with itself.
Are there any options to reduce this? I have tried changing the turn_detection settings, but that hasn’t solved it.
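For context, this is roughly the kind of session.update I’ve been tuning — raising the server-VAD threshold so quieter speaker bleed-through is less likely to count as speech. The specific values here are guesses to experiment with, not a known-good config:

```python
import json

# Sketch of a session.update that makes server VAD less sensitive.
# threshold default is 0.5; higher means louder audio is needed to
# open a user turn. Values below are assumptions to tune, not tested fixes.
session_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.8,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 700,
        }
    },
}
payload = json.dumps(session_update)  # send this text frame over the websocket
```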
Echo cancellation seems to be the way to go, but it is hard to enable. I have not been able to get it to cancel the speaker sound properly yet. Is anyone else happily running the Realtime API on a Raspberry Pi with mic + speakers, and willing to share their configs?
Something I’ve tried for activating the echo-cancelling module in /etc/pulse/default.pa:
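For reference, a typical way to load it looks like the lines below (source/sink names are placeholders I chose; the right master devices depend on your hardware):

```
load-module module-echo-cancel aec_method=webrtc source_name=echosource sink_name=echosink
set-default-source echosource
set-default-sink echosink
```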
I also set the default sample rate in /etc/pulse/daemon.conf to 11025, as my soundcard seems to work better with it (24000 gave invalid sample rate errors):

default-sample-rate = 11025
I resample the Realtime API input and output audio 11,025 Hz → 24,000 Hz → 11,025 Hz with scipy.signal.resample.
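The resampling step looks roughly like this (a sketch, assuming int16 PCM frames; the helper name is mine):

```python
import numpy as np
from scipy.signal import resample

def resample_pcm(samples: np.ndarray, rate_in: int, rate_out: int) -> np.ndarray:
    """Resample int16 PCM between the soundcard rate and the API's 24 kHz."""
    n_out = int(round(len(samples) * rate_out / rate_in))
    out = resample(samples.astype(np.float32), n_out)
    return np.clip(out, -32768, 32767).astype(np.int16)

# mic (11,025 Hz) -> API (24,000 Hz), then API audio back down for playback
mic = np.zeros(11025, dtype=np.int16)            # one second of captured audio
to_api = resample_pcm(mic, 11025, 24000)         # 24,000 samples for the API
to_speaker = resample_pcm(to_api, 24000, 11025)  # back to the soundcard rate
```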
So far no luck with these settings, though; the session just starts answering itself.
Perhaps you can mute the mic, or substitute a stream of silent samples (0000h for 16-bit PCM), while the audio is being played, if you’re willing to give up the interruption ability and just hear what you paid for?
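A minimal half-duplex gate along those lines could look like this (class and method names are hypothetical):

```python
import threading

# Hypothetical half-duplex gate: while assistant audio is playing, feed the
# API silence (all-zero 16-bit samples) instead of the mic frame. This gives
# up barge-in, but the model can no longer hear itself.
class MicGate:
    def __init__(self, frame_bytes: int):
        self._playing = threading.Event()
        self._silence = b"\x00" * frame_bytes

    def playback_started(self) -> None:
        self._playing.set()

    def playback_finished(self) -> None:
        self._playing.clear()

    def filter(self, mic_frame: bytes) -> bytes:
        # Substitute silence while the speaker is active.
        return self._silence if self._playing.is_set() else mic_frame

gate = MicGate(frame_bytes=2 * 480)   # 480 16-bit samples per frame
gate.playback_started()
muted = gate.filter(b"\x01\x02" * 480)  # returned frame is all zeros
```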
Interrupting via OpenAI’s server VAD makes for awkward UX: you often get multiple response creates, because you’re confused about why it doesn’t pause and end up pausing yourself.
For responsiveness, one might run a local “interruption” VAD of your own, with the audio samples shifted down (attenuated) by 6 or 12 dB. A little voice-activity indicator light updating at 60 fps is kind of cool to watch. As part of setup, the user can play a prerecorded AI voice loop and adjust the input level control until the indicator is almost off, or some auto-learning of that level would be possible.
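An energy-based version of that local VAD could be as simple as this sketch (the attenuation and threshold values are assumptions you would tune during the setup step described above):

```python
import math

# Hypothetical local "interruption" VAD: attenuate each frame 12 dB first so
# residual speaker bleed-through falls under the threshold, then compare the
# frame's RMS energy against it. A right-shift of 1 or 2 bits on int16
# samples is the crude equivalent of the 6 / 12 dB attenuation.
ATTEN_DB = 12.0
THRESHOLD_RMS = 500.0   # tune while a prerecorded AI voice loop is playing

def is_voice(frame: list[int]) -> bool:
    gain = 10 ** (-ATTEN_DB / 20)   # 12 dB down is roughly 0.25x amplitude
    rms = math.sqrt(sum((s * gain) ** 2 for s in frame) / len(frame))
    return rms > THRESHOLD_RMS

# quiet speaker bleed stays below the threshold; a loud user crosses it
print(is_voice([200] * 480), is_voice([8000] * 480))  # → False True
```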
(Amazon’s hardware echo cancellation is pretty amazing, BTW: it hears “Alexa” over music blasting from non-device speakers.)