Issue with Realtime API user interruption

zhan1130 · October 5, 2024, 11:35am

I tried to implement a simple Realtime API demo with a user interruption feature (i.e., like Advanced Voice Mode in the ChatGPT app, where the user can speak at any time to interrupt GPT-4o's ongoing audio output). However, my computer's microphone also captures the API's output voice and treats it as user input, so the API constantly interrupts itself. I wonder how Advanced Voice Mode solves this issue.

aaron.lutz · October 6, 2024, 2:22pm

Hey there!

I am having the same issue. I ported the Realtime API over to Python (since all my existing code was in Python, not JS). If you are using web technologies like JS, you could use WebRTC for AEC (acoustic echo cancellation). This is harder to do in Python. If anyone figures out how to handle this, please comment.
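For the Python route, here is a minimal sketch of the core idea behind AEC: a normalized LMS (NLMS) adaptive filter learns the speaker-to-microphone echo path from the audio you play back (`far_end`) and subtracts the predicted echo from the mic signal. The function name `nlms_echo_cancel` and its parameters are hypothetical, and it assumes both signals are float sample lists at the same sample rate and already time-aligned. Production echo cancellers (such as the one in WebRTC) also handle delay estimation, double-talk and nonlinear speaker distortion, which this toy does not.

```python
def nlms_echo_cancel(mic, far_end, taps=64, mu=0.5, eps=1e-6):
    """Remove the estimated echo of far_end from mic using an NLMS adaptive filter.

    Hypothetical helper for illustration. Assumes mic and far_end are equal-rate,
    time-aligned lists of float samples. Returns the residual (near-end speech
    plus whatever echo the filter has not yet learned to cancel).
    """
    w = [0.0] * taps        # adaptive estimate of the speaker-to-mic echo path
    history = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for d, x in zip(mic, far_end):
        # Shift the newest playback sample into the filter's delay line.
        history.insert(0, x)
        history.pop()
        # Predicted echo = current echo-path estimate applied to recent playback.
        y = sum(wi * xi for wi, xi in zip(w, history))
        # Residual: what the mic heard minus the predicted echo.
        e = d - y
        # NLMS update: step size normalized by the far-end input power.
        norm = sum(xi * xi for xi in history) + eps
        step = mu * e / norm
        w = [wi + step * xi for wi, xi in zip(w, history)]
        out.append(e)
    return out
```

In a client, you would pass the PCM chunks you send to the speaker as `far_end` and the matching microphone chunks as `mic`, and forward the residual to the API instead of the raw mic audio. A pure-Python loop like this is far too slow and fragile for real use, so treat it as an illustration of what a library-grade AEC does.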

stippi · October 15, 2024, 9:12am

Hm. The OpenAI demo application as well as their Playground console both run in the browser and have the same issue for me. Maybe it depends on the machine and the browser, but so far I have had this issue in Chrome and Safari. I'm on an M3 MacBook Pro, which should have a decent mic array.

zhan1130 · October 15, 2024, 10:24am

Just tried it in their Playground. No such issue for me there. The issue only arises with my own implementation.

aaron.lutz · October 16, 2024, 7:43am

Yeah, for me it's the same. In the Playground it works perfectly. As I said, they most likely use some form of AEC in their web implementation (the Playground and also the ChatGPT app). In the API we don't have much control over this; we can only set the audio level threshold, which might help in some cases. In most cases, however, I suppose we'll have to implement some sort of AEC on the client side ourselves.
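The threshold mentioned above is the server-side VAD (voice activity detection) setting. Below is a sketch of a `session.update` event that raises it, assuming the field layout of the Realtime API beta at the time of this thread (`turn_detection` with `type`, `threshold`, `prefix_padding_ms`, `silence_duration_ms`). `make_vad_update` is a hypothetical helper, the default values are illustrative rather than recommended, and the returned string is assumed to be sent over your already-open Realtime WebSocket connection. Check the current API reference, since field names and locations may have changed.

```python
import json


def make_vad_update(threshold=0.8, prefix_padding_ms=300, silence_duration_ms=500):
    """Build a session.update event (as a JSON string) that sets the server VAD threshold.

    Hypothetical helper; field names follow the beta-era Realtime API docs.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                # Higher values require louder audio before it counts as speech.
                "threshold": threshold,
                # Audio kept from before detected speech starts.
                "prefix_padding_ms": prefix_padding_ms,
                # Silence required before the user's turn is considered over.
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }
    return json.dumps(event)
```

A higher threshold makes faint speaker bleed less likely to register as speech, at the cost of missing quiet users; it does not remove loud echo, which is why client-side AEC is still needed.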

mysticflicker · October 24, 2024, 7:41pm

Also having the same issue: the device microphone picks up the output from OpenAI and it keeps interrupting itself.
Wondering if this has been solved, or whether we basically can't use the Realtime API without headphones.

aaron.lutz · October 24, 2024, 8:36pm

Yeah, it's not implemented out of the box in the API, and I'm not sure it even can be. What I did to solve the issue was move the audio recording and playback to the frontend, i.e., the browser. Before, I had all the audio handling in my Python backend, since I'm working on a local desktop application. Then you can use AEC, for example via WebRTC, which most modern browsers support.
