Hi everyone, I’m building a real-time voice AI implementation using WebSockets. While it works smoothly on most devices, I’m running into a “self-talking” loop specifically on Samsung devices.
Even with a Mic Gate and audioSource: 7, the AI seems to “hear” its own output, triggering speech_started and creating a feedback loop. It’s as if the hardware AEC isn’t engaging or the system is routing the speaker output back into the microphone buffer before the gate can close.
Has anyone dealt with Samsung-specific AEC issues in React Native/Node environments? Here is the core logic I’m using for the Mic Gate and stream initialization:
JavaScript
// --- 1. THE MIC GATE ---
LiveAudioStream.on('data', (data) => {
  // Only send microphone chunks if the AI is NOT currently speaking
  if (
    isRecordingRef.current &&
    !isPlayingRef.current && // The Gate: ref is true during playback
    wsRef.current?.readyState === WebSocket.OPEN
  ) {
    wsRef.current.send(JSON.stringify({ type: 'input_audio_buffer.append', audio: data }));
  }
});
// --- 2. HARDWARE AEC INITIALIZATION ---
const options = {
  sampleRate: 24000,
  channels: 1,
  bitsPerSample: 16,
  audioSource: 7, // CRITICAL: Android VOICE_COMMUNICATION source, which requests the phone's native echo canceller
  wavFile: false
};
LiveAudioStream.init(options);
// --- 3. THE INTERRUPTION HANDLER ---
const handleWebSocketMessage = (event) => {
  const data = JSON.parse(event.data);
  switch (data.type) {
    case 'input_audio_buffer.speech_started':
      // If user talks while AI is talking, stop the AI immediately
      if (isPlayingRef.current) {
        DirectAudioPlayer.stop();
        isPlayingRef.current = false;
        // Clear the server-side buffer so the AI doesn't "hear"
        // the echo that happened right before the stop
        wsRef.current.send(JSON.stringify({ type: 'input_audio_buffer.clear' }));
        wsRef.current.send(JSON.stringify({ type: 'response.cancel' }));
      }
      break;
    case 'response.audio.delta':
      isPlayingRef.current = true; // Close the Mic Gate
      DirectAudioPlayer.playDelta(data.delta);
      break;
    case 'response.audio.done':
      isPlayingRef.current = false; // Open the Mic Gate
      break;
  }
};
Specific symptoms:
1. The AI starts speaking.
2. The input_audio_buffer.speech_started event fires almost immediately (triggered by the AI’s own voice).
3. The AI interrupts itself, clears the buffer, and enters a loop.
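
One timing variation that fits these symptoms is keeping the gate closed for a short hold-off after playback ends, instead of reopening it the moment the flag flips. The sketch below is only an illustration under assumptions: `createMicGate`, its method names, and the 300 ms hold-off are hypothetical and not part of LiveAudioStream, DirectAudioPlayer, or the server API, and it assumes local playback (or speaker tail) may continue briefly after `response.audio.done` arrives. It has not been verified on Samsung hardware.

```javascript
// Hypothetical helper, not part of any library used above.
// The gate closes when playback starts and reopens only after `holdMs`
// milliseconds have passed since playback ended, so trailing speaker
// audio is dropped instead of being forwarded to the server.
function createMicGate(holdMs = 300, now = () => Date.now()) {
  let playing = false;
  let reopenAt = 0; // timestamp after which mic chunks may pass again

  return {
    playbackStarted() {
      playing = true;
    },
    playbackEnded() {
      playing = false;
      reopenAt = now() + holdMs;
    },
    // True when a microphone chunk may be forwarded to the server.
    isOpen() {
      return !playing && now() >= reopenAt;
    },
  };
}

// Usage with an injected clock so the behaviour is deterministic.
let fakeTime = 0;
const gate = createMicGate(300, () => fakeTime);

gate.playbackStarted();
console.log(gate.isOpen()); // false: the AI is speaking

gate.playbackEnded();
fakeTime = 100;
console.log(gate.isOpen()); // false: still inside the 300 ms hold-off

fakeTime = 300;
console.log(gate.isOpen()); // true: hold-off has elapsed
```

In the handlers above, `gate.playbackStarted()` would sit where `isPlayingRef.current = true` is set, `gate.playbackEnded()` where it is set back to `false`, and `gate.isOpen()` would replace the `!isPlayingRef.current` check. The hold-off value is a guess and would need tuning per device.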