Real-Time Model is hearing and talking to itself in a loop

Hi, I tried to set up a simple mobile app using react-native-webrtc and the code provided for connecting to the Realtime API over WebRTC.

The only problem I’m facing is that the AI seems to be responding to itself.

I just say a “Hi”, and it recursively greets me back with “Hey”, “Hello there”,… It seems like it’s hearing its own audio and responding to it.

When I mute the speakers and read my logs, I can see the AI only responds to my questions, and not to itself.

Here’s my code. Anyone know where I might be going wrong?

Client Side Code:

import { useState, useEffect } from "react";
import { View, TouchableOpacity, Text, Alert } from "react-native";
import { Audio } from "expo-av";
import {
  RTCPeerConnection,
  mediaDevices,
} from "react-native-webrtc";

export default function TalkToAIScreen() {
  const [isRecording, setIsRecording] = useState(false);
  const [audioPermission, setAudioPermission] = useState(false);
  const [recordingStatus, setRecordingStatus] = useState("Idle");
  const [ephemeralToken, setEphemeralToken] = useState(null);
  const [connectionStatus, setConnectionStatus] = useState(null);

  async function getEphemeralToken() {
    // Get an ephemeral token from the server
    try {
      const tokenResponse = await fetch(
        "http://localhost:3000/session"
      );
      const tokenData = await tokenResponse.json();
      console.log("Token:", tokenData.client_secret.value);
      setEphemeralToken(tokenData.client_secret.value);
      return tokenData.client_secret.value;
    } catch (error) {
      Alert.alert("Error", "Couldn't retrieve ephemeral token");
      return null;
    }
  }
  async function setupWebRTC(EPHEMERAL_KEY) {
    try {
      // Create a peer connection
      const pc = new RTCPeerConnection();

      pc.ontrack = (event) => {
        console.log("Received audio track from OpenAI");
      };

      // Add local audio track for microphone input
      try {
        const ms = await mediaDevices.getUserMedia({
          audio: true,
        });
        pc.addTrack(ms.getTracks()[0]);
      } catch (audioError) {
        console.error("Audio setup failed:", audioError);
        throw new Error(`Audio setup failed: ${audioError.message}`);
      }

      // Set up a data channel for sending and receiving events
      const dc = pc.createDataChannel("oai-events");
      dc.addEventListener("message", (e) => {
        try {
          // Realtime server events appear here!
          const message = JSON.parse(e.data);
          console.log(
            "Got a data channel message:",
            JSON.stringify(message)
          );
        } catch (error) {
          console.error("Failed to parse data channel message:", error);
        }
      });

      // Start the session using the Session Description Protocol (SDP)
      console.log("Creating offer...");
      const offer = await pc.createOffer();
      console.log(
        "Offer created:",
        offer.sdp.substring(0, 100) + "..."
      );

      await pc.setLocalDescription(offer);
      console.log("Local description set successfully");

      console.log("Sending offer to OpenAI...");
      const baseUrl = "https://api.openai.com/v1/realtime";
      const model = "gpt-4o-realtime-preview-2024-12-17";
      const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
        method: "POST",
        body: offer.sdp,
        headers: {
          Authorization: `Bearer ${EPHEMERAL_KEY}`,
          "Content-Type": "application/sdp",
        },
      });

      if (!sdpResponse.ok) {
        const errorText = await sdpResponse.text();
        console.error("OpenAI Response Error:", {
          status: sdpResponse.status,
          statusText: sdpResponse.statusText,
          body: errorText,
        });
        throw new Error(
          `Failed to send offer to OpenAI: ${sdpResponse.status} - ${errorText}`
        );
      }

      const answer = {
        type: "answer",
        sdp: await sdpResponse.text(),
      };

      await pc.setRemoteDescription(answer);
      console.log("Remote description set successfully");

      console.log("Connection setup complete!");
      setConnectionStatus("connected");
    } catch (error) {
      console.error("Failed to setup connection:", error);
      setConnectionStatus("error");
      Alert.alert(
        "Connection Error",
        "Failed to connect to AI service"
      );
      return false;
    }
  }

  async function handleTalkToAI() {
    const EPHEMERAL_KEY = await getEphemeralToken();
    if (!EPHEMERAL_KEY) return; // token fetch failed; alert already shown
    setupWebRTC(EPHEMERAL_KEY);
  }


  return (
    <View
      style={{
        flex: 1,
        alignItems: "center",
        justifyContent: "center",
        paddingHorizontal: 20,
      }}
    >
      {/* CREATED: The main button that controls recording */}
      <TouchableOpacity
        onPress={handleTalkToAI}
        disabled={!audioPermission}
        style={{
          backgroundColor: isRecording ? "#D9534F" : "#0275D8",
          borderRadius: 8,
          padding: 16,
          marginBottom: 16,
          opacity: audioPermission ? 1 : 0.5,
        }}
      >
        <Text style={{ color: "#fff", fontSize: 16 }}>
          {isRecording ? "Stop Talking" : "Talk to AI"}
        </Text>
      </TouchableOpacity>

      {/* CREATED: Shows what's currently happening */}
      <Text style={{ marginTop: 20, color: "#666" }}>
        Status: {recordingStatus}
      </Text>
      <Text style={{ marginTop: 20, color: "#666" }}>
        Token: {ephemeralToken}
      </Text>
      <Text style={{ marginTop: 10, color: "#666" }}>
        Connection: {connectionStatus}
      </Text>
      <Text style={{ marginTop: 10, color: "#666" }}>
        Audio Status: {connectionStatus === "connected" ? "Playing" : "Silent"}
      </Text>
    </View>
  );
}
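One thing worth checking in the client code: `audio: true` asks for the platform's default constraints, and echo cancellation may not be enabled by default on every device. Explicitly requesting it is a cheap experiment. This is only a sketch under the assumption that the platform's WebRTC stack honors these constraint hints; support varies across iOS/Android:

```javascript
// Sketch: explicit audio constraints for getUserMedia.
// Assumption: the underlying WebRTC stack honors these hints;
// behavior differs between iOS and Android devices.
const audioConstraints = {
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
};

// In setupWebRTC, instead of { audio: true }:
// const ms = await mediaDevices.getUserMedia(audioConstraints);
```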

Server Code

const express = require("express");
const dotenv = require("dotenv");

const cors = require("cors");

dotenv.config();

const app = express();

// CORS middleware must be registered before the routes it should apply to
app.use(cors());

// An endpoint which works with the client code above - it returns
// the contents of a REST API request to this protected endpoint
app.get("/session", async (req, res) => {
  console.log(
    "Using API Key:",
    process.env.OPENAI_API_KEY ? "Key exists" : "No key found"
  );
  try {
    const r = await fetch(
      "https://api.openai.com/v1/realtime/sessions",
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          model: "gpt-4o-realtime-preview-2024-12-17",
          voice: "verse",
        }),
      }
    );
    const data = await r.json();
    console.log("OpenAI Response For Ephemeral Token:", data);

    // Send back the JSON we received from the OpenAI REST API
    res.send(data);
  } catch (error) {
    console.error("Failed to create session:", error);
    res.status(500).send({ error: "Failed to create session" });
  }
});

app.listen(3000, () => {
  console.log("Server is running on port 3000");
});
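On the client side it's also worth guarding against a session response that doesn't carry a secret (for example, when the API key is wrong the error body has no `client_secret`). A small validator sketch, assuming the success shape that the client code above already expects (`client_secret.value`):

```javascript
// Sketch: validate the /session response before using it.
// Assumption: a successful response carries client_secret.value,
// matching what the client code above reads.
function extractClientSecret(sessionData) {
  if (
    !sessionData ||
    !sessionData.client_secret ||
    !sessionData.client_secret.value
  ) {
    return null; // error response or malformed payload
  }
  return sessionData.client_secret.value;
}
```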

I don’t see anything weird. Could be your device.

Yeah, it’s really weird. I tried it on web and it works. I’ll respond once I figure it out. Thanks!


Good luck! As always in programming, if you don’t give up, you do eventually figure it out!


Hello, I’m currently facing the same issue in a react-native-webrtc component running on the iOS simulator. It works with headphones; otherwise it consumes its own audio response as new input. Have you figured it out? Cheers!

I switched the app to web for now. But I think the problem should be solved by a WebSocket implementation. Many have posted mobile apps using the original WebSocket implementation, so my strong guess is that WebSockets work out of the box on iOS.
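For reference, the WebSocket transport from Node looks roughly like this. A sketch only: it assumes the `ws` package (`npm i ws`), an `OPENAI_API_KEY` env var, and the same model string used above; the actual connection is commented out since it needs a live key:

```javascript
// Sketch of the Realtime API WebSocket transport from Node.
// Assumptions: the `ws` package is installed and OPENAI_API_KEY is set.
const model = "gpt-4o-realtime-preview-2024-12-17";
const url = `wss://api.openai.com/v1/realtime?model=${model}`;
const headers = {
  Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  "OpenAI-Beta": "realtime=v1",
};

// const WebSocket = require("ws");
// const ws = new WebSocket(url, { headers });
// ws.on("open", () => console.log("Connected to Realtime API"));
// ws.on("message", (msg) => console.log(JSON.parse(msg.toString())));
```

Unlike the WebRTC path, audio capture and playback are entirely your responsibility here, which is also why the speaker-to-mic feedback path behaves differently.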

@choprarishabh1010199 - Did you make it work in the app, or find a fix other than moving to web? I saw a similar issue in the app when running in an Android emulator on a low-powered laptop.

I might be suffering from the same issue, though I am using the WebSocket version. At first I thought it was an infinite loop, but it was a very long yet finite series of audio deltas whose content was basically repeating itself.
Now that I’ve seen this post, it does sound like the API was conversing with its own last response.
How often did this happen to you? For me it was quite rare, and I wasn’t able to determine the cause (from the logs nothing seems different, other than the fact that it just sends a very long series of response.audio.delta events).
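One cheap way to tell a runaway response apart from normal output is to count consecutive `response.audio.delta` events per response and flag abnormally long runs. A diagnostic sketch; the event type names follow the Realtime events mentioned above, but the cutoff value is an arbitrary guess you'd tune for your audio format:

```javascript
// Sketch: flag suspiciously long runs of response.audio.delta events.
// Assumption: the threshold (maxDeltas) is a guess and needs tuning.
function makeDeltaCounter(maxDeltas = 500) {
  let count = 0;
  return function onEvent(event) {
    if (event.type === "response.audio.delta") {
      count += 1;
      if (count > maxDeltas) return "possible-loop";
    } else if (event.type === "response.done") {
      count = 0; // response finished normally; reset the run
    }
    return "ok";
  };
}
```

Hook it into the data channel / WebSocket message handler and log whenever it returns `"possible-loop"`, so you can correlate the runs with what the mic was hearing at the time.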

I’m facing the exact same issue! It’s super annoying and makes the app nearly unusable. I don’t think the issue is that it’s listening to itself; rather, it just enters an infinite loop.


It’s happening in the iPhone app. I tried deep research to understand what’s happening; this is the response: “Voice Echo Issues on ChatGPT iOS (iPhone 14 Pro, iOS 18.4)

Voice echo problems in the ChatGPT iOS app (using Standard or Advanced Voice) can happen due to a few known issues. Below we outline possible causes and known bugs, followed by settings adjustments and troubleshooting steps to resolve the echo.

Possible Causes and Known Bugs
• Audio Feedback / Echo Cancellation Bug: One known issue is the app picking up its own voice output through the microphone. This causes ChatGPT to “respond to itself,” creating an echo or loop of repeated responses. In other words, if the phone’s speaker is playing the AI’s voice, the mic might hear it and the assistant thinks it’s user input. This is an intermittent bug – reports note it doesn’t happen every time (roughly 25% of the time, often at the start of a conversation). This could explain why it happened initially, stopped, then recurred in a new thread.
• Advanced Voice Mimicking the User’s Voice: In rare cases, Advanced Voice Mode has exhibited a bug where ChatGPT’s voice output unintentionally clones or mimics the user’s own voice. OpenAI’s testing found that “a noisy input triggered the AI to mimic the voice of the user” under certain conditions. If this occurs, it can sound like an “echo” of your voice – the assistant suddenly speaks in your tone. OpenAI has added safeguards to minimize this, but it remains a documented (though uncommon) bug.
• Speakerphone Use and Background Noise: Using the app on speakerphone or in loud environments can exacerbate echo issues. The Advanced Voice feature is “not yet optimized for … speakerphone” use. If the phone’s mic picks up any background sounds or the assistant’s own voice (for example, if the volume is high or you’re in a car), it may cause interruptions or the bot to think you’re talking when you’re not. Standard Voice mode also can mis-hear background audio as input, since it’s “very bad at ignoring background voices” according to user reports, making it start/stop in error.

Troubleshooting Steps and Solutions

Fortunately, several settings adjustments and fixes can help resolve or prevent the echo issue:
• Use Headphones: The simplest fix is to use headphones or earbuds during voice conversations. This prevents the phone’s speaker output from feeding back into the mic. OpenAI explicitly “recommends using voice mode with headphones” to avoid unintended interruptions. Many users have found the echo disappears when not using the loudspeaker.
• Enable Voice Isolation (iPhone Mic Mode): On iPhone, turn on the built-in Voice Isolation feature for the microphone. This mode filters out background noises and should stop the mic from picking up the assistant’s voice from the speaker. To enable it, open the Control Center while you’re in a voice conversation, tap the Mic Mode button (top-right of the audio card), and select Voice Isolation. This iOS feature can significantly reduce echo and “help avoid unintentional interruptions” during ChatGPT voice chats.
• Keep Environment Quiet & Adjust Volume: Try to use ChatGPT in a quiet environment to avoid confusing the speech recognition. If possible, lower the volume of the assistant’s voice (or keep the phone a bit farther from the mic) when not using headphones – this reduces the chance of the mic hearing the speaker. Conversely, make sure you can hear the assistant clearly (you might increase volume slightly or bring the phone closer when listening) so you don’t accidentally talk over it. OpenAI suggests that having the assistant’s voice at an audible level in a quiet setting can help the system distinguish your voice from any echo. (Essentially, you want to minimize crosstalk between the speaker and mic.)
• Restart the App (or Phone): If echo issues start, try closing the ChatGPT app completely and reopening it. This can reset the audio session. In many cases, this alone clears up the glitch. Since you already tried restarting the phone (which also resets hardware audio pathways), make sure after reboot to launch the app fresh and start a new voice chat. Beginning a new chat session after a restart can sometimes eliminate any lingering audio bug from the previous session.
• Reinstall or Update the App: You mentioned the app was recently updated, which is good. Ensure you’re on the latest version available, because OpenAI continually fixes bugs in updates. If the app is up to date but still echoing, consider reinstalling it. A reinstall will clear the app’s cache and settings which might be causing the issue. (Before reinstalling, you could also try logging out and back in, in case it’s account/session related.) Reinstalling gives you a clean state in case something corrupted during the update.
• Avoid Incompatible Audio Setups: Until this bug is fully resolved, avoid using Advanced Voice in scenarios known to be problematic. For example, in-car Bluetooth systems or speakerphone mode are not recommended (Advanced Voice “is not yet optimized” for those outputs). If you need to use it hands-free in a car, use a single-ear headset or ensure the car’s system isn’t feeding back audio. Similarly, if you noticed echo in Standard Voice, stick to a normal phone handset mode or headphones instead of speakerphone. These tweaks ensure the microphone only hears your voice.
• Check for Known Issues: It’s a good idea to keep an eye on official support channels or community forums for any newly discovered bugs similar to yours. OpenAI’s help center and forums have posts about voice quality issues. For instance, the echo/self-talk bug is known and has been discussed by users and OpenAI (with the team working on echo cancellation improvements). If a particular version of the app has a bug, often an update or patch notes will mention a fix. Staying updated on those can validate that it’s not a hardware problem on your end.

If after trying the above the echo still persists, it may indeed be an unresolved bug. In that case, consider reporting the issue through the OpenAI app’s feedback (or their support email/forums) so they’re aware it’s happening on your device. Given that voice features are relatively new, OpenAI may still be ironing out these kinks, and user reports help them fix such problems in updates.

Sources:
• OpenAI Help Center – Advanced Voice Mode FAQ (tips on avoiding interruptions with headphones and Voice Isolation)
• OpenAI Developer Forum – user report on echo cancellation bug causing the assistant to hear itself (voice loop issue)
• TechRadar – report on Advanced Voice Mode occasionally mimicking the user’s voice due to a known bug in GPT-4’s voice model

Fix for Echo Bug in OpenAI App (Audio Feedback Issue)

Hi everyone! I wanted to share a quick fix that worked for me when dealing with the echo bug in the OpenAI app.

Problem: The app was picking up its own audio, creating an echo or feedback loop where it would repeat the last sentence as if it were my input. This usually happened when using the voice feature.

Solution: I enabled the real-time captions feature on my device. This way, the app was able to separate the audio output from my own voice input, effectively stopping the echo. It worked consistently for me, and I hope it helps anyone else experiencing the same issue.
Another way is to use earphones; that also does the trick.

Let me know if this solution works for you too or if you’ve found other fixes!