It clearly depends on the answer size and the chunking; some parts get flushed depending on how they fit. It's incredible this hasn't been fixed or even clarified yet, but whatever. It is clearly not a GPT-4o issue, since it works in the OpenAI phone app. They are either using their own API integration or doing something non-obvious to prevent the last message chunks from being flushed/dropped.
Maybe it's worth one more check to see if anything has been done in this regard? Or, now that the WebRTC API exists, is websocket "support" gone?
What use case are you using this realtime audio API for?
It's pretty expensive, IMO.
Obviously it has limited deployment options until the price comes down, but I expect that is a trend that will continue. I am experimenting with better voice front ends for various applications we have for individuals with disabilities.
For anyone who might need some help alleviating this a little: the chunk size I send is 320 bytes (40 ms) of 8 kHz mu-law audio, which is what comes from Twilio (which I am using). Mine works at a pretty good rate, but I would love official guidance.
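In case it helps, here is a rough sketch of that chunking, assuming a Node.js bridge between a Twilio Media Stream and the Realtime API websocket (the onTwilioMedia and openaiWs names are just placeholders of mine, and the session is assumed to be configured with input_audio_format "g711_ulaw"):

const CHUNK_BYTES = 320; // 40 ms of 8 kHz mu-law, 1 byte per sample
let pending = Buffer.alloc(0);

// Called for each parsed "media" event from the Twilio Media Stream websocket
function onTwilioMedia(msg, openaiWs) {
  // Twilio delivers base64 mu-law payloads (typically 20 ms frames);
  // accumulate raw bytes and flush them in fixed 320-byte chunks.
  pending = Buffer.concat([pending, Buffer.from(msg.media.payload, "base64")]);
  while (pending.length >= CHUNK_BYTES) {
    const chunk = pending.subarray(0, CHUNK_BYTES);
    pending = pending.subarray(CHUNK_BYTES);
    openaiWs.send(JSON.stringify({
      type: "input_audio_buffer.append",
      audio: chunk.toString("base64"),
    }));
  }
}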
It's ~14-16 cents a minute for an agent, which is less than a minimum-wage employee. I wouldn't deploy it to production in this state, but I'd say it is getting close.
Their Android client (and I assume the iOS one) uses LiveKit (GitHub - livekit/client-sdk-android: LiveKit SDK for Android; I found this by decompiling their APK with jadx), and they also have a partnership with LiveKit (OpenAI and LiveKit partner to turn Advanced Voice into an API), so I am not really sure what their relationship or intentions are with each other.
Maybe you can learn something from their code, or from jadx-ing the OpenAI APK, to see if they are doing anything special to prevent this.
I am writing my own personal voice assistant Android app (GitHub - swooby/AlfredAI), mostly for learning/kicks, and paying a few dollars a month to experiment with this seems reasonable to me.
I definitely would not scale this up to 100+ users using a single API key; it would be way too expensive right now.
My app is extremely rough draft proof-of-concept right now, so please don’t be too harsh.
I think one of my next tweaks will be to add an app setting where the user enters their own API key; then they can monitor their usage on their own OpenAI dev dashboard.
WELL WELL. If the problem is on the server side and OpenAI is not saying a word about this one, we need to get very creative.
Try this at the very end of your “instructions”:
“It is important that you add a silent pause [pause] at the end of your response WITHOUT saying the word ‘PAUSE’.”
It seems to be working based on some quick tests. Also, if you try it in the ChatGPT app, you will see it add this [pause], and you can actually hear the pause in the audio.
The idea of this shameful hack is that if something is dropped, it is mostly the silence at the end. Sometimes it was literally saying "PAUSE", lol, so I had to make it clear it shouldn't.
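If you are driving the Realtime API directly, the same idea translates to appending that sentence to your session instructions. A rough sketch over the WebRTC data channel (baseInstructions stands in for whatever instructions you already send; websocket users would send the same event over their connection):

const PAUSE_SUFFIX =
  "It is important that you add a silent pause [pause] at the end of " +
  "your response WITHOUT saying the word 'PAUSE'.";
dc.send(JSON.stringify({
  type: "session.update",
  session: { instructions: baseInstructions + " " + PAUSE_SUFFIX },
}));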
To be clear, it is not working 100% of the time (you can see it in the transcription). You can probably reinforce the reasoning behind it, and one could add a reminder in the user messages ("remember the pause")… but at this point, one starts to wonder who is paying for these extra tokens.
keep me posted if you try this!
The pause isn't long enough to prevent part of the message from still going missing. It was worth a try, though.
We are all waiting for an explanation, or a timeline for a fix, for this issue from the @openai engineers.
- Buffered playback. GPT-4o generates audio faster than it can be played back. Our SDKs automatically buffer, stream, handle user interruptions, and play back audio with the correct timing.
I got this from
Can this be a solution?
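For context, here is a minimal sketch of what I understand "buffered playback" to mean, assuming base64 pcm16 audio at 24 kHz arriving as response.audio.delta events (this is roughly what the SDKs automate, along with interruption handling):

const audioCtx = new AudioContext({ sampleRate: 24000 });
let nextStartTime = 0;

function playDelta(base64Pcm16) {
  // Decode base64 -> bytes -> 16-bit samples -> floats in [-1, 1]
  const bytes = Uint8Array.from(atob(base64Pcm16), (c) => c.charCodeAt(0));
  const samples = new Int16Array(bytes.buffer, 0, bytes.byteLength >> 1);
  const floats = Float32Array.from(samples, (s) => s / 32768);
  const buffer = audioCtx.createBuffer(1, floats.length, 24000);
  buffer.copyToChannel(floats, 0);
  const src = audioCtx.createBufferSource();
  src.buffer = buffer;
  src.connect(audioCtx.destination);
  // Schedule each chunk to start exactly when the previous one ends, so
  // audio generated faster than real time is queued rather than overlapped.
  nextStartTime = Math.max(nextStartTime, audioCtx.currentTime);
  src.start(nextStartTime);
  nextStartTime += buffer.duration;
}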
I'm using the LiveKit SDK and am still seeing the audio cutoff issue there too.
This post could be relevant here:
I've noticed a potential correlation: when setting the voice to 'verse', a large portion of the audio is consistently cut off. When leaving the voice at its default, the issue still occurs but much less frequently, affecting only the last part of the final sentence.
@stsuruno looks like they might be working on something that could be related, but perhaps that is wishful thinking! Wait until the output_audio_buffer is empty by stsuruno-openai · Pull Request #20 · openai/openai-realtime-agents · GitHub
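If I am reading that PR right, the idea is to wait for the WebRTC output_audio_buffer.stopped server event before treating the turn as finished, rather than acting on response.done while audio is still draining. Roughly:

dc.addEventListener("message", (e) => {
  const event = JSON.parse(e.data);
  if (event.type === "response.done") {
    // The model has finished generating, but audio may still be playing out
  }
  if (event.type === "output_audio_buffer.stopped") {
    // The output buffer has drained; now it should be safe to disconnect
    console.log("Audio playback finished");
  }
});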
Same problem here, the audio is cutting off at the end.
Guys, I have a different problem: I'm trying to figure out how to update a session.
I'm new to this; I'm building an application and trying to understand how OpenAI's Realtime API with WebRTC works.
Here's my frontend, where I initiate the connection, create the session, and get the ephemeral key.
const tokenResponse = await fetch("/session");
const tokenData = await tokenResponse.json();
console.log("tokenData" + tokenData);
const EPHEMERAL_KEY = tokenData.client_secret.value;
console.log("Ephemeral key received:", EPHEMERAL_KEY);
And here's the backend, which takes that request and sends back my response.
import os

import httpx
from flask import Flask, jsonify, request

app = Flask(__name__)

# The /session endpoint
@app.route("/session", methods=["GET"])
def session_endpoint():
    openai_api_key = os.environ.get("OPENAI_API_KEY")
    if not openai_api_key:
        return jsonify({"error": "OPENAI_API_KEY not set"}), 500
    # Make a synchronous POST request to the OpenAI realtime sessions endpoint
    with httpx.Client() as client:
        r = client.post(
            "https://api.openai.com/v1/realtime/sessions",
            headers={
                "Authorization": f"Bearer {openai_api_key}",
                "Content-Type": "application/json",
            },
            json={
                "model": "gpt-4o-realtime-preview-2024-12-17",
                "voice": "verse",
                "instructions": "You are an English tutor, Ria",
            },
        )
        data = r.json()
    print(data)
    return jsonify(data)
And in my frontend I'm passing the SDP, creating the WebRTC connection, and getting the message responses from the data channel.
// 5. Set up a data channel for events.
const dc = pc.createDataChannel("oai-events");
dc.addEventListener("message", (e) => {
  console.log("Data Channel message:", e.data);
  try {
    // Parse the incoming string data to a JavaScript object
    const data = JSON.parse(e.data);
    // Check if the message type is "session.created"
    if (data.type === "session.created") {
      // Log the event ID
      console.log("Event ID:", data.event_id);
      // Send the event_id to your backend to update the session
      fetch("/update_session", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ event_id: data.event_id }),
      })
        .then((response) => response.json())
        .then((data) => console.log("Session updated:", data))
        .catch((error) => console.error("Error updating session:", error));
    }
  } catch (error) {
    console.error("Error parsing JSON:", error);
  }
});
And here's the backend that's supposed to update my session. After the session is created I try to hit the /update_session endpoint and update the session, but I don't know why it's not working.
@app.route("/update_session", methods=["POST"])
def update_session_endpoint():
# Get the event_id from the request
request_data = request.get_json()
event_id = request_data.get("event_id")
if not event_id:
return jsonify({"error": "event_id is required"}), 400
openai_api_key = os.environ.get("OPENAI_API_KEY")
if not openai_api_key:
return jsonify({"error": "OPENAI_API_KEY not set"}), 500
# Make a synchronous POST request to the OpenAI realtime sessions endpoint
with httpx.Client() as client:
try:
r = client.post(
"https://api.openai.com/v1/realtime/sessions",
headers={
"Authorization": f"Bearer {openai_api_key}",
"Content-Type": "application/json",
},
json={
"type": "session.update",
"session": {
"instructions": (
"your a math tutor alex"
),
"turn_detection": {
"type": "server_vad",
"threshold": 0.5,
"prefix_padding_ms": 300,
"silence_duration_ms": 500
},
"voice": "alloy",
"temperature": 1,
"max_response_output_tokens": 4096,
"modalities": ["text", "audio"],
"input_audio_format": "pcm16",
"output_audio_format": "pcm16",
"input_audio_transcription": {
"model": "whisper-1"
},
"tool_choice": "auto",
"tools": [
]
}
}
)
r.raise_for_status() # Raise an exception for HTTP errors
data = r.json()
print("Session update response:", data)
return jsonify({"success": True, "data": data})
except httpx.HTTPStatusError as e:
print(f"HTTP error occurred: {e}")
return jsonify({"error": f"HTTP error: {e.response.status_code}", "details": e.response.text}), e.response.status_code
except httpx.RequestError as e:
print(f"Request error occurred: {e}")
return jsonify({"error": f"Request error: {str(e)}"}), 500
except Exception as e:
print(f"Unexpected error: {e}")
return jsonify({"error": f"Unexpected error: {str(e)}"}), 500
Glad you pointed this out. Looks like they've closed this as merged. I'll be curious to hear feedback from folks on whether that change is working better. Still testing myself.
I have experienced this too; the reproduction rate is quite high over websocket, while WebRTC has not shown the issue so far.
My configuration for output audio:
- output_audio_format: g711_ulaw
Does anyone know what progress is being made on this?