[Realtime API] Audio is randomly cutting off at the end

It clearly depends on the answer size and chunking; some trailing parts get flushed depending on how they fit. It's surprising this hasn't been fixed or even clarified yet. It is clearly not a GPT-4o issue, since it works in the OpenAI phone app. They are either using their own API integration or doing something non-obvious to avoid the last audio chunks being flushed/dropped.

Maybe worth one more check to see if anything has been done in this regard? Or, now with the WebRTC API, is WebSocket "support" gone?

In what use case are you using this realtime API for audio?
It's pretty expensive, imo.

Obviously it has limited deployment options until the price comes down, but I expect that is a trend that will continue. I am experimenting with better voice front ends for various applications we have for individuals with disabilities.

For anyone who might need some help to alleviate this a little: the chunk size I send is 320 bytes (40 ms) of 8 kHz μ-law audio, which is what comes from Twilio (which I am using). Mine works at a pretty good rate, but I would love official guidance.
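
In case it helps anyone doing the same, here is a rough Node sketch of that chunking (assumptions on my part: raw 8 kHz μ-law bytes arriving from Twilio's media stream, and a `sendFrame` callback standing in for however you forward each frame to the Realtime API, e.g. as a base64 `input_audio_buffer.append` event):

```javascript
// Accumulate incoming mu-law bytes and emit fixed 320-byte (40 ms @ 8 kHz) frames.
// `sendFrame` is a placeholder for your actual transport to the Realtime API.
function makeChunker(sendFrame, frameBytes = 320) {
  let pending = Buffer.alloc(0);
  return function onAudio(bytes) {
    pending = Buffer.concat([pending, bytes]);
    // Flush as many full frames as we have; keep the remainder for next time.
    while (pending.length >= frameBytes) {
      sendFrame(pending.subarray(0, frameBytes));
      pending = pending.subarray(frameBytes);
    }
  };
}
```

At 8 kHz μ-law there is exactly one byte per sample, so 320 bytes is 40 ms of audio.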

It's ~14-16 cents a minute for an agent, which is less than a minimum-wage employee. I wouldn't deploy it to production in this state, but I'd say it is getting close.

Their Android (and I assume iOS) client uses LiveKit (GitHub - livekit/client-sdk-android: LiveKit SDK for Android ; I found this by decompiling their apk with jadx), which they also have a partnership with (OpenAI and LiveKit partner to turn Advanced Voice into an API), so I am not really sure what their relationship or intentions are with each other.

Maybe you can learn from their code or jadx-ing the OpenAI apk to see if they are doing anything special to prevent this.

1 Like

I am writing my own personal voice assistant Android app (GitHub - swooby/AlfredAI), mostly for learning/kicks, and paying a few dollars a month to experiment with this seems reasonable to me.

I definitely would not scale this up to 100+ users using a single API key; it would be way too expensive right now.

My app is extremely rough draft proof-of-concept right now, so please don’t be too harsh.

I think one of my next tweaks will be to add an app setting where the user enters their own API key; then they can monitor their usage on their own OpenAI dev dashboard.

WELL WELL. If the problem is on the server side, and OpenAI is not saying anything about it, we need to get very creative.

Try this at the very end of your “instructions”:

“It is important that you add a silent pause [pause] at the end of your response WITHOUT saying the word ‘PAUSE’.”

It seems to work based on some quick tests. Also, if you try it in the ChatGPT app, you will see it add this [pause], and you can indeed notice the pause in the audio.

The idea behind this shameful hack is that if something gets dropped, it is mostly the silence at the end. Sometimes it actually said "PAUSE" out loud, lol, so I had to make it clear it shouldn't.

To be clear, it does not work 100% of the time (you can see it in the transcription). You can probably reinforce the reason for it, and one could add a reminder in the user messages ("remember the pause")… but at that point you start to wonder who is paying for these extra tokens.

Keep me posted if you try this!

4 Likes

The pause isn't long enough to prevent part of the message from still going missing. It was worth a try, though.

1 Like

We are all waiting for an explanation, or a timeline for a fix, from the @openai engineers.

  • Buffered playback. GPT-4o generates audio faster than it can be played back. Our SDKs automatically buffer, stream, handle user interruptions, and play back audio with the correct timing.

I got this from

Can this be a solution?
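
To make the "buffered playback" idea above concrete, here is a minimal sketch of the scheduling part: queue incoming audio chunks and compute back-to-back start times so playback never gaps or overlaps. The 24 kHz sample rate and the 16-bit PCM assumption are illustrative, not confirmed values.

```javascript
// Given the byte lengths of queued pcm16 chunks, compute when each should
// start so they play back seamlessly end-to-end.
function scheduleChunks(byteLengths, sampleRate = 24000) {
  let t = 0;
  return byteLengths.map((bytes) => {
    const duration = bytes / 2 / sampleRate; // 2 bytes per pcm16 sample
    const startAt = t;
    t += duration;
    return { startAt, duration };
  });
}
```

In a real client you would feed these offsets to something like an `AudioContext` buffer source's `start(startAt)` call.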

I’m using the LiveKit SDK and am still seeing the audio cut off issue there too :frowning:

This post could be relevant here:

1 Like

I’ve noticed a potential correlation: when setting the voice to ‘verse,’ a large portion of the audio is consistently cut off. However, when leaving the voice setting at its default, the issue still occurs but much less frequently, affecting only the last part of the final sentence.

@stsuruno looks like they might be working on something that could be related, but perhaps that is wishful thinking! Wait until the output_audio_buffer is empty by stsuruno-openai · Pull Request #20 · openai/openai-realtime-agents · GitHub
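
For reference, the idea in that PR can be sketched over the WebRTC data channel roughly like this (assuming the WebRTC-only `output_audio_buffer.stopped` server event; `dc` and `pc` stand for the data channel and peer connection): don't tear the connection down until the server says its output buffer has drained.

```javascript
// Delay pc.close() until the server reports its output audio buffer has
// fully played out, so the tail of the response is not cut off.
function hangUpWhenDrained(dc, pc) {
  dc.addEventListener("message", (e) => {
    const msg = JSON.parse(e.data);
    // Server event (WebRTC variant of the API) fired when output audio stops.
    if (msg.type === "output_audio_buffer.stopped") {
      pc.close();
    }
  });
}
```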

1 Like

Same problem here, audio is cutting off at the end.

Guys, I have a different problem: I'm trying to figure out how to update a session.

So I'm new to this; I'm building an application and trying to understand how OpenAI's Realtime API with WebRTC works.

Here's my frontend, where I initiate the connection, create the session, and get the ephemeral key:


          const tokenResponse = await fetch("/session");
          const tokenData = await tokenResponse.json();
          console.log("tokenData", tokenData);
          const EPHEMERAL_KEY = tokenData.client_secret.value;
          console.log("Ephemeral key received:", EPHEMERAL_KEY);

And here's the backend that takes that and sends back my response:

import os

import httpx
from flask import Flask, jsonify, request

app = Flask(__name__)

# The /session endpoint
@app.route("/session", methods=["GET"])
def session_endpoint():
    openai_api_key = os.environ.get("OPENAI_API_KEY")
    if not openai_api_key:
        return jsonify({"error": "OPENAI_API_KEY not set"}), 500

    # Make a synchronous POST request to the OpenAI realtime sessions endpoint
    with httpx.Client() as client:
        r = client.post(
            "https://api.openai.com/v1/realtime/sessions",
            headers={
                "Authorization": f"Bearer {openai_api_key}",
                "Content-Type": "application/json",
            },
            json={
                "model": "gpt-4o-realtime-preview-2024-12-17",
                "voice": "verse",
                "instructions": "You are an English tutor named Ria"
            },
        )
        data = r.json()
        print(data)
        return jsonify(data)

And in my frontend I'm passing the SDP, creating the WebRTC connection, and getting message responses from the data channel:

          // 5. Set up a data channel for events.
          const dc = pc.createDataChannel("oai-events");

          dc.addEventListener("message", (e) => {
            console.log("Data Channel message:", e.data);
            try {


              // Parse the incoming string data to a JavaScript object
              const data = JSON.parse(e.data);


              // Check if the message type is "session.created"
              if (data.type === "session.created") {
                // Log the event ID
                console.log("Event ID:", data.event_id);
                
                // Send the event_id to your backend to update the session


                fetch('/update_session', {
                  method: 'POST',
                  headers: {
                    'Content-Type': 'application/json',
                  },
                  body: JSON.stringify({ event_id: data.event_id }),
                })
                .then(response => response.json())
                .then(data => console.log('Session updated:', data))
                .catch(error => console.error('Error updating session:', error));
              }
            } catch (error) {
              console.error("Error parsing JSON:", error);
            }
          });


And here's the backend that's supposed to update my session. After the session is created, I try to hit the /update_session endpoint and update the session, but I don't know why it isn't working:

@app.route("/update_session", methods=["POST"])
def update_session_endpoint():
    # Get the event_id from the request
    request_data = request.get_json()
    event_id = request_data.get("event_id")
    
    if not event_id:
        return jsonify({"error": "event_id is required"}), 400
    
    openai_api_key = os.environ.get("OPENAI_API_KEY")
    if not openai_api_key:
        return jsonify({"error": "OPENAI_API_KEY not set"}), 500
    
    # Make a synchronous POST request to the OpenAI realtime sessions endpoint
    with httpx.Client() as client:
        try:
            r = client.post(
                "https://api.openai.com/v1/realtime/sessions",
                headers={
                    "Authorization": f"Bearer {openai_api_key}",
                    "Content-Type": "application/json",
                },
                json={
                    "type": "session.update",
                    "session": {
                        "instructions": "You're a math tutor named Alex",
                        "turn_detection": {
                            "type": "server_vad",
                            "threshold": 0.5,
                            "prefix_padding_ms": 300,
                            "silence_duration_ms": 500
                        },
                        "voice": "alloy",
                        "temperature": 1,
                        "max_response_output_tokens": 4096,
                        "modalities": ["text", "audio"],
                        "input_audio_format": "pcm16",
                        "output_audio_format": "pcm16",
                        "input_audio_transcription": {
                            "model": "whisper-1"
                        },
                        "tool_choice": "auto",
                        "tools": []
                    }
                }
            )
            r.raise_for_status()  # Raise an exception for HTTP errors
            data = r.json()
            print("Session update response:", data)
            return jsonify({"success": True, "data": data})
        except httpx.HTTPStatusError as e:
            print(f"HTTP error occurred: {e}")
            return jsonify({"error": f"HTTP error: {e.response.status_code}", "details": e.response.text}), e.response.status_code
        except httpx.RequestError as e:
            print(f"Request error occurred: {e}")
            return jsonify({"error": f"Request error: {str(e)}"}), 500
        except Exception as e:
            print(f"Unexpected error: {e}")
            return jsonify({"error": f"Unexpected error: {str(e)}"}), 500    
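
One likely reason the code above isn't working: `session.update` is a client *event*, not a REST call. The `/v1/realtime/sessions` endpoint only mints new ephemeral sessions, so POSTing a `session.update` body to it just creates another session. With WebRTC, the update should be sent over the `oai-events` data channel you already opened in the frontend. A rough sketch (the instruction and voice values here are just examples):

```javascript
// Send a session.update client event over the already-open data channel.
// `dc` is the RTCDataChannel created with pc.createDataChannel("oai-events").
function sendSessionUpdate(dc) {
  const event = {
    type: "session.update",
    session: {
      instructions: "You're a math tutor named Alex",
      voice: "alloy",
      turn_detection: { type: "server_vad", threshold: 0.5 },
    },
  };
  dc.send(JSON.stringify(event));
  return event; // returned only so callers can inspect what was sent
}
```

No backend round-trip is needed for this at all; the ephemeral key that authorized the WebRTC connection already lets the client update its own session.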


Glad you pointed this out. Looks like they’ve closed this as merged. I’ll be curious to hear any feedback from folks whether that change is working better. Still testing myself.

@stsuruno looks like they might be working on something that could be related, but perhaps that is wishful thinking! Wait until the output_audio_buffer is empty by stsuruno-openai · Pull Request #20 · openai/openai-realtime-agents · GitHub

2 Likes

I have experienced this too; the frequency of reproduction is quite high over WebSocket, while WebRTC has not shown the issue so far.

My configuration for output audio:

  • output_audio_format: g711_ulaw

Does anyone know what progress is being made on this?