It clearly depends on the answer size and the chunking; some parts get flushed depending on how they fit. It's incredible this hasn't been fixed or even clarified yet, but whatever. It is clearly not a GPT-4o issue, since it works in the OpenAI phone app. They are either using their own API integration or doing something non-obvious to prevent the last message chunks from being flushed/dropped.
Maybe it's worth one more check to see if anything has been done in this regard? Or, now that the WebRTC API exists, is websocket "support" gone?
What use case are you using this realtime audio API for?
It's pretty expensive, IMO.
Obviously it has limited deployment options until the price comes down, but I expect that is a trend that will continue. I am experimenting with better voice front ends for various applications we have for individuals with disabilities.
For anyone who might need some help alleviating this a little: the chunk size I send is 320 bytes (40 ms) of 8 kHz mu-law audio, which is what comes from Twilio (which I am using). Mine works at a pretty good rate, but I would love official guidance.
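In case it helps, here is a rough sketch of that chunking, assuming a Node.js bridge between a Twilio Media Stream and the Realtime API websocket (the onTwilioMedia and openaiWs names are just placeholders of mine, and the session is assumed to be configured with input_audio_format "g711_ulaw"):

const CHUNK_BYTES = 320; // 40 ms of 8 kHz mu-law, 1 byte per sample
let pending = Buffer.alloc(0);

// Called for each parsed "media" event from the Twilio Media Stream websocket
function onTwilioMedia(msg, openaiWs) {
  // Twilio delivers base64 mu-law payloads (typically 20 ms frames);
  // accumulate raw bytes and flush them in fixed 320-byte chunks.
  pending = Buffer.concat([pending, Buffer.from(msg.media.payload, "base64")]);
  while (pending.length >= CHUNK_BYTES) {
    const chunk = pending.subarray(0, CHUNK_BYTES);
    pending = pending.subarray(CHUNK_BYTES);
    openaiWs.send(JSON.stringify({
      type: "input_audio_buffer.append",
      audio: chunk.toString("base64"),
    }));
  }
}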
It's ~14-16 cents a minute for an agent, which is less than a minimum-wage employee. I wouldn't deploy it to production in this state, but I'd say it is getting close.
Their Android client (and I assume the iOS one) uses LiveKit (GitHub - livekit/client-sdk-android: LiveKit SDK for Android; I found this by decompiling their APK with jadx), and they also have a partnership with LiveKit (OpenAI and LiveKit partner to turn Advanced Voice into an API), so I am not really sure what their relationship or intentions are with each other.
Maybe you can learn something from their code, or from jadx-ing the OpenAI APK, to see if they are doing anything special to prevent this.
I am writing my own personal voice assistant Android app (GitHub - swooby/AlfredAI), mostly for learning/kicks, and paying a few dollars a month to experiment with this seems reasonable to me.
I definitely would not scale this up to 100+ users using a single API key; it would be way too expensive right now.
My app is extremely rough draft proof-of-concept right now, so please don’t be too harsh.
I think one of my next tweaks will be to add an app setting where the user enters their own API key; then they can monitor their usage on their own OpenAI dev dashboard.
WELL WELL. If the problem is on the server side and OpenAI is not saying a word about this one, we need to get very creative.
Try this at the very end of your “instructions”:
“It is important that you add a silent pause [pause] at the end of your response WITHOUT saying the word ‘PAUSE’.”
It seems to be working based on some quick tests. Also, if you try it in the ChatGPT app, you will see it add this [pause], and you can actually hear the pause in the audio.
The idea of this shameful hack is that if something is dropped, it is mostly the silence at the end. Sometimes it was literally saying "PAUSE", lol, so I had to make it clear it shouldn't.
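If you are driving the Realtime API directly, the same idea translates to appending that sentence to your session instructions. A rough sketch over the WebRTC data channel (baseInstructions stands in for whatever instructions you already send; websocket users would send the same event over their connection):

const PAUSE_SUFFIX =
  "It is important that you add a silent pause [pause] at the end of " +
  "your response WITHOUT saying the word 'PAUSE'.";
dc.send(JSON.stringify({
  type: "session.update",
  session: { instructions: baseInstructions + " " + PAUSE_SUFFIX },
}));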
To be clear, it is not working 100% of the time (you can see it in the transcription). You can probably reinforce the reasoning behind it, and one could add a reminder in the user messages ("remember the pause")… but at this point, one starts to wonder who is paying for these extra tokens.
keep me posted if you try this!
The pause isn't long enough to prevent part of the message from still going missing. It was worth a try, though.
We are all waiting for an explanation, or a timeline for a fix, for this issue from the @openai engineers.
- Buffered playback. GPT-4o generates audio faster than it can be played back. Our SDKs automatically buffer, stream, handle user interruptions, and play back audio with the correct timing.
I got this from
Can this be a solution?
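For context, here is a minimal sketch of what I understand "buffered playback" to mean, assuming base64 pcm16 audio at 24 kHz arriving as response.audio.delta events (this is roughly what the SDKs automate, along with interruption handling):

const audioCtx = new AudioContext({ sampleRate: 24000 });
let nextStartTime = 0;

function playDelta(base64Pcm16) {
  // Decode base64 -> bytes -> 16-bit samples -> floats in [-1, 1]
  const bytes = Uint8Array.from(atob(base64Pcm16), (c) => c.charCodeAt(0));
  const samples = new Int16Array(bytes.buffer, 0, bytes.byteLength >> 1);
  const floats = Float32Array.from(samples, (s) => s / 32768);
  const buffer = audioCtx.createBuffer(1, floats.length, 24000);
  buffer.copyToChannel(floats, 0);
  const src = audioCtx.createBufferSource();
  src.buffer = buffer;
  src.connect(audioCtx.destination);
  // Schedule each chunk to start exactly when the previous one ends, so
  // audio generated faster than real time is queued rather than overlapped.
  nextStartTime = Math.max(nextStartTime, audioCtx.currentTime);
  src.start(nextStartTime);
  nextStartTime += buffer.duration;
}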
I'm using the LiveKit SDK and am still seeing the audio cutoff issue there too.
This post could be relevant here:
I've noticed a potential correlation: when setting the voice to 'verse', a large portion of the audio is consistently cut off. When leaving the voice at its default, the issue still occurs but much less frequently, affecting only the last part of the final sentence.
@stsuruno looks like they might be working on something that could be related, but perhaps that is wishful thinking! Wait until the output_audio_buffer is empty by stsuruno-openai · Pull Request #20 · openai/openai-realtime-agents · GitHub
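If I am reading that PR right, the idea is to wait for the WebRTC output_audio_buffer.stopped server event before treating the turn as finished, rather than acting on response.done while audio is still draining. Roughly:

dc.addEventListener("message", (e) => {
  const event = JSON.parse(e.data);
  if (event.type === "response.done") {
    // The model has finished generating, but audio may still be playing out
  }
  if (event.type === "output_audio_buffer.stopped") {
    // The output buffer has drained; now it should be safe to disconnect
    console.log("Audio playback finished");
  }
});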
Same problem here, the audio is cutting off at the end.
Guys, I have a different problem: I'm trying to figure out how to update a session.
I'm new to this; I'm building an application and trying to understand how OpenAI's Realtime API with WebRTC works.
Here's my frontend, where I initiate the connection, create the session, and get the ephemeral key.
const tokenResponse = await fetch("/session");
const tokenData = await tokenResponse.json();
console.log("tokenData" + tokenData);
const EPHEMERAL_KEY = tokenData.client_secret.value;
console.log("Ephemeral key received:", EPHEMERAL_KEY);
And here's the backend, which takes that request and sends back my response.
import os

import httpx
from flask import Flask, jsonify, request

app = Flask(__name__)

# The /session endpoint
@app.route("/session", methods=["GET"])
def session_endpoint():
    openai_api_key = os.environ.get("OPENAI_API_KEY")
    if not openai_api_key:
        return jsonify({"error": "OPENAI_API_KEY not set"}), 500
    # Make a synchronous POST request to the OpenAI realtime sessions endpoint
    with httpx.Client() as client:
        r = client.post(
            "https://api.openai.com/v1/realtime/sessions",
            headers={
                "Authorization": f"Bearer {openai_api_key}",
                "Content-Type": "application/json",
            },
            json={
                "model": "gpt-4o-realtime-preview-2024-12-17",
                "voice": "verse",
                "instructions": "You are an English tutor, Ria",
            },
        )
        data = r.json()
    print(data)
    return jsonify(data)
And in my frontend I'm passing the SDP, creating the WebRTC connection, and getting the message responses from the data channel.
// 5. Set up a data channel for events.
const dc = pc.createDataChannel("oai-events");
dc.addEventListener("message", (e) => {
  console.log("Data Channel message:", e.data);
  try {
    // Parse the incoming string data to a JavaScript object
    const data = JSON.parse(e.data);
    // Check if the message type is "session.created"
    if (data.type === "session.created") {
      // Log the event ID
      console.log("Event ID:", data.event_id);
      // Send the event_id to your backend to update the session
      fetch("/update_session", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ event_id: data.event_id }),
      })
        .then((response) => response.json())
        .then((data) => console.log("Session updated:", data))
        .catch((error) => console.error("Error updating session:", error));
    }
  } catch (error) {
    console.error("Error parsing JSON:", error);
  }
});
And here's the backend that's supposed to update my session. After the session is created I try to hit the /update_session endpoint and update the session, but I don't know why it's not working.
@app.route("/update_session", methods=["POST"])
def update_session_endpoint():
# Get the event_id from the request
request_data = request.get_json()
event_id = request_data.get("event_id")
if not event_id:
return jsonify({"error": "event_id is required"}), 400
openai_api_key = os.environ.get("OPENAI_API_KEY")
if not openai_api_key:
return jsonify({"error": "OPENAI_API_KEY not set"}), 500
# Make a synchronous POST request to the OpenAI realtime sessions endpoint
with httpx.Client() as client:
try:
r = client.post(
"https://api.openai.com/v1/realtime/sessions",
headers={
"Authorization": f"Bearer {openai_api_key}",
"Content-Type": "application/json",
},
json={
"type": "session.update",
"session": {
"instructions": (
"your a math tutor alex"
),
"turn_detection": {
"type": "server_vad",
"threshold": 0.5,
"prefix_padding_ms": 300,
"silence_duration_ms": 500
},
"voice": "alloy",
"temperature": 1,
"max_response_output_tokens": 4096,
"modalities": ["text", "audio"],
"input_audio_format": "pcm16",
"output_audio_format": "pcm16",
"input_audio_transcription": {
"model": "whisper-1"
},
"tool_choice": "auto",
"tools": [
]
}
}
)
r.raise_for_status() # Raise an exception for HTTP errors
data = r.json()
print("Session update response:", data)
return jsonify({"success": True, "data": data})
except httpx.HTTPStatusError as e:
print(f"HTTP error occurred: {e}")
return jsonify({"error": f"HTTP error: {e.response.status_code}", "details": e.response.text}), e.response.status_code
except httpx.RequestError as e:
print(f"Request error occurred: {e}")
return jsonify({"error": f"Request error: {str(e)}"}), 500
except Exception as e:
print(f"Unexpected error: {e}")
return jsonify({"error": f"Unexpected error: {str(e)}"}), 500
Glad you pointed this out. Looks like they've closed this as merged. I'll be curious to hear feedback from folks on whether that change is working better. Still testing myself.
I have experienced this too; the reproduction rate is quite high over websocket, while WebRTC has not shown the issue so far.
My configuration for output audio:
- output_audio_format: g711_ulaw
Does anyone know what progress is being made on this?