Realtime Audio API not emitting audio.delta

I’m connecting to the Realtime API using WebRTC. I’m trying to capture the raw PCM audio emitted from the remote track. Theoretically those chunks are available in the audio.delta server event, but I’m not seeing any when I log all events emitted by the Realtime API. Are others getting these? Or perhaps getting the audio another way?

Any updates? I recently realized that I don’t get the audio.delta events when implementing this through WebRTC on Android. This is crucial for me, since it seems to be the only way to actually get the encoded audio data from the model.

Unfortunately I’ve pivoted away from using WebRTC as a result of this!

Running into the same issue. It’s critical. OpenAI folks, please respond with a fix or resolution. I’ve already spent over two days on this buggy API.

So I reached out to support, and they gave me a long reply, but this is what stood out to me:

For WebRTC connections, audio output from the model is delivered as a remote media stream. Ensure your client-side application is set up to play this stream correctly. Without more specific details, these are just general suggestions.

Which honestly makes sense. I was under the impression that I needed to decode the deltas myself, but WebRTC already does this for you: the audio is sent down the remote MediaStream to your client so that it can be played directly.

That is exactly why we don’t get deltas when using Realtime through WebRTC! I’m new to WebRTC, so this is news to me. I should just be able to attach the MediaStream and get audio right away. I’ll report back if this is in fact the case.
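
For anyone else landing here, a minimal sketch of what the support reply describes, not an official sample. It assumes you already have an RTCPeerConnection negotiated with the Realtime API; the helper name `remoteAudioStream` and the `TrackEventLike` type are made up for illustration, and the `pc` and `audioEl` names in the usage comment are assumptions too.

```typescript
// Minimal structural stand-in for the browser's RTCTrackEvent, so the helper
// also type-checks outside a browser. (Hypothetical type, not a WebRTC API.)
interface TrackEventLike<S> {
  track: { kind: string };
  streams: readonly S[];
}

// Returns the first stream associated with a remote audio track, or null if
// the event is for a non-audio track or carries no stream.
// (Hypothetical helper name.)
function remoteAudioStream<S>(event: TrackEventLike<S>): S | null {
  if (event.track.kind !== "audio") return null;
  return event.streams[0] ?? null;
}

// Browser usage, assuming `pc` is your RTCPeerConnection and `audioEl` is an
// <audio autoplay> element on the page:
//
//   pc.ontrack = (e) => {
//     const stream = remoteAudioStream(e);
//     if (stream) audioEl.srcObject = stream;
//   };
```

With this, playback needs no manual decoding. If you really need raw samples rather than playback, one option in the browser might be to feed the same MediaStream into Web Audio via `AudioContext.createMediaStreamSource`, though I haven’t verified that against the Realtime API.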

Here is an OpenAI WebRTC demo that handles the audio streaming. The web part is just JavaScript that handles all the WebRTC interaction, including audio. There is also a small Java backend whose only job is to provide ephemeral tokens so the frontend can connect to OpenAI Realtime.
