Get audio output of voice agent using realtime api / agents sdk

Hi, I’m currently building a voice agent using the Realtime API. I use the API on the frontend (so TypeScript), but fetch the ephemeral key via my backend. When the agent talks, I would like to create an audio-reactive visual element. To do this, however, I need access to the agent’s audio output stream. The documentation doesn’t really mention a way to do this, and I’m kind of lost at this point. Is there even a way to do it?

Currently, I’m trying to gain access to the session’s peerConnection object to see if an audio stream is attached to it. However, when I log the session.transport object, I get weird behavior: a single time the peerConnection showed up as connected, but countless other times it showed up as disconnected.
Here are the logs for reference:

Object {
  eventEmitter: { #t: EventTarget, #e: Map(15) },
  options: {},
  #i: "gpt-realtime",
  #t: undefined,
  #n: null,
  #e: { type: "realtime", object: "realtime.session", id: "sess_CaJc6qwsojX6OXJCMyr2F", ... },
  #i: "link",
  #t: { status: "disconnected", peerConnection: undefined, dataChannel: undefined, callId: undefined, ... },
  #n: false,
  #e: false,
  ...
}

I understand that the peerConnection is not really meant to be accessed in this case, but is there really no other way to do this?


If anyone stumbles across this in the future: I was just blind, the solution can be found in the documentation:

Connecting over WebRTC

The default transport layer uses WebRTC. Audio is recorded from the microphone and played back automatically.

To use your own media stream or audio element, provide an OpenAIRealtimeWebRTC instance when creating the session.

import { RealtimeAgent, RealtimeSession, OpenAIRealtimeWebRTC } from '@openai/agents/realtime';

const agent = new RealtimeAgent({
  name: 'Greeter',
  instructions: 'Greet the user with cheer and answer questions.',
});

async function main() {
  const transport = new OpenAIRealtimeWebRTC({
    mediaStream: await navigator.mediaDevices.getUserMedia({ audio: true }),
    audioElement: document.createElement('audio'),
  });

  const customSession = new RealtimeSession(agent, { transport });
}
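Since the original goal was an audio-reactive visual, here is a rough sketch of how the audioElement you pass to OpenAIRealtimeWebRTC could be wired into a Web Audio AnalyserNode to get a loudness value per animation frame. This is my own assumption on top of the docs snippet above, not part of the SDK; the names createVisualizer and averageLevel are made up:

```typescript
// Pure helper: average byte amplitude of an analyser frame, normalized to 0..1.
export function averageLevel(data: Uint8Array): number {
  let sum = 0;
  for (const v of data) sum += v;
  return data.length ? sum / (data.length * 255) : 0;
}

// Hypothetical sketch: feed the session's audio element into an AnalyserNode
// and poll it once per animation frame to drive a visual element.
export function createVisualizer(audioElement: HTMLAudioElement): void {
  const ctx = new AudioContext();
  const source = ctx.createMediaElementSource(audioElement);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 256;

  // Route the audio through the analyser and on to the speakers,
  // so playback stays audible while we measure it.
  source.connect(analyser);
  analyser.connect(ctx.destination);

  const data = new Uint8Array(analyser.frequencyBinCount);
  const draw = () => {
    analyser.getByteFrequencyData(data);
    const level = averageLevel(data); // 0 = silence, 1 = full scale
    // ...scale/color your visual element with `level` here...
    requestAnimationFrame(draw);
  };
  draw();
}
```

One caveat from my own testing of similar setups: browsers may keep the AudioContext suspended until a user gesture, so you may need to call ctx.resume() from a click handler before the analyser produces data.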