Hi, I’m currently building a voice agent using the Realtime API. I use the API on the frontend (so TypeScript), but fetch the ephemeral key via my backend. When the agent talks, I would like to drive an audio-reactive visual element. To do this, however, I need access to the agent’s audio output stream. The documentation doesn’t really mention a way to do this, and I’m kind of lost at this point. Is there even a way to do this?
Currently, I’m trying to access the session’s peerConnection object to see whether an audio stream is attached to it. However, when I log the session.transport object, I get inconsistent behavior: a single time the peerConnection showed up as connected, but countless other times it showed up as disconnected.
Here are the logs for reference:
Object { eventEmitter: {}, options: {},
#i: “gpt-realtime”,
#t: undefined,
#n: null,
#e: null,
#i: “link”,
#t: {…},
#n: false,
#e: false, … }
eventEmitter: Object { #t: EventTarget, #e: Map(15) }
options: Object { }
#i: “gpt-realtime”
#t: undefined
#n: null
#e: Object { type: “realtime”, object: “realtime.session”, id: “sess_CaJc6qwsojX6OXJCMyr2F”, … }
#i: “link”
#t: Object { status: “disconnected”, peerConnection: undefined, dataChannel: undefined, … }
callId: undefined
dataChannel: undefined
peerConnection: undefined
status: “disconnected”
: Object { … }
#n: false
#e: true
#o: false
: Object { … }
I understand that the peerConnection is not really meant to be accessed in this case, but is there really no other way to do this?
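For context, this is roughly what I’d like to end up with once I have the stream. It is only a sketch: it assumes I can somehow get a standard MediaStream carrying the agent’s audio (for example from the `track` event of the underlying RTCPeerConnection), which is exactly the part I can’t figure out. The names `rmsLevel` and `watchLevel` are made up for illustration; only the Web Audio calls are standard browser APIs.

```typescript
// Sketch only: `stream` is ASSUMED to be a MediaStream with the agent's audio.
// How to obtain it from the realtime session is the open question.

/** Convert AnalyserNode time-domain bytes (0..255, silence = 128) to an RMS level in 0..1. */
function rmsLevel(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  samples.forEach((s) => {
    const centered = (s - 128) / 128; // map each byte to roughly -1..1
    sumSquares += centered * centered;
  });
  return Math.sqrt(sumSquares / samples.length);
}

/** Browser only: reports a level once per animation frame; returns a stop function. */
function watchLevel(stream: MediaStream, onLevel: (level: number) => void): () => void {
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  // The analyser is not connected to ctx.destination, so this graph adds no extra playback.
  source.connect(analyser);

  const buffer = new Uint8Array(analyser.fftSize);
  let frame = 0;
  const tick = () => {
    analyser.getByteTimeDomainData(buffer);
    onLevel(rmsLevel(buffer));
    frame = requestAnimationFrame(tick);
  };
  frame = requestAnimationFrame(tick);

  return () => {
    cancelAnimationFrame(frame);
    source.disconnect();
    void ctx.close();
  };
}
```

The idea would be to call something like `watchLevel(agentStream, (level) => { /* scale or recolor the visual */ })` when the agent starts speaking, and to call the returned function when the session ends.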