Hello everyone,
I’m currently working with the OpenAI Agents SDK, setting up a RealtimeAgent with WebRTC for voice interaction.
My setup works as expected with the primary RealtimeAgent: it respects my modalities configuration, and I consistently receive text output via response.text.delta and response.text.done events.
However, once the session hands off to the sub-agent, the modalities setting is ignored and the output is always audio.
Has anyone else encountered this specific issue where the handoff agent defaults to audio-only output and doesn’t emit text-related events, even when explicitly configured?
Thank you in advance for your help!
My setup is as follows:
const mainAgent = new RealtimeAgent({
  name: "Main Agent",
});

const subAgent = new RealtimeAgent({
  name: "Sub Agent",
});

mainAgent.handoffs = [subAgent];

session.current = new RealtimeSession(mainAgent, {
  model: "gpt-4o-mini-realtime-preview",
  config: {
    modalities: ["text"],
    inputAudioTranscription: {
      model: "whisper-1",
      language: "en",
    },
    turnDetection: {
      type: "semantic_vad",
      eagerness: "low",
      create_response: false,
      interrupt_response: false,
    },
  },
});
Events
session.current.transport.on("response.text.delta", (event) => {
  // Fires only while the primary agent is active
});

session.current.transport.on("response.text.done", (event) => {
  // Fires only while the primary agent is active
});