Hello all, I’m trying to make use of the Realtime voice API; specifically, for my demonstration I want two voice agents to interact with each other in two different roles.
I have tried various things, but I can’t find a pattern that uses VAD to produce a natural conversation between two AI agents: either the second agent never responds, or it starts speaking halfway through the first agent’s turn.
Sounds like you are doing the code equivalent of holding two phones together.
Server VAD is going to be poor for this. It will work much better to skip VAD-driven turns entirely: load the other agent’s audio into the input buffer, and only trigger a response.create after the first agent’s output has finished with response.done.
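A minimal sketch of that turn-taking loop. The event names (response.done, input_audio_buffer.append/commit, response.create) follow the Realtime API’s client/server events, but the routing here is simulated in-memory — in a real setup each agent would be its own websocket connection, and the orchestrator class and its method names are hypothetical:

```python
# Turn-taking without server VAD: wait for agent A's response.done,
# feed A's audio into agent B's input buffer, commit it, and only then
# ask B for a response. Interruptions are impossible by construction.

class TurnTakingOrchestrator:
    def __init__(self, agents):
        self.agents = agents   # e.g. ["agent_a", "agent_b"]
        self.current = 0       # index of whoever holds the floor
        self.log = []          # ordered record of (agent, event, payload)

    def on_event(self, agent, event, audio=None):
        """Handle a server event from `agent`. Only response.done from
        the current speaker matters: it hands the floor to the peer."""
        if event != "response.done" or agent != self.agents[self.current]:
            return  # ignore stray events so agents can't talk over each other
        self.current = (self.current + 1) % len(self.agents)
        listener = self.agents[self.current]
        # In a real client these three would be websocket sends:
        self.log.append((listener, "input_audio_buffer.append", audio))
        self.log.append((listener, "input_audio_buffer.commit", None))
        self.log.append((listener, "response.create", None))

orch = TurnTakingOrchestrator(["agent_a", "agent_b"])
orch.on_event("agent_a", "response.done", audio=b"...a's reply audio...")
orch.on_event("agent_b", "response.done", audio=b"...b's reply audio...")
print([(who, what) for who, what, _ in orch.log])
```

The key design choice is that the floor is passed explicitly by response.done rather than inferred by VAD, so echo from a shared speaker can never trigger a turn.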
Better still would be to make Chat Completions calls with audio messages and the audio modality against an audio-preview model. You can then start text-to-text and watch the conversation degrade into loops at much lower expense.
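A sketch of one turn of that Chat Completions approach. Each agent keeps its own message history, and the peer’s audio output comes back in as an input_audio content part. The model name and request shape below follow the gpt-4o-audio-preview docs as I understand them — treat them as assumptions and check the current API reference before relying on them:

```python
# One audio-in / audio-out turn via Chat Completions instead of Realtime.
# The request body is built and returned; actually sending it (and the
# model name "gpt-4o-audio-preview") is assumed, not verified here.
import base64

def build_turn_request(system_prompt, history, peer_audio_wav_bytes):
    """Build a Chat Completions request body for one conversational turn.
    `history` is this agent's prior messages; the peer's spoken reply
    arrives as raw WAV bytes and is base64-encoded for the API."""
    messages = (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": {
                    "data": base64.b64encode(peer_audio_wav_bytes).decode(),
                    "format": "wav",
                },
            }],
        }]
    )
    return {
        "model": "gpt-4o-audio-preview",        # assumed model name
        "modalities": ["text", "audio"],        # ask for spoken output too
        "audio": {"voice": "alloy", "format": "wav"},
        "messages": messages,
    }

req = build_turn_request("You are a grumpy barista.", [], b"RIFF....wav-bytes")
print(req["modalities"])  # -> ['text', 'audio']
```

Because each call is a discrete request, there is no audio stream to stomp on: you alternate agents in a plain loop, appending each response to both histories.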
Curious where this went for you. I agree this is a doozy of a challenge. I’d try something like Pipecat’s smart turn model (base_smart_turn — pipecat-ai documentation), which uses ML for turn detection. They also have a wrapper for the Realtime model (OpenAI Realtime - Pipecat). But it might take a more robust setup like WebRTC, and with both agents coming out of the same speaker and client audio I’d imagine there’s no quick fix. Maybe simulate multiple clients somehow. IDK, might give this a try too.
I’ve done this before, but with two devices. It’s super fun, especially if you mix models and give them each a different personality. With a single device, the hard part is getting the audio streams to not stomp on each other, but on two devices it’s easy. I use Raspberry Pi devices, so it’s super cheap, but hardware is not an option for most people.
I’m guessing you could get it to work well on a PC or Mac if you can separate the audio streams in hardware. Attach two Bluetooth speakerphones (so an agent doesn’t hear itself talk), and assign one to each agent.