Ability to Specify Speaker Name or Source in Realtime API for Group Sessions

Hi OpenAI Team,

I’d like to request a feature for the Realtime API to better support group conversations—specifically, the ability to specify a speaker name (or some kind of speaker identifier) when appending audio to the input buffer.

Use Case:
In multi-user environments (such as live meetings, classroom discussions, or collaborative workshops), it’s common to have multiple human speakers engaging with the assistant. Right now, there’s no built-in way to indicate to the API which person is speaking at any given time. This creates challenges for both the assistant’s understanding and the quality of any generated transcript.

Potential Solutions:

  • Allow us to attach a speaker identifier with each incoming audio buffer.
  • Enable tagging of audio streams or provide support for multiple parallel audio streams (where each stream is mapped to a known participant/microphone).
  • Accept metadata (e.g., speaker: "Alice") alongside streaming audio, so the assistant can correctly attribute each turn of dialogue.

Why this matters:
Distinguishing between speakers would make conversations much more natural and accurate. This is important for applications like collaborative group chats, educational settings, telehealth, and any setting where more than one person is interacting with the API at once.

A concrete example: In a two-mic setup, each mic is assigned to a specific participant. If I could specify speaker="Alice" for mic 1 and speaker="Bob" for mic 2 when submitting audio, the conversation context would be vastly improved.
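
For illustration, here’s a rough sketch of what that could look like on the wire. To be clear, the speaker field below is hypothetical—it’s the feature being requested, not something the API accepts today. The input_audio_buffer.append event and base64 audio payload are the existing parts; ws is assumed to be an already-open Realtime API WebSocket connection (e.g., via websocket-client).

```python
import base64
import json

def append_audio(ws, pcm_chunk: bytes, speaker: str) -> None:
    # Existing event shape, plus the proposed (hypothetical) "speaker" field.
    event = {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
        "speaker": speaker,  # proposed: attribute this chunk to a named participant
    }
    ws.send(json.dumps(event))

# Two-mic setup: route each capture device to its participant label.
# append_audio(ws, mic1_chunk, speaker="Alice")
# append_audio(ws, mic2_chunk, speaker="Bob")
```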

Current Workarounds:
I’ve considered workarounds like running diarization externally and prepending speaker names to the transcripts, but this adds latency and complexity (especially in real-time scenarios). Native support would be much more accurate and developer-friendly.
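
For reference, the workaround looks roughly like the sketch below: attribution happens outside the session, and the attributed text is injected as a conversation item instead of audio. Here transcribe() stands in for my own external speech-to-text/diarization step (a placeholder, not an OpenAI call), and ws is an open Realtime API WebSocket.

```python
import json

def inject_attributed_turn(ws, speaker: str, transcript: str) -> None:
    # Inject the externally transcribed, speaker-prefixed text as a user message.
    event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": f"{speaker}: {transcript}"},
            ],
        },
    }
    ws.send(json.dumps(event))

# inject_attributed_turn(ws, "Alice", transcribe(mic1_chunk))
```

This works, but it trades away the low-latency audio path, which is exactly why native speaker metadata would be so valuable.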

Thanks for considering this! Would love to know if this is on the roadmap, or if there’s a recommended workaround I’ve missed.