Feature request: custom voices in the Realtime API (sub-second latency)

We’re building a phone-based AI agent with Twilio Media Streams. Today, to get a brand voice, we have to add an external TTS provider (e.g., ElevenLabs), which adds roughly 300–600 ms of latency.
Please support custom TTS voices natively in the Realtime API (or through a low-latency voice plug-in) so we can keep barge-in and a sub-1 s experience without the extra hop.
Requirements: per-session voice selection, SSML prosody control, streaming partial output, and EU region support.
We’re happy to join a beta and share metrics (QoE, MOS, interruption timing).