How do ElevenLabs' and Deepgram's realtime voice agents work as well as the OpenAI Realtime API?

Hi OpenAI community,

I’ve recently been comparing OpenAI’s Realtime API against other voice AI products’ realtime solutions, such as Deepgram’s Voice Agent and ElevenLabs’ Conversational AI. To my surprise, they all respond almost as quickly and feel nearly as instantaneous as the OpenAI Realtime API.

Under the hood, these two products use GPT-4o mini or Claude 3.5 Haiku as their LLM, yet they somehow achieve sub-second latency from the end of speech to the first byte of the voice response, and it feels very natural. I believe they still follow the STT → LLM → TTS pattern, but somehow they make the whole loop extremely fast.
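One pattern I’ve seen suggested (I can’t confirm it’s what these vendors actually do) is to stream LLM tokens straight into TTS at sentence boundaries instead of waiting for the full completion. Here is a minimal simulated sketch of that idea; `stream_llm_tokens` and `tts_synthesize` are made-up stubs with fake latencies, not real client calls:

```python
# Toy sketch: overlap LLM decoding and TTS instead of running them
# strictly one after the other. The stubs simulate an LLM token stream
# and a TTS call; real clients would replace them.
import asyncio
import re
import time

async def stream_llm_tokens(prompt: str):
    # Stub: ~300 ms time-to-first-token, then ~30 ms per token,
    # roughly what a small hosted model might deliver.
    await asyncio.sleep(0.3)
    for token in "Sure, I can help with that. What would you like to know?".split():
        yield token + " "
        await asyncio.sleep(0.03)

async def tts_synthesize(text: str) -> bytes:
    # Stub: pretend TTS returns the first audio bytes ~200 ms after the call.
    await asyncio.sleep(0.2)
    return text.encode()  # stand-in for audio

async def respond(prompt: str) -> None:
    start = time.perf_counter()
    buffer = ""
    async for token in stream_llm_tokens(prompt):
        buffer += token
        # Flush to TTS at the first sentence boundary rather than waiting
        # for the whole LLM response; playback could start right here.
        if re.search(r"[.!?]\s*$", buffer):
            await tts_synthesize(buffer)
            print(f"audio ready at {time.perf_counter() - start:.2f}s for: {buffer!r}")
            buffer = ""
    if buffer:
        await tts_synthesize(buffer)

asyncio.run(respond("hello"))
```

In this simulation the first audio is ready around 0.7 s, even though the full completion takes well over a second, because the TTS cost only applies to the first sentence rather than the whole answer.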

When I implemented my own STT/LLM/TTS test app, the latency broke down roughly like this:

  1. Detecting end-of-speech silence: 500-1000 ms
  2. Sending the finalized transcript to GPT-4o mini and streaming back the first response chunk: 800 ms - 1 s
  3. TTS generating the first audio chunk: ~200 ms

Summed sequentially, that is roughly 1.5-2.2 s, so it looks impossible to get the latency under 2 s, yet these products respond almost instantly.
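Again just a guess, but steps 1 and 2 don’t have to run back-to-back: one could fire a speculative LLM request on the interim transcript while endpointing is still confirming silence, and discard it if the final transcript turns out different. A toy simulation with stubbed timings (all function names hypothetical):

```python
# Toy sketch: overlap end-of-speech detection (step 1) with LLM
# time-to-first-token (step 2) via a speculative request.
import asyncio
import time

async def llm_first_token(transcript: str) -> None:
    await asyncio.sleep(0.8)  # simulated time-to-first-token

async def sequential(interim: str, endpoint_delay: float) -> float:
    start = time.perf_counter()
    await asyncio.sleep(endpoint_delay)   # wait out end-of-speech silence
    await llm_first_token(interim)        # only then call the LLM
    return time.perf_counter() - start

async def speculative(interim: str, endpoint_delay: float) -> float:
    start = time.perf_counter()
    llm_task = asyncio.create_task(llm_first_token(interim))  # start LLM early
    await asyncio.sleep(endpoint_delay)   # endpointing runs concurrently
    # Assumed here: the final transcript matches the interim one, so the
    # speculative response is kept; otherwise cancel llm_task and retry.
    await llm_task
    return time.perf_counter() - start

async def main():
    seq = await sequential("what's the weather", 0.7)
    spec = await speculative("what's the weather", 0.7)
    print(f"sequential: {seq:.2f}s, speculative: {spec:.2f}s")

asyncio.run(main())
```

With these toy numbers, the endpointing wait and LLM latency collapse from ~1.5 s combined to ~0.8 s, the longer of the two, which would go a long way toward the sub-second responses I’m seeing.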

I don’t know how these companies achieve this. I’d love any knowledge sharing, papers, or open-source demos that could help me understand the approach.

Thanks!