ElevenLabs seems to be much faster than OpenAI at text to speech (TTS)

I ran some tests to compare the TTS latency of ElevenLabs and OpenAI when processing relatively short text. ElevenLabs seems to be about four times faster than OpenAI.

Three iterations of each test case:

  • short: 10 words
  • medium: 24 words
  • long: 64 words

TTS Latency Test Results for OpenAI:

  • Number of tests: 9

  • Average generation time: 9.70 seconds

  • Average audio duration: 10.51 seconds

  • Average processing speed: 5.28 words/second

  • Average speaking speed: 3.36 words/second

TTS Latency Test Results for ElevenLabs:

  • Number of tests: 9

  • Average generation time: 2.38 seconds

  • Average audio duration: 9.84 seconds

  • Average processing speed: 13.17 words/second

  • Average speaking speed: 3.44 words/second
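For anyone who wants to reproduce numbers like these, here is a minimal timing harness. The `synthesize` callable is a placeholder: you would wrap it around whichever provider SDK you test (for OpenAI that might be a thin wrapper over `client.audio.speech.create`, for ElevenLabs their own SDK) - the names and structure here are my assumptions, not either vendor's benchmark code:

```python
import time
import statistics

def benchmark_tts(synthesize, texts, iterations=3):
    """Time a blocking TTS call end-to-end and report averages.

    `synthesize` is any callable taking a text string and returning
    the full audio bytes (a thin wrapper around a provider SDK).
    """
    gen_times, speeds = [], []
    for text in texts:
        words = len(text.split())
        for _ in range(iterations):
            start = time.perf_counter()
            synthesize(text)  # blocks until the full clip is back
            elapsed = time.perf_counter() - start
            gen_times.append(elapsed)
            speeds.append(words / elapsed)
    return {
        "tests": len(gen_times),
        "avg_generation_s": statistics.mean(gen_times),
        "avg_words_per_s": statistics.mean(speeds),
    }
```

With three texts and three iterations each, this yields the same "Number of tests: 9" shape as the results above.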

The title says “speech to text” but I guess you meant “text to speech” :slight_smile:
I don’t know about your setup, but that generation time depends on so many things - for example, the exact model(s) you’re using. For OpenAI TTS, that could also be the HD model (which takes longer). ElevenLabs also has two model families, standard and turbo, each with different models.

I always recommend the Text to Speech Models and Providers Leaderboard by Artificial Analysis for a detailed comparison!

ElevenLabs is surely in the vanguard in terms of quality, but I personally have greater expectations for OpenAI because it is more cost-effective and offers a greater variety of services in its API.
That said, there is a lot of room for improvement in OpenAI’s TTS services, and I hope they focus on that, because one bottleneck in developing interactive services is that no matter how good the AI response is, people will still judge it by the quality of the voice or its ability to correctly transcribe speech audio.

I found Lemonfox.ai to be a great alternative to ElevenLabs and OpenAI TTS. It’s quite fast, offers OpenAI- and ElevenLabs-compatible APIs, and is much cheaper than both of them.

Interesting - based in Germany! :+1:

Latency means delay.
“Average generation time…” is completely irrelevant here.
Nobody cares how long it takes to convert the full text to audio; the only thing that matters is how long it takes between sending the first word and getting back the start of the audio. You know - the latency…
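That distinction - full generation time vs. time to first audio - is easy to measure separately if the endpoint streams. A minimal sketch, where `stream_chunks` is a hypothetical zero-argument wrapper around whatever streaming TTS response you are testing:

```python
import time

def time_to_first_audio(stream_chunks):
    """Measure latency as time until the FIRST audio chunk arrives,
    separately from the time until the full clip is done.

    `stream_chunks` is a callable returning an iterator of audio
    byte chunks (e.g. a provider's streaming TTS response body).
    """
    start = time.perf_counter()
    chunks = stream_chunks()
    first = next(chunks)  # blocks until the first audio bytes land
    first_chunk_latency = time.perf_counter() - start
    total_bytes = len(first) + sum(len(c) for c in chunks)  # drain rest
    total_time = time.perf_counter() - start
    return first_chunk_latency, total_time, total_bytes
```

For an interactive voice app, the first number is the one users feel; the second is what the benchmark above was measuring.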

They advertise “2s to 4s” as target times, which is really weird. Even Google’s non-streaming API returns in 400 ms or less - 10 times faster - and that’s not even their streaming endpoint: you get the entire sentence’s audio back in one go, 400 ms after you sent the text, before you could even speak it.

And you would get poorer transcription than from a language model trained with long context over the entire passage to come.

I picked up a pair (of apples)
I picked up a pear (and some apples)
I picked up au pair (Jenny from the agency)