ElevenLabs seems to be much faster than OpenAI at text-to-speech (TTS).

Latency means delay.
“Average generation time…” is completely irrelevant.
Nobody cares how long it takes to convert the whole text to audio. The only thing that matters is the time between sending the first word and getting back the start of the audio. You know, the latency.
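
To make the difference concrete, here is a minimal Python sketch that times both numbers for any stream of audio chunks: the delay until the first chunk arrives (the latency) and the total generation time. The names `measure_stream` and `fake_tts_stream` are made up for illustration, and the fake stream only simulates a TTS response with sleeps; it does not call any vendor's API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (time_to_first_chunk, total_time) in seconds.

    The clock starts just before the stream is first read, so for a lazy
    stream the request time counts toward the latency.
    """
    start = clock()
    first = None
    for chunk in chunks:
        # Record the moment the first non-empty audio chunk arrives.
        if first is None and chunk:
            first = clock() - start
    total = clock() - start
    if first is None:
        raise ValueError("stream produced no audio")
    return first, total


def fake_tts_stream(
    n_chunks: int = 5,
    first_delay: float = 0.05,
    chunk_delay: float = 0.02,
) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (not a real API)."""
    time.sleep(first_delay)  # simulated wait before any audio comes back
    for _ in range(n_chunks):
        yield b"\x00" * 1024  # dummy audio bytes
        time.sleep(chunk_delay)  # simulated gap between chunks


if __name__ == "__main__":
    latency, total = measure_stream(fake_tts_stream())
    print(f"time to first audio: {latency * 1000:.0f} ms")
    print(f"total generation:    {total * 1000:.0f} ms")
```

To try this against a real service, you would pass in whatever chunk iterator your HTTP client gives you for the streaming response; the first number is the one that decides how responsive the voice feels.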

They advertise “2s to 4s” as target times, which is really weird. Even Google's non-streaming API gives 400ms or less, which is 5 to 10 times faster, and that’s not even their streaming endpoint: you get the audio for the entire sentence back in one go, before you can play any of it. 400ms after you sent the text…