Does anyone know where I could find information on the TTS models' speaking speeds in words per minute? Thank you!
Hey there!
So, you're probably not going to find anything in words per minute, because text is typically processed (streamed) in "chunks": the model returns speech once it has processed a certain amount of text. Doing it word by word would slow everything down to the point of being unusable.
What you could do instead is generate the speech and then speed up the resulting audio file by whatever factor you need.
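A minimal sketch of that idea, assuming the audio is saved as a WAV file and only the Python standard library is available. It computes the playback-rate multiplier needed to go from a measured WPM to a target WPM, then applies it by rewriting the sample rate in the WAV header. This is the crudest possible speed-up: the samples are unchanged, so the pitch rises along with the speed; keeping the original pitch would require a proper time-stretching algorithm instead. The function names `speed_factor` and `speed_up_wav` are hypothetical.

```python
import io
import wave


def speed_factor(measured_wpm: float, target_wpm: float) -> float:
    """Playback-rate multiplier that turns measured_wpm into target_wpm."""
    return target_wpm / measured_wpm


def speed_up_wav(wav_bytes: bytes, factor: float) -> bytes:
    """Naively speed up WAV audio by declaring a higher sample rate.

    The sample data is copied unchanged; players consume it `factor` times
    faster, so duration shrinks by `factor` but pitch rises by the same ratio.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        # Only the declared frame rate changes; the frames are written as-is.
        dst.setframerate(round(params.framerate * factor))
        dst.writeframes(frames)
    return out.getvalue()


if __name__ == "__main__":
    # Going from ~178 WPM to a hypothetical target of 200 WPM.
    print(round(speed_factor(178, 200), 4))  # 1.1236
```

For example, audio measured at about 178 WPM would need a factor of roughly 1.12 to reach 200 WPM.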
Interesting question!
I don't think any official information about this is publicly available.
It would be interesting to test: take a standard piece of text, generate speech files with all the voices, and see whether there is any appreciable difference.
So… I did just that.
Using two standard text passages, the "Grandfather Passage" and the "Rainbow Passage", I tested all of the available voices at the default speed.
| Voice | Passage | Time (min) | WPM |
|---|---|---|---|
| alloy | grandfather | 0.7432 | 177.6103 |
| | rainbow | 1.8536 | 177.4924 |
| echo | grandfather | 0.7388 | 178.6681 |
| | rainbow | 1.8508 | 177.7610 |
| fable | grandfather | 0.7452 | 177.1337 |
| | rainbow | 1.8716 | 175.7854 |
| nova | grandfather | 0.7384 | 178.7649 |
| | rainbow | 1.8424 | 178.5714 |
| onyx | grandfather | 0.7392 | 178.5714 |
| | rainbow | 1.8524 | 177.6074 |
| shimmer | grandfather | 0.7440 | 177.4194 |
| | rainbow | 1.8660 | 176.3130 |
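One way to reproduce this kind of measurement, as a sketch: count the words in the passage, read the duration of the generated WAV file, and divide. The post doesn't say exactly how the times were measured, so this is an assumption rather than the poster's method. The Time column is in minutes: the grandfather rows all work out to 132 words (e.g. 132 / 0.7432 ≈ 177.61) and the rainbow rows to 329 words. Word counting here is a plain whitespace split, which is itself a choice; other tokenization rules would give slightly different counts. The helper names are hypothetical.

```python
import io
import wave


def count_words(text: str) -> int:
    """Count words by splitting on whitespace (punctuation stays attached)."""
    return len(text.split())


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a WAV file: number of frames divided by frames per second."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Words spoken divided by the duration expressed in minutes."""
    return word_count / (duration_seconds / 60.0)


if __name__ == "__main__":
    # Cross-check the first table row: 132 words in 0.7432 minutes.
    print(round(words_per_minute(132, 0.7432 * 60), 4))  # 177.6103
```

In practice you would call `words_per_minute(count_words(passage), wav_duration_seconds(audio))` for each voice/passage pair.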
So, the answer seems to be that the voices are pegged at about 178 WPM (every result falls between 175.8 and 178.8 WPM), which is super fast.
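To check the "about 178 WPM" summary, the twelve WPM values from the table can be aggregated. This sketch uses only the numbers posted above; the dictionary layout is just an illustrative choice.

```python
from statistics import mean

# WPM values from the table above: (grandfather, rainbow) per voice.
WPM = {
    "alloy": (177.6103, 177.4924),
    "echo": (178.6681, 177.7610),
    "fable": (177.1337, 175.7854),
    "nova": (178.7649, 178.5714),
    "onyx": (178.5714, 177.6074),
    "shimmer": (177.4194, 176.3130),
}

# Flatten all twelve measurements, and average each voice's two passages.
all_values = [v for pair in WPM.values() for v in pair]
overall_mean = mean(all_values)
per_voice = {voice: mean(pair) for voice, pair in WPM.items()}

if __name__ == "__main__":
    print(f"mean {overall_mean:.1f}, min {min(all_values)}, max {max(all_values)}")
    # Voices from fastest to slowest by their two-passage average.
    for voice, avg in sorted(per_voice.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{voice:8s} {avg:.2f}")
```

The overall mean comes out at about 177.6 WPM with a spread of only about 3 WPM, so the voices are effectively the same speed.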
I just tested it on a light novel: the speed for Onyx was 171.63 words per minute (English). The sentences are short, at most 25 words per line.
It also depends on the language. I tested Onyx on Russian text and got ~130 WPM.