Text To Speech (tts-1) dropping numbers when reading numbered lists

evardion · November 25, 2023, 8:11pm

I’m using the Text To Speech API (tts-1) and it’s working quite well. However, when I try to make it read out lists of items, it occasionally doesn’t read the number at the beginning of an item, and it often doesn’t pause between the number and the word after it.

For example:

Sure! Here are 5 fruits:

1. Apple
2. Banana
3. Orange
4. Strawberry
5. Grape

is read aloud as:

Sure! Here are 5 fruits:

Apple

Banana
3 Orange
4 Strawberry-Grape

I have tried this with the voices Echo and Shimmer and it seems to happen almost every time you get the model to read out a list.

Are there any tips for making the model do a brief pause between the number and the first word?

Note: this forum doesn’t allow me to upload mp3 files, but this bug is fairly easy to replicate. Just get the model to read out a list of items produced by ChatGPT.

anon10827405 · November 25, 2023, 8:14pm

Pausing is traditionally hard with TTS.

Try adding some “…” or even “-” after each listed item.

Sure! Here are 5 fruits:

   1. Apple...
   2. Banana...
   etc...

This is from the ElevenLabs docs, but I believe it carries over (they now have their own syntax for handling pauses as well):

These options are inconsistent and might not always work. We recommend using the syntax above for consistency.

One trick that seems to provide the most consistent output - sans the above option - is a simple dash - or the em-dash —. You can even add multiple dashes such as -- -- for a longer pause.

"It - is - getting late."

Ellipsis ... can sometimes also work to add a pause between words but usually also adds some “hesitation” or “nervousness” to the voice that might not always fit.

"I... yeah, I guess so..."
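
If you want to try the pause markers suggested above without editing the text by hand, a small preprocessing step can append them to each numbered line before the text goes to the TTS request. This is only a sketch: `add_pause_markers` and `NUMBERED_ITEM` are made-up names, and it assumes list items look like "1. Apple" or "1) Apple", one per line. Whether the marker actually produces a pause is up to the model.

```python
import re

# Hypothetical pattern: a line that starts with a number followed by "." or ")".
# Trailing spaces or tabs are left outside the captured group so they get dropped.
NUMBERED_ITEM = re.compile(r"^([ \t]*\d+[.)][ \t]+.*?)[ \t]*$", re.MULTILINE)


def add_pause_markers(text: str, marker: str = "...") -> str:
    """Append a pause marker to every numbered list line in `text`."""

    def append_marker(match: re.Match) -> str:
        line = match.group(1)
        # Leave lines alone if they already end with the marker.
        return line if line.endswith(marker) else line + marker

    return NUMBERED_ITEM.sub(append_marker, text)


if __name__ == "__main__":
    reply = "Sure! Here are 5 fruits:\n\n1. Apple\n2. Banana\n3. Orange"
    # The result is what you would pass as the input text of the speech request.
    print(add_pause_markers(reply))
    print(add_pause_markers(reply, marker=" -"))
```

Only lines that begin with a number are touched, so the "5" in the intro sentence is left as it is.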

evardion · November 25, 2023, 8:43pm

Update: It seems the model will still drop numbers even if you put … after each line item, which still causes two line items to be read as a single item.

After some experimentation, it seems like this is the best workaround that I could come up with while still having the audio sound fairly natural.

Sure! Here are 5 fruits:

One: Apple…
Two: Banana…
Three: Orange…
Four: Strawberry…
Five: Grape…

Using words instead of numbers seems to increase the chance that they will be read aloud by the model but does not eliminate the problem entirely.
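
For anyone who wants to apply this workaround automatically, here is a rough sketch of the rewrite step in Python. The names `spell_out_list_numbers`, `NUMBER_WORDS` and `LIST_ITEM` are hypothetical, the word table only covers one through ten, and it assumes each item sits on its own line as "1. Apple" or "1) Apple". It only changes the text. As noted above, the model may still drop an item now and then.

```python
import re

# Hypothetical lookup table; extend it if your lists run past ten items.
NUMBER_WORDS = {
    1: "One", 2: "Two", 3: "Three", 4: "Four", 5: "Five",
    6: "Six", 7: "Seven", 8: "Eight", 9: "Nine", 10: "Ten",
}

# A numbered list line: optional indent, digits, "." or ")", then the item text.
LIST_ITEM = re.compile(r"^[ \t]*(\d+)[.)][ \t]+(.+?)[ \t]*$", re.MULTILINE)


def spell_out_list_numbers(text: str) -> str:
    """Rewrite lines like "1. Apple" as "One: Apple…" before sending text to TTS."""

    def rewrite(match: re.Match) -> str:
        word = NUMBER_WORDS.get(int(match.group(1)))
        if word is None:
            # Number outside the table: leave the line unchanged.
            return match.group(0)
        item = match.group(2)
        # Add a trailing ellipsis unless the item already ends with punctuation.
        if not item.endswith(("…", ".", "!", "?")):
            item += "…"
        return f"{word}: {item}"

    return LIST_ITEM.sub(rewrite, text)


if __name__ == "__main__":
    reply = "Sure! Here are 5 fruits:\n\n1. Apple\n2. Banana\n3. Orange\n4. Strawberry\n5. Grape"
    # The rewritten string is what you would pass as the speech input.
    print(spell_out_list_numbers(reply))
```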

tdm146929 · January 14, 2025, 2:27pm

$14,282,414,085.00
read it now pjrnouwnoruen
