Dropping Numbers With TTS API while Generating Speech


I’m utilizing a Text-to-Speech (TTS) API to generate speech from text that includes phone numbers and other numerical content. However, after generating the speech, some of the numbers are missing. I’ve experimented with various formats and types of input, such as:

  1. 1234567890
  2. 1-2-3-4-5-6-7-8-9-0
  3. one two three four five six seven eight nine zero
  4. 1 2 3 4 5 6 7 8 9 0

Despite trying these variations, the speech output frequently omits some of the digits. For instance, instead of saying ‘1234567890,’ it might say ‘12345670,’ and so on. I’ve generated batches of 10 files each time, and the error rate ranges from 50% to 80%, meaning that 5 to 8 files out of every 10 are missing digits.

Could anyone provide insights on how to resolve this issue?

It's not only numbers; sometimes it cuts off syllables too.
Have you tried formatting the numbers with commas, periods, or ellipses?
Or even with a line break like \n?
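
As a rough sketch of the separator idea, the helper below rewrites long digit runs so that each digit is followed by a comma or some other separator before the text is sent to the TTS API. The function name `space_out_digits` and the `min_len` threshold are made up for illustration, and whether a given separator actually stops the dropped digits is something to test per voice and model, not a guarantee.

```python
import re


def space_out_digits(text, separator=", ", min_len=7):
    """Insert `separator` between the digits of every run of at least
    `min_len` digits, leaving shorter numbers (times, prices) untouched.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    pattern = re.compile(r"\d{%d,}" % min_len)

    def expand(match):
        # "1234" -> "1, 2, 3, 4" when separator is ", "
        return separator.join(match.group(0))

    return pattern.sub(expand, text)


if __name__ == "__main__":
    print(space_out_digits("Call 1234567890 before 5 pm."))
    # prints: Call 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 before 5 pm.
```

Passing `separator=". "`, `separator="... "`, or `separator="\n"` gives the period, ellipsis, and line-break variants, which makes it easy to generate one batch per separator and compare them.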

Also, splitting a long text into smaller parts could give more accurate results.
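
One way to try the splitting suggestion is to cut the text at sentence boundaries and send each piece to the TTS API separately. The sketch below only does the splitting; the function name `split_into_chunks` and the 200-character default are assumptions for illustration, not limits taken from any particular API.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept as its own chunk
    rather than being cut in the middle.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "My number is 1234567890. Call me after five. Thanks!"
    for chunk in split_into_chunks(sample, max_chars=30):
        print(chunk)
```

Each chunk becomes its own request, so the resulting audio files have to be joined afterwards, and intonation may differ slightly from one chunk to the next.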

  1. Use another provider for more complex or longer tasks, especially given the current “degraded GPT4” situation; for example, Claude 3 (the highest tier).
  2. Use a different TTS engine (preferably your own) and train it yourself.
  3. Lastly, use simpler prompts, simpler wording, and simpler arrangements of numbers. Mix and match, and treat the AI like a child.

As stated in the discussion, I did already separate the words and numbers using commas, periods, and so on.
Secondly, rewriting the sentence in smaller chunks will affect how it can be used, in terms of delay and voice quality.