[Text to Speech API] Chinese TTS unreliable and unusable


I ask the API to convert this text to speech:


After trying about 100 times, I get the following results:

  • Half of the time, the audio is unusable: gibberish or glitched audio, almost as if it were a weird mix of English and Chinese at the same time.
  • 20% of the time, the result is decent but some words are missing.
  • 30% of the time, the result is good.

Tested with the tts-1-hd model and the alloy voice.
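
For anyone who wants to reproduce this, here is a minimal sketch of the request with those settings. It assumes the official `openai` Python package (v1 or later) and an `OPENAI_API_KEY` environment variable; the helper names and the output filename are my own placeholders, and you need to pass in your own text.

```python
from pathlib import Path


def build_speech_request(text: str) -> dict:
    """Build keyword arguments for the speech endpoint with the settings from this post."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"model": "tts-1-hd", "voice": "alloy", "input": text}


def synthesize(text: str, out_path: str = "speech.mp3") -> Path:
    """Send one request and save the returned audio (hypothetical helper)."""
    from openai import OpenAI  # lazy import; requires the `openai` package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.audio.speech.create(**build_speech_request(text))
    target = Path(out_path)
    response.write_to_file(target)  # default response format is MP3
    return target
```

Calling `synthesize(...)` repeatedly with the same text and different output paths is enough to hear how much the result varies between runs.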

I’d like to keep this thread up to track any future progress.

Will we be able to specify the language in the API request at some point in the future? Would that help?

The issue is still present, which makes the API useless for production.

I thought some OpenAI staff were reading some of the posts! I guess I posted in the wrong place. Thanks for the info.

Chinese is spoken too fast; you need to lower the audio speed from 1 to 0.95.
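
A rough sketch of what that looks like in a request, assuming the official `openai` Python package. The `speed` parameter is documented to accept values from 0.25 to 4.0 (default 1.0); the function names here are hypothetical, and the model and voice are simply the ones from the first post.

```python
def speech_request_with_speed(text: str, speed: float = 0.95) -> dict:
    """Build request arguments with a reduced playback speed."""
    # The API documents 0.25 to 4.0 as the valid range for `speed`.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": "tts-1-hd", "voice": "alloy", "input": text, "speed": speed}


def synthesize_slower(text: str, out_path: str = "speech_slow.mp3") -> None:
    """Send the slowed-down request and save the audio (hypothetical helper)."""
    from openai import OpenAI  # lazy import; requires the `openai` package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.audio.speech.create(**speech_request_with_speed(text))
    response.write_to_file(out_path)
```
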
This seems to happen in languages other than English, most likely Asian languages.

It appears that using tts-1 instead of tts-1-hd results in fewer issues. In Japanese, there were hardly any such issues with tts-1.
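
If you want to check this on your own text, one option is to render the same input with both models and listen to the files side by side. This is only a sketch, assuming the official `openai` Python package; the function names and filenames are made up for illustration.

```python
MODELS = ("tts-1", "tts-1-hd")


def comparison_jobs(text: str, voice: str = "alloy") -> list:
    """Return one (filename, request arguments) pair per model for the same input."""
    return [
        (f"sample_{model}.mp3", {"model": model, "voice": voice, "input": text})
        for model in MODELS
    ]


def run_comparison(text: str, voice: str = "alloy") -> None:
    """Generate one audio file per model (hypothetical helper)."""
    from openai import OpenAI  # lazy import; requires the `openai` package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    for filename, params in comparison_jobs(text, voice):
        client.audio.speech.create(**params).write_to_file(filename)
```

Since the output varies between runs, a single pair of files proves little; repeating the comparison several times gives a fairer picture.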

I hope this helps some of you!

Here to add Italian to the discussion: same issue. Sometimes the TTS just comes back with nonsense gibberish (the written translation is perfect). I’m using the “shimmer” voice, and I can confirm what was mentioned above: the tts-1-hd model produces considerably MORE gibberish output.
Slowing it down doesn’t reduce the issue at all.