[Text to Speech API] Chinese TTS unreliable and unusable

Example:

I ask the API to convert this text to speech:

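这部电影我已经看完了。你的作业完成了吗?我们已经吃完晚饭了。
(English: "I have already finished watching this movie. Have you finished your homework? We have already finished dinner.")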

After about 100 attempts, I got the following results:

  • About 50% of the time, the audio is unusable: gibberish or glitched audio, almost as if it were a strange mix of English and Chinese at the same time.
  • 20% of the time, the result is decent but some words are missing.
  • 30% of the time, the result is good.
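Percentages like these come from listening to each clip, so the judgment itself is manual. As a minimal sketch of how such a repeated test could be tallied, assuming a caller-supplied `synthesize` function (one API request per run) and a caller-supplied `classify` function that returns a label such as "unusable", "missing_words" or "good" (both hypothetical), the breakdown can be computed like this:

```python
from collections import Counter
from typing import Callable, Dict


def run_trials(
    synthesize: Callable[[str], bytes],
    classify: Callable[[bytes], str],
    text: str,
    n: int = 100,
) -> Dict[str, float]:
    """Synthesize the same text n times and return the percentage of runs per label.

    `synthesize` and `classify` are hypothetical callables supplied by the caller:
    the first returns the audio bytes for one request, the second a label assigned
    after listening to (or otherwise checking) that audio.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    counts = Counter(classify(synthesize(text)) for _ in range(n))
    return {label: 100.0 * count / n for label, count in counts.items()}


if __name__ == "__main__":
    from itertools import cycle

    # Offline stand-in for the real API: cycles through canned labels so the run is reproducible.
    fake_outcomes = cycle([b"unusable"] * 5 + [b"missing_words"] * 2 + [b"good"] * 3)
    print(run_trials(lambda _text: next(fake_outcomes), bytes.decode, "这部电影我已经看完了。"))
```

Keeping the request and the judgment as separate callables lets the same tally be reused for different models, voices or languages.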

Tested with the tts-1-hd model and the alloy voice.
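For anyone trying to reproduce this, here is a minimal sketch of one request with these settings, using only the Python standard library. It assumes the public `POST https://api.openai.com/v1/audio/speech` endpoint with its JSON fields `model`, `input`, `voice` and `response_format`; the helper names `build_speech_body` and `request_speech` are hypothetical.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"
TEXT = "这部电影我已经看完了。你的作业完成了吗?我们已经吃完晚饭了。"


def build_speech_body(text: str, model: str = "tts-1-hd", voice: str = "alloy",
                      response_format: str = "mp3") -> dict:
    """Return the JSON body for one speech request (defaults match the settings above)."""
    return {"model": model, "input": text, "voice": voice, "response_format": response_format}


def request_speech(body: dict, api_key: str) -> bytes:
    """Send one request and return the raw audio bytes (needs network access and a valid key)."""
    req = urllib.request.Request(
        SPEECH_URL,
        # Encode as UTF-8 so the Chinese input is sent unescaped.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Calling `request_speech(build_speech_body(TEXT), key)` and writing the returned bytes to an `.mp3` file reproduces a single trial.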

I’d like to keep this thread alive to track any progress in the future.

Will it be possible to specify the language in the API request at some point in the future? Would that help?


The issue is still present, which makes the API unusable in production.


Hi!

This is more of a self-help community where developers as the users of OpenAI services converse with each other.
Also, all of the new features are still in beta, and one should be aware of the risks of using them in a production service, even for a language like English.

I thought some OpenAI staff were reading the posts! I guess I posted in the wrong place. Thanks for the info.


Chinese is spoken too fast; you need to lower the audio speed from 1 to 0.95.
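The speech endpoint's `speed` parameter is a multiplier (documented range 0.25 to 4.0, default 1.0), so the suggestion above means `speed=0.95`, about 95% of normal speed; a literal 0.95% (0.0095) would fall outside the allowed range. A small sketch that builds the request body with a range check, assuming the same JSON fields as the public endpoint (`speech_body` is a hypothetical helper):

```python
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # documented bounds of the `speed` parameter


def speech_body(text: str, speed: float = 1.0, model: str = "tts-1-hd",
                voice: str = "alloy") -> dict:
    """Build a speech request body, rejecting speeds outside the documented range."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return {"model": model, "input": text, "voice": voice, "speed": speed}
```

Checking the range locally turns a percent-versus-multiplier mix-up into an immediate error instead of a rejected request. As a later reply reports, slowing down did not help for Italian, so this is worth testing rather than assuming.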

This problem seems to occur in languages other than English, probably mostly Asian languages.

It appears that using tts-1 instead of tts-1-hd results in fewer issues. In Japanese, there were hardly any such issues with tts-1.
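To apply this workaround automatically, one option is to route text containing Chinese or Japanese characters to `tts-1` and keep `tts-1-hd` for everything else. This is a minimal heuristic sketch, not an official recommendation: `pick_model` is a hypothetical helper, and it only checks three Unicode blocks (Hiragana, Katakana, CJK Unified Ideographs).

```python
# Unicode blocks treated as "CJK" by this heuristic (not exhaustive).
CJK_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def contains_cjk(text: str) -> bool:
    """True if any character falls in one of the CJK_RANGES blocks."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)


def pick_model(text: str) -> str:
    """Use tts-1 for Chinese/Japanese text (fewer glitches reported here), else tts-1-hd."""
    return "tts-1" if contains_cjk(text) else "tts-1-hd"
```

The Italian report further down describes the same model difference, so checking the script alone may be too narrow a rule.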

I hope this helps some of you!

Here to add Italian to the discussion. Same issue: sometimes TTS just returns nonsense gibberish (the written translation is perfect). I’m using the “shimmer” voice, and I can confirm what was mentioned above: the tts-1-hd model produces considerably MORE gibberish.
Slowing it down doesn’t reduce the issue at all.