I ask the API to TTS this text:
After trying about 100 times, I get the following results:
- 50% of the time, the audio is unusable: gibberish/glitched audio, almost as if it were a weird mix of English and Chinese at the same time.
- 20% of the time, the result is decent but missing some words.
- 30% of the time, the result is good.
I’d like to keep this thread open to track any progress in the future.
Can we specify the language in the API request at some point in the future? Will it help?
The issue is still present, which makes the API unusable for production.
This is more of a self-help community where developers, as users of OpenAI services, converse with each other.
Also, all of the new features are still in beta, and one should be aware of the risks of using them in a production service, even for a language like English.
I thought some OpenAI staff were reading the posts! I guess I posted in the wrong place. Thanks for the info.
Chinese is too fast; you need to slow the audio speed from 1 to 0.95.
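Note that the API's `speed` parameter is a multiplier (documented range 0.25 to 4.0), not a percentage. A minimal sketch of the request body for the `/v1/audio/speech` endpoint, with a placeholder Chinese input:

```python
import json

# Request body for POST https://api.openai.com/v1/audio/speech
# (speed is a multiplier between 0.25 and 4.0, not a percentage)
payload = {
    "model": "tts-1",
    "voice": "alloy",       # any supported voice
    "input": "你好，世界",    # placeholder Chinese text to synthesize
    "speed": 0.95,          # slightly slower than the default 1.0
}
print(json.dumps(payload, ensure_ascii=False))
```

Whether a 5% slowdown actually reduces the glitching is the poster's claim, not something the API documentation promises.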
This phenomenon seems to occur in languages other than English, particularly Asian languages.
According to the document description, tts-1 is optimized for speed, while tts-1-hd is optimized for quality. However, in about 30 Japanese text-to-speech tests that I conducted, tts-1-hd often read parts of the Japanese text with a strange pronunciation that was neither Japanese nor English.
Therefore, it is likely that tts-1 and tts-1-hd were trained on different datasets.
I have not confirmed whether this applies to languages other than Japanese, but which one to prefer may vary depending o…
It appears that using tts-1 instead of tts-1-hd results in fewer issues. In Japanese, there were hardly any such issues with tts-1.
I hope this helps some of you!
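To try the workaround above, swapping tts-1-hd for tts-1, only the `model` field of the request changes. A minimal sketch (the helper name and the Japanese sample text are illustrative):

```python
import json

def speech_payload(text: str, model: str = "tts-1") -> dict:
    """Build a request body for POST /v1/audio/speech.

    tts-1 is the speed-optimized model; tts-1-hd is the
    quality-optimized one, which reportedly glitches more
    often on Japanese input.
    """
    return {"model": model, "voice": "alloy", "input": text}

# The workaround: prefer tts-1 for Japanese input
payload = speech_payload("こんにちは、世界", model="tts-1")
print(json.dumps(payload, ensure_ascii=False))
```

Keeping the model name as a parameter makes it easy to A/B the two models on the same text.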