Can I specify the language of TTS voices when the input is a bit ambiguous?

Hi all! I am now playing with the TTS APIs. I would like to generate voices from Japanese input sentences.

But as we know, Japanese shares a lot of characters with Chinese. Sometimes, in fact, the sentences look exactly the same. I found that the TTS API then sometimes outputs a mix of Japanese and Chinese, or even solely Chinese, when I want Japanese output. Is there any way to tune the parameters to handle this problem? Or is this what we have for now, and we need to wait for future functionality?

Thanks in advance for answering this!


Currently it is not possible to set the language for the TTS endpoint.

I would give it a shot with a workaround:
Add a short sentence to the beginning of your text that tells the model that the language is Japanese only.

‘The following text is Japanese.’

Then add an additional pause between this first sentence and the actual text.

After retrieving the speech from the API, split the audio at the first occurrence of silence, for example using pydub, and keep only the part after it.
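
As a rough sketch of the splitting step: the code below uses only the Python standard library rather than pydub, and it assumes the speech was saved as a 16-bit mono WAV file. The function names, the 500 ms minimum pause, and the amplitude threshold are made-up illustration values that you would need to tune for your audio. pydub's `pydub.silence.detect_silence` does a comparable scan if you prefer that route.

```python
import struct
import wave


def find_split_point(samples, frame_rate, min_silence_ms=500, threshold=200):
    """Return the index of the first loud sample after the first long pause.

    A pause is a run of at least `min_silence_ms` of samples whose absolute
    value is <= `threshold`. Silence before any sound is ignored. Returns
    None if no such pause is followed by more audio.
    """
    min_run = frame_rate * min_silence_ms // 1000
    heard_sound = False
    run_length = 0
    for i, sample in enumerate(samples):
        if abs(sample) <= threshold:
            # Only count quiet samples once the spoken prefix has started.
            if heard_sound:
                run_length += 1
        else:
            if run_length >= min_run:
                return i
            heard_sound = True
            run_length = 0
    return None


def strip_spoken_prefix(in_path, out_path, min_silence_ms=500, threshold=200):
    """Write the audio after the first long pause to `out_path`.

    Assumes 16-bit mono PCM WAV input. Returns True if a pause was found
    and a file was written, False otherwise.
    """
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        raw = src.readframes(params.nframes)
    # WAV samples are little-endian signed 16-bit integers.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    split = find_split_point(samples, params.framerate, min_silence_ms, threshold)
    if split is None:
        return False
    tail = samples[split:]
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(struct.pack(f"<{len(tail)}h", *tail))
    return True
```

Calling `strip_spoken_prefix("speech.wav", "japanese_only.wav")` would then write everything after the first long pause; both file names are placeholders. If the pause in the generated speech turns out shorter than 500 ms, lower `min_silence_ms`, but too low a value risks cutting inside the prefix sentence itself.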

It would be better if we could prompt the model to use a specific language explicitly.
