Can I choose the TTS language?

Hi,

While the output of the new Text-to-Speech API works really well in English, in German the output sounds like an American who speaks German really, really well. 🙂
Will there be an option to explicitly set the output language?

Cheers,
Marc

I have the same issue with Chinese: it sounds like an American who has just learnt Chinese (which is quite funny).

But the bigger issue is that it produces gibberish 50% of the time. I made a specific post about it, but for some reason I cannot link it here.

Same issue in Dutch. It’s still quite bad: it sounds similar to the Bark model but better, just not good enough to be put into production for Dutch. I see why they didn’t add samples of multilingual audio to the docs. It’s quite unfortunate too, because at the moment Eleven Labs is the best but it’s so overpriced: $0.30 per 1,000 characters vs. $0.03 per 1,000 characters for OpenAI’s TTS-HD. I was hoping I could switch to OpenAI’s TTS when I heard about this announcement. I hope it gets better soon.
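
To put that price gap in concrete terms, here is a quick back-of-the-envelope sketch. The two per-1,000-character rates are the ones quoted in this post (they may well have changed since), and the function name and the 100,000-character workload are made up for illustration.

```python
# Rates as quoted in the post above, in USD per 1,000 characters.
ELEVENLABS_USD_PER_1K = 0.30
OPENAI_TTS_HD_USD_PER_1K = 0.03


def tts_cost_usd(num_chars: int, usd_per_1k_chars: float) -> float:
    """Cost of synthesizing num_chars characters at a per-1,000-character rate."""
    return num_chars / 1000 * usd_per_1k_chars


# Hypothetical workload: 100,000 characters of text.
chars = 100_000
eleven = tts_cost_usd(chars, ELEVENLABS_USD_PER_1K)
openai_hd = tts_cost_usd(chars, OPENAI_TTS_HD_USD_PER_1K)
print(f"Eleven Labs: ${eleven:.2f}, OpenAI TTS-HD: ${openai_hd:.2f}")
print(f"Ratio: {eleven / openai_hd:.0f}x")
```

So at the quoted rates the difference is a factor of ten, whatever the volume.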

In French, I would say the voice has a strong American (sometimes Canadian) accent too. However, it clearly detects the language and adapts.

Same issue with Esperanto: some letters are pronounced wrong (c like k), and special characters like ĉ, ĝ, ĥ, ĵ, ŝ and ŭ are sometimes skipped completely.

But all in all surprisingly good performance for Esperanto.

I have noticed that the output accent depends solely on the input language ChatGPT is set up with. So if you would like German with a French accent, you can change your input language to French. That is at least the relation I have observed when using the iPad version of ChatGPT to practice languages. I assume this problem is similar. So, all in all, we need a parameter to control the accent.

OpenAI TTS automatically recognizes the input language and generates the speech in the input text’s language. The voice will be the same!

Here’s an example of the same voice 'onyx' generating the world’s top ten languages:

  • English
  • Mandarin
  • Hindi
  • Spanish
  • French
  • Arabic
  • Bengali
  • Russian
  • Portuguese
  • Indonesian

I can only validate that German, French, Spanish, and English are almost perfect without accents. I haven’t tried other less popular languages (e.g., Catalan).
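
For anyone who wants to reproduce this, here is a minimal sketch of what such a request looks like, using only the Python standard library. The endpoint and the `model` / `voice` / `input` fields are as I understand them from OpenAI's API reference; the helper names and sample sentences are my own. Note that nothing in the payload names a language: the model infers it from the text.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_payload(text: str, voice: str = "onyx", model: str = "tts-1") -> dict:
    """Request body for the speech endpoint. There is no language field to set."""
    return {"model": model, "voice": voice, "input": text}


def synthesize(text: str, api_key: str, voice: str = "onyx", model: str = "tts-1") -> bytes:
    """POST the text and return the raw audio bytes from the response body."""
    request = urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(build_speech_payload(text, voice, model)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Same voice for every language; only the input text changes.
SAMPLES = {
    "German": "Guten Morgen, wie geht es dir?",
    "French": "Bonjour, comment ça va ?",
    "Spanish": "Buenos días, ¿cómo estás?",
}
payloads = {name: build_speech_payload(text) for name, text in SAMPLES.items()}
```

To actually generate audio you would call something like `synthesize(SAMPLES["German"], api_key)` with your own key and write the returned bytes to a file.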

I’ve heard the American accent thing several times now. I’m sure that will improve as the TTS model gets more data to work from. Still amazed it works at all, to be fair!

It would be absolutely amazing to have the ability to specify the language. In my experience, I encounter two scenarios where this is necessary:

  • I need to TTS short texts (like 2, 3 words), and in these cases, it’s more challenging for the model to accurately detect the language. For instance, when I tried to synthesize speech for “incrivelmente poderoso” (Portuguese words), it often pronounced them as if they were English words.
  • With longer texts, certain Portuguese words are pronounced as though an American was speaking Portuguese with a slight accent.

This is obviously required, since some languages share the same words, so even OpenAI cannot guess without context.

It is not yet prepared for production in languages other than English. Apart from the American accent, test it with times and dates to observe its limitations. I found it disappointing when used in Portuguese. I hope the quality improves rapidly.

It would be amazing if the language could be selected.

Sometimes it reads Spanish as if it were mixed with a completely unrelated language like Italian (I’m not sure, because it sounds more like gibberish), and Spanish is supposed to be one of the best-performing languages after English.

Perhaps it would be interesting to support some sort of syntax like SSML, to allow more control over basic elements like language and rate of speech.
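
To illustrate what I mean, here is a rough sketch of SSML markup that pins the language and speaking rate, built in Python. Nothing in this thread suggests OpenAI TTS accepts SSML today; the `speak` root with `xml:lang` and the `prosody` element with a `rate` attribute come from the W3C SSML spec, while the `to_ssml` helper is just a made-up name.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NAMESPACE = "http://www.w3.org/2001/10/synthesis"


def to_ssml(text: str, lang: str, rate: str = "medium") -> str:
    """Wrap text in SSML that states its language and speaking rate explicitly."""
    return (
        f'<speak version="1.1" xmlns="{SSML_NAMESPACE}" xml:lang={quoteattr(lang)}>'
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</speak>"
    )


# A short phrase that is easy to misdetect, tagged as Brazilian Portuguese.
print(to_ssml("incrivelmente poderoso", lang="pt-BR", rate="slow"))
```

With markup like this, the engine would not have to guess the language from two or three words.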

Please provide a way to choose the TTS language. I agree with all previous comments.

Commenting because I’m also noticing this issue with Belarusian. It seems that the model doesn’t know about the differences in pronunciation between East Slavic languages (Belarusian/Ukrainian/Russian). For example, г is pronounced ‘he’ as in ‘hem’ in Belarusian/Ukrainian; however, OpenAI TTS uses the Russian pronunciation most of the time (‘ge’ as in ‘get’).

Also, I too am noticing some requests coming back that sound completely garbled or unrelated to the text submitted. Re-submitting the request sometimes works. Sometimes I find that it helps to add a period at the end of the text if it is missing punctuation.
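
In case it helps anyone, here is a small sketch of those two workarounds combined. It is not an official fix. The `synthesize` and `looks_ok` arguments are placeholders you would supply yourself (your TTS call and whatever check you use to spot garbled audio, even a manual one); both names are hypothetical.

```python
import unicodedata
from typing import Callable


def ensure_terminal_punctuation(text: str, mark: str = ".") -> str:
    """Append a period if the text does not already end in punctuation."""
    stripped = text.rstrip()
    if not stripped:
        return stripped
    # Unicode general categories starting with "P" are punctuation marks.
    if unicodedata.category(stripped[-1]).startswith("P"):
        return stripped
    return stripped + mark


def synthesize_with_retry(
    synthesize: Callable[[str], bytes],
    text: str,
    looks_ok: Callable[[bytes], bool],
    max_attempts: int = 3,
) -> bytes:
    """Re-submit the same text until the audio passes the check or attempts run out."""
    prepared = ensure_terminal_punctuation(text)
    audio = b""
    for _ in range(max_attempts):
        audio = synthesize(prepared)
        if looks_ok(audio):
            break
    return audio
```

If every attempt fails the check, the last result is returned anyway, so the caller still has to decide what to do with it.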

I tried today and there isn’t a way to set the Italian language. Is there some other way?

Ha ha ha. Oh wow. Sorry, I just listened to the Russian version. It speaks like an American who is trying to read Russian without much effort. Spanish is bad too: I would say the accent is quite heavy, and I didn’t like his French.

You have strong R in both German and Russian, and strong L in Russian (but never in German). This voice can do neither, so it sounds like someone with seriously impaired speech. It’s absolutely not good for production.

Hello, I’m using TTS for several languages, and sometimes when words are nearly the same in both languages it switches to English. It’s good to have language recognition, but there should be a way to set the language explicitly.

+1.

Sometimes the model identifies the language incorrectly and therefore pronounces words incorrectly.

2 suggestions:

1. Improve language detection.
2. Offer “language” as an optional parameter, which disables auto language detection.

I’m Spanish, and the Spanish version sounds like an American trying to speak Spanish.

Thanks, Ignacio! That’s a valuable piece of feedback. German works pretty well (not the usual AE accent).