Hi,
while the output of the new Text-to-Speech API works really well in English, in German the output sounds like an American that does speak German really really well.
Will there be an option to explicitly set the output language?
Cheers,
Marc
I have the same issue with Chinese: it sounds like an American who just learnt Chinese (which is quite funny).
But the bigger issue is that it produces gibberish 50% of the time. I made a specific post about it, but for some reason I cannot link it here.
Same issue in Dutch. It's still quite bad: it sounds similar to the Bark model but better, just not good enough to be put in production for Dutch. I see why they didn't add samples of multilingual audio in the docs. It's quite unfortunate too, because at the moment Eleven Labs is the best but it's so overpriced, like $0.30/1000 characters vs $0.03/1000 characters for OpenAI's TTS-HD. I was hoping I could switch to OpenAI's TTS when I heard about this announcement. Hope it will get better soon.
In French, I would say the voice has a strong American (sometimes Canadian) accent too. However, it clearly detects the language and adapts.
Same issue with Esperanto: some letters are pronounced wrong (c like k) and special characters like ĉ, ĝ, ĥ, ĵ, ŝ and ŭ are sometimes skipped completely.
But all in all surprisingly good performance for Esperanto.
I have noticed that the output accent depends solely on the input language ChatGPT is set up with. So if you like German with a French accent, you can change your input language to French. That is at least the relation I have observed when using the iPad version of ChatGPT to practise languages. I assume it is similar to this problem. So all in all, we need a parameter to control the accent.
OpenAI TTS automatically recognizes the input language and generates the speech in the input text's language. The voice will be the same!
Here's an example of the same voice 'onyx'
generating the world's top ten languages:
I can only validate that German, French, Spanish, and English are almost perfect without accents. I haven't tried other less popular languages (e.g., Catalan).
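For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how the requests could be built with the Python standard library. It assumes the `/v1/audio/speech` endpoint with its `model`, `voice` and `input` fields; the helper name `build_speech_request`, the sample sentences and the placeholder key are only illustrative. Note that there is no language field in the payload: the language is inferred from the text alone.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, voice="onyx", model="tts-1"):
    """Build (but do not send) a text-to-speech request.

    Illustrative helper, not part of any SDK. The payload carries only
    model, voice and input; nothing in it selects a language.
    """
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same voice, different input languages (sample sentences are made up).
samples = {
    "German": "Guten Morgen, wie geht es dir?",
    "French": "Bonjour, comment allez-vous ?",
    "Spanish": "Buenos días, ¿cómo estás?",
}
requests_by_language = {
    language: build_speech_request(text, api_key="sk-placeholder")
    for language, text in samples.items()
}

# To actually synthesize, pass a real key and send the request, e.g.:
#   audio_bytes = urllib.request.urlopen(request).read()
```

Saving the returned bytes for each language and listening to them side by side makes the accent differences discussed in this thread easy to compare.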
I've heard the American accent thing several times now. I'm sure that will improve as the TTS model gets more data to work from. Still amazed it works at all, to be fair!
It would be absolutely amazing to have the ability to specify the language. In my experience, I encounter two scenarios where this is necessary:
This is obviously required since some languages share the same words, so even OpenAI cannot guess without context.
It is not yet prepared for production in languages other than English. Apart from the American accent, test it with times and dates to observe its limitations. I found it disappointing when used in Portuguese. I hope the quality improves rapidly.
It would be amazing if the language could be selected.
Sometimes it reads Spanish as if it were mixed with a completely unrelated language like Italian (not sure, because it sounds more like gibberish), and Spanish is supposed to be one of the best-performing languages after English.
Perhaps it would be interesting to support some sort of syntax like SSML, to allow more control over basic elements like language and rate of speech.
Please provide a way to choose the TTS language. I agree with all previous comments.
Commenting because I'm also noticing this issue with Belarusian. It seems that the model doesn't know about the differences in pronunciation between East Slavic languages (Belarusian/Ukrainian/Russian). For example, г is pronounced "he" as in "hem" in Belarusian/Ukrainian, but OpenAI TTS uses the Russian pronunciation most of the time ("ge" as in "get").
Also, I too am noticing some requests coming back that sound completely garbled or unrelated to the text submitted. Re-submitting the request sometimes works. Sometimes I find that it helps to add a period at the end of the text if it is missing punctuation.
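The two workarounds above (add a final period, then re-submit on a bad result) could be sketched roughly like this. Everything here is illustrative: `synthesize` and `looks_ok` are hypothetical callables you would supply yourself, for example a function that calls the TTS API and whatever check you use to spot garbled audio. Neither comes from the API.

```python
from typing import Callable

# Characters treated as "already ends a sentence" (an assumption; extend as needed).
TERMINAL_PUNCTUATION = ".!?…。！？"


def ensure_terminal_punctuation(text: str) -> str:
    """Append a period when the text does not already end in punctuation."""
    stripped = text.rstrip()
    if not stripped or stripped[-1] in TERMINAL_PUNCTUATION:
        return stripped
    return stripped + "."


def synthesize_with_retry(
    text: str,
    synthesize: Callable[[str], bytes],  # hypothetical: sends text, returns audio
    looks_ok: Callable[[bytes], bool],   # hypothetical: your own garbled-audio check
    attempts: int = 3,
) -> bytes:
    """Re-submit the same text until the result passes the check.

    Returns the last result even if no attempt passed, so the caller
    decides what to do with a still-bad output.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    prepared = ensure_terminal_punctuation(text)
    audio = b""
    for _ in range(attempts):
        audio = synthesize(prepared)
        if looks_ok(audio):
            break
    return audio
```

This only automates the manual workaround; it does not make the pronunciation itself any better.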
I tried today and there isn't a way to set the Italian language. Is there some other way?
Ha ha ha. Oh wow. Sorry, just listened to the Russian version. It speaks like an American who is trying to read Russian without much effort. Spanish is bad too (I would say the accent is quite heavy), and I didn't like his French.
You have a strong R in both German and Russian, and a strong L in Russian (but never in German). This voice can do neither, so it sounds like someone with seriously impaired speech. It's absolutely not good for production.
Hello, I'm using TTS for several languages, and sometimes when words are quite similar in both languages it switches to English. It's good to have language recognition, but there should be a way to set the language.
+1.
Sometimes the model identifies the language incorrectly and therefore pronounces words incorrectly.
2 suggestions:
1. Improve language detection.
2. Offer "language" as an optional parameter, which disables auto language detection.
I'm Spanish and the Spanish version sounds like an American trying to talk in Spanish.
Thanks, Ignacio! That's a valuable piece of feedback. German works pretty well (not the usual AE accent).