Language error in TTS API

Hi guys, at the company I work for we are using OpenAI’s TTS API, and I have the following problem:

In a Portuguese text that contains a link, the text itself is spoken in Portuguese but the URL is spoken in English.

I checked the TTS API and found no way to force the language. Can anyone help me?

The API voices are optimised for English, and even when they speak other languages, such as Portuguese, an American accent remains, sometimes quite noticeably.

I just tested it with this prompt:

'Olá, amigos da Rede Globo de Telecomunicações da TV Brasileira! Com muito orgulho, apresento um novo negócio. Acesse www.yourlifestyle.com/greetings_to_everybody_here_my_dear.html para maiores informações.'

And the accent sounds reasonable when the voice speaks the URL.
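If the URL still comes out in English for your text, one possible workaround is to rewrite it into spoken Portuguese before sending the text to the API. Here is a minimal sketch of that idea. The function names, the regex, and the Portuguese word choices (including "dáblio" for each "w") are my own assumptions, not anything the API provides, and the regex only catches links that start with "www." or "http".

```python
import re

# Assumed spoken Portuguese forms for URL punctuation; adjust as needed.
SPOKEN = {".": " ponto ", "/": " barra ", "_": " sublinhado ", "-": " hífen "}

# Deliberately simple pattern: only links starting with http(s):// or www.
URL_RE = re.compile(r"(?:https?://|www\.)[\w-]+(?:\.[\w-]+)+(?:/[\w./-]*)?")


def speak_url(url: str) -> str:
    """Turn a URL into words a Portuguese voice can read aloud."""
    url = re.sub(r"^https?://", "", url)  # the scheme is rarely read aloud
    words = "".join(SPOKEN.get(ch, ch) for ch in url).split()
    # Assumption: "www" is read letter by letter as "dáblio dáblio dáblio".
    return " ".join("dáblio dáblio dáblio" if w == "www" else w for w in words)


def spell_out_urls(text: str) -> str:
    """Replace every URL in the text with its spoken form."""
    def repl(match: re.Match) -> str:
        url = match.group(0)
        stripped = url.rstrip(".")  # keep a sentence-ending period out of the URL
        return speak_url(stripped) + url[len(stripped):]
    return URL_RE.sub(repl, text)


print(spell_out_urls("Acesse www.example.com/info.html para mais."))
```

The output of spell_out_urls would then be passed as the input text of the TTS request in place of the raw string. Whether the voice actually pronounces the result more naturally is something you would have to test by ear.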

I have the same issue when trying to generate audio for a list of German terms. Even some distinctly German words are pronounced with an American accent.

The inability to specify language is a weird oversight. There exist countries outside of the United States where people speak other languages.

It’s not really that weird for a single AI voice personality to have one language skill.

Consider the voice actor who creates the training set: reportedly, it means exhausting repetitions of labeled data in a particular style, producing the most boring audiobooks ever, and the end result is having your career replicated by a robot. Now find native-level polyglots to do that work, plus the AI specialists who can prepare and understand the data set and come up with a strategy for distinguishing the languages within training and inputs.

As a user trying to solve a problem, a multi-language model not letting you specify the language is an oversight, even if the technical explanation is reasonable.

My use case was to generate pronunciations for a list of words in a glossary. The texts are too short for the language to be guessed, so I must specify it myself.

OpenAI and ElevenLabs both tried and failed to guess the language. Google’s Text-to-Speech API worked correctly, and the SSML syntax allowed for minute corrections for terms like “B2B” and “ALG II”.
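For anyone with a similar glossary, here is a rough sketch of how the SSML for a single term could be built. It assumes the service accepts the standard SSML `<lang>` and `<sub>` elements. The helper name term_to_ssml and the alias "A L G zwei" are hypothetical examples of mine, not the exact markup I used.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def term_to_ssml(term: str, lang: str, alias: Optional[str] = None) -> str:
    """Wrap one glossary term in SSML with an explicit language tag.

    alias is an optional spoken replacement for terms the engine misreads.
    """
    body = escape(term)  # escape &, < and > so the markup stays well-formed
    if alias is not None:
        # <sub> asks the engine to speak the alias instead of the written term
        body = f"<sub alias={quoteattr(alias)}>{body}</sub>"
    return f"<speak><lang xml:lang={quoteattr(lang)}>{body}</lang></speak>"


print(term_to_ssml("B2B", "de-DE"))
print(term_to_ssml("ALG II", "de-DE", alias="A L G zwei"))
```

The resulting string is what you would send as the SSML input of the synthesis request. The exact request shape depends on the client library, so check the provider's documentation for that part.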

I wish this had been made clearer before I added credits to my OpenAI account. On the bright side, the docs and the API are fantastic, so I’m sure I’ll find another use for the credits.