What about implementing SSML in the new TTS API service?

I can’t find anything about it in the docs, or even here. So, is it possible (or is it planned) to use SSML for fine-grained control of the output voice?

4 Likes

I’m also very interested in SSML. +1

4 Likes

I am also very interested in SSML features

2 Likes

+1 here. Since the GPT models are already quite well developed, I think the next thing that needs to catch up is the way they interact with people.

Having the ability to control the language and speed with SSML would allow it to properly pronounce foreign words or phrases in the middle of a sentence.

For example:

<speak>
    <voice name="en-US">
        "<lang xml:lang="fr-FR">Laissez-faire</lang>" is a French phrase that translates to "let do" or "let go."
    </voice>
</speak>
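
For anyone who wants to generate markup like the above programmatically, here is a minimal Python sketch using only the standard library. It is an illustration for SSML-capable engines in general, not something the OpenAI TTS endpoint is known to accept. The `build_ssml` helper and its parameter names are hypothetical, and the voice name "en-US" is copied from the example as-is; real engines usually expect their own voice identifiers.

```python
import xml.etree.ElementTree as ET

# The reserved XML namespace; ElementTree serializes it with the "xml:" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(before: str, foreign: str, after: str, lang: str, voice: str) -> str:
    """Return an SSML string with `foreign` wrapped in a <lang> element.

    Hypothetical helper for illustration; not part of any TTS SDK.
    """
    speak = ET.Element("speak")
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    voice_el.text = before  # text before the foreign phrase
    lang_el = ET.SubElement(voice_el, "lang", {f"{{{XML_NS}}}lang": lang})
    lang_el.text = foreign  # the phrase to pronounce in another language
    lang_el.tail = after    # text after the foreign phrase
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml(
    before='"',
    foreign="Laissez-faire",
    after='" is a French phrase that translates to "let do" or "let go."',
    lang="fr-FR",
    voice="en-US",  # assumption: copied from the post, not a real voice ID
)
print(ssml)
```

Building the tree with ElementTree rather than string concatenation means special characters such as & or < in the text are escaped automatically.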

1 Like

+1 here. I also want to use the TTS API with SSML.

1 Like

+1 here too.

OpenAI TTS sounds much more natural than Google Cloud in bilingual sentences, but it lacks a few important features that Google Cloud TTS has.

Unlike Google Cloud, OpenAI:

  1. … doesn’t allow fine-grained control of voice, accent, language, or pauses with SSML (or an equivalent)
  2. … doesn’t offer a wide range of voices and accents in other languages. The OpenAI voices are American-centric (e.g. there is no British accent, so you can’t build a product that caters to a British audience; ditto for other languages)
  3. … sometimes skips words in a foreign language. Changing the speed to 0.9 helps a little, but at the expense of slowing everything down. Again, OpenAI lacks the fine-grained control (via SSML) that Google Cloud offers.
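
For reference, the global speed workaround from point 3 might look roughly like the Python sketch below. It assumes the documented `/v1/audio/speech` endpoint with its `model`, `voice`, `input`, and `speed` fields; the model name "tts-1", the voice "alloy", and both helper functions are assumptions for illustration. Note that `speed` applies to the whole request, which is exactly the limitation described above.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, speed=1.0, voice="alloy", model="tts-1"):
    """Build the JSON payload; speed is global, not per word or phrase."""
    # The API documents a speed range of 0.25 to 4.0.
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": model, "voice": voice, "input": text, "speed": speed}


def synthesize(payload, out_path):
    """Send the request and save the audio (needs OPENAI_API_KEY set)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())


# Slow the whole utterance slightly so foreign words are less likely to be skipped.
payload = build_speech_request(
    '"Laissez-faire" is a French phrase that translates to "let do."',
    speed=0.9,
)
# synthesize(payload, "speech.mp3")  # uncomment to call the API
```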

2 Likes