How to use verbal commands with the new TTS API?

I have been testing TTS on openai.fm and I’m trying to get verbal commands to work like [yelling] and [whispering].

Sometimes they work great, but not all the time. I have tried using closing tags like [/yelling] and [/whispering], and that too has had mixed results. Tags with explicit closers would be my preferred structure.

Has anyone tested this thoroughly and come up with a good solution?

Finally, how are people using ‘sigh’ and ‘ugh’ and ‘mmm mm’ and so on? I have seen some great demos of that, but when I try it, it is hit or miss.

Try
<direction: speaker whispers>
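
If inline tags keep getting ignored, one thing to test is splitting the text at the tags and giving each piece its own direction. Below is a minimal sketch, assuming the [style]...[/style] structure from the question. `split_styled` and `to_request` are hypothetical helper names, and the `gpt-4o-mini-tts` model, `coral` voice, and `instructions` field are my understanding of the speech endpoint, so check them against the current API reference. Nothing is sent over the network here; it only builds the request arguments.

```python
import re

# Matches [style]text[/style]; the closing tag must repeat the opening name.
TAG_RE = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)


def split_styled(text, default="neutral"):
    """Split text into (style, chunk) pairs; untagged text gets `default`."""
    segments = []
    pos = 0
    for m in TAG_RE.finditer(text):
        plain = text[pos:m.start()].strip()
        if plain:
            segments.append((default, plain))
        inner = m.group(2).strip()
        if inner:
            segments.append((m.group(1), inner))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((default, tail))
    return segments


def to_request(style, chunk, voice="coral"):
    """Build keyword arguments for one speech request (assumed field names)."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": chunk,
        "instructions": f"Delivery: {style}.",
    }


sample = "Hello there. [whispering]Don't wake them.[/whispering] [yelling]RUN![/yelling]"
segments = split_styled(sample)
payloads = [to_request(style, chunk) for style, chunk in segments]
```

Each payload could then be passed to the official Python client as `client.audio.speech.create(**payload)` and the resulting clips joined afterwards. I have not verified that per-segment directions are more reliable than inline tags, so treat this as something to experiment with.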

I’ve tried to get interjections or changes in delivery by getting creative with the style guidance text in the openai.fm playground, but they were ignored in most generations.

If you want stronger affect in the speech, you might need to first pass the input text through a text modifier that rewrites it with onomatopoeic, in-character spellings.
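
The samples in this post were transformed by an AI model, but the idea of pre-processing the input can be illustrated with a crude rule-based stand-in. This sketch only stretches the first vowel of chosen words so the spelling itself suggests a drawn-out delivery. `stretch_word` and `stylize` are hypothetical names, and this is not the transformation used for the samples here.

```python
import re

VOWEL_RE = re.compile(r"[aeiou]", re.IGNORECASE)


def stretch_word(word, times=3):
    """Repeat the first vowel of a word, e.g. 'sorry' -> 'sooorry'."""
    return VOWEL_RE.sub(lambda m: m.group(0) * times, word, count=1)


def stylize(text, emphasize, times=3):
    """Stretch every whole-word occurrence of the words in `emphasize`."""
    if not emphasize:
        return text
    alternatives = "|".join(re.escape(w) for w in emphasize)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: stretch_word(m.group(0), times), text)


styled = stylize(
    "Sometimes I also play games, mostly role-playing games. "
    "Sorry if I've been rambling.",
    ["mostly", "sorry"],
)
```

The stylized string would then be used as the TTS input in place of the original. Whether the model actually reads the stretched spellings with more affect is something to verify by ear.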

Stylizing this input speech:

Hi. I’ve mostly been reading manga recently. I find it quite interesting. Sometimes I also play games, mostly role-playing games. I’m also trying to learn coding, but it’s somewhat challenging. Additionally, I started watching anime. I suppose that’s evident. Sorry if I’ve been rambling. That’s about all, I think.

With the text as-is:

With the text transformed by AI to produce the Twained version:

Or a different case, cranked up to 10:
