I have been testing TTS on openai.fm and I’m trying to get verbal commands like [yelling] and [whispering] to work.
Sometimes they work great, but not all the time. I have also tried closing tags like [/yelling] and [/whispering], and that has mixed results too. Paired tags like these would be my preferred structure.
Has anyone tested this thoroughly and come up with a good solution?
Finally, how are people using ‘sigh’ and ‘ugh’ and ‘mmm mm’ and so on? I have seen some great demos of that, but when I try it, the results are hit or miss.
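For what it's worth, one workaround for the paired-tag structure would be to not send the tags to the model at all: split the text on them and synthesize each piece as its own request with its own style instruction. Below is a minimal sketch of just the splitting step. The names `split_tagged_text`, `STYLE_INSTRUCTIONS`, and `DEFAULT_INSTRUCTION` are hypothetical, and the instruction wording is an assumption, not something openai.fm documents.

```python
import re

# Hypothetical tag-to-instruction mapping; the wording is an assumption.
STYLE_INSTRUCTIONS = {
    "yelling": "Shout this line loudly and forcefully.",
    "whispering": "Whisper this line very quietly.",
}
DEFAULT_INSTRUCTION = "Speak in a natural, conversational tone."

# Matches [tag]...[/tag] where the closing tag repeats the opening tag.
TAG_PATTERN = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)


def split_tagged_text(text):
    """Return (instruction, segment_text) pairs in reading order."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        # Untagged text before this tagged span gets the default style.
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append((DEFAULT_INSTRUCTION, plain))
        tag = match.group(1).lower()
        body = match.group(2).strip()
        if body:
            # Unknown tags fall back to the default instruction.
            instruction = STYLE_INSTRUCTIONS.get(tag, DEFAULT_INSTRUCTION)
            segments.append((instruction, body))
        pos = match.end()
    # Any untagged text after the last tagged span.
    tail = text[pos:].strip()
    if tail:
        segments.append((DEFAULT_INSTRUCTION, tail))
    return segments


if __name__ == "__main__":
    sample = "Hello there. [whispering]Keep it down.[/whispering] [yelling]No![/yelling]"
    for instruction, segment in split_tagged_text(sample):
        print(f"{instruction!r} -> {segment!r}")
```

Each pair could then go out as a separate TTS request, with the instruction supplied as the style guidance and the resulting audio clips joined afterwards. Whether per-segment guidance is followed more reliably than inline tags is untested here, and stitching clips together may introduce audible seams.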
I’ve tried to get interjections or vocal alterations by getting creative with the style guidance text on the openai.fm playground, but they were ignored in most generations.
If you want stronger affect in the spoken language, you might need to pass the input text through an onomatopoetic character text modifier first.
Stylizing this input speech:
Hi. I’ve mostly been reading manga recently. I find it quite interesting. Sometimes I also play games, mostly role-playing games. I’m also trying to learn coding, but it’s somewhat challenging. Additionally, I started watching anime. I suppose that’s evident. Sorry if I’ve been rambling. That’s about all, I think.
With the text as-is:
With the text transformed by AI to produce the Twained version: