How to use verbal commands with the new TTS API?

mojave · March 23, 2025, 1:18am

I have been testing TTS on openai.fm and I’m trying to get verbal commands to work like [yelling] and [whispering].

Sometimes, they work great, but not all the time. I have tried using closing tags like [/yelling] and [/whispering] and that too has mixed results. This would be my preferred structure.

Has anyone tested this thoroughly and come up with a good solution?

Finally, how are people using ‘sigh’ and ‘ugh’ and ‘mmm mm’ and so on? I have seen some great demos of that but when I try it is hit or miss.

_j · March 23, 2025, 2:19am

Try
<direction: speaker whispers>

I’ve tried to get interjections or alterations by altering the style of guidance text with creativity on openai.fm playground, but there was ignoring in most generations.

If you want a stronger affect to language, you might need to pass the input text through a onomatopoetic character text modifier.

Stylizing this input speech:

Hi. I’ve mostly been reading manga recently. I find it quite interesting. Sometimes I also play games, mostly role-playing games. I’m also trying to learn coding, but it’s somewhat challenging. Additionally, I started watching anime. I suppose that’s evident. Sorry if I’ve been rambling. That’s about all, I think.

With the text as-is:

With the text transformed by AI to produce the Twained version:

Or a different case, cranked up to 10:

Topic		Replies	Views
Did OpenAI just make a new AI Voice? API	7	2998	May 16, 2024
Audio Models in the API - live stream at 10 AM PT API	15	604	March 29, 2025
TTS: add emphasis to one word in spoken text API speech	11	2269	June 30, 2024
(ENTER, LAUGHING) How do I give the TTS engine 'stage direction' API	0	167	October 15, 2024
How to get gpt-4o-realtime-preview to be more emotive? API	2	316	October 7, 2024

How to use verbal commands with the new TTS API?

Related topics