Is it possible to add emphasis to one specific word in the text that I want to be spoken by the TTS (audio/speech) endpoint?
Let’s say I want the following text to be spoken:
“Are you still using this?”
The exact meaning can be very different when the emphasis is on ‘you’ or ‘still’ or ‘this’.
How can I convince TTS to put emphasis on a certain word?
It’s somewhat possible but likely not going to work reliably. From the docs:
There is no direct mechanism to control the emotional output of the audio generated. Certain factors may influence the output audio like capitalization or grammar but our internal tests with these have yielded mixed results.
Interesting …
The input of the audio/speech endpoint is a String.
How do I give italics to that endpoint?
This is the sample code that I use to call the endpoint:
from pathlib import Path
from openai import OpenAI

client = OpenAI()
speech_file_path = Path(__file__).parent / "speech.mp3"
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Today is a wonderful day to build something people love!"
)
response.stream_to_file(speech_file_path)
Or you can get creative and add another sentence to tell the model that it should put emphasis on the word:
'Jane wanted to tell everybody about today. When she addressed the crowd, she put emphasis on the word ‘Today’ and then she said: “Today is a wonderful day to build something people love!”'
When I add <em> tags to the text, I do notice some changes, but I do not know whether the changes are due to the tags or because the resulting speech is different every time anyway.
And sometimes the voice actually speaks the ‘em’ out loud!!
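One way to tell the effect of the markup apart from the normal run-to-run variation might be to generate several takes of each version and listen to them side by side. Below is a rough sketch that reuses the client call from the sample code above. The helper names (build_jobs, render_jobs), the file naming scheme, the number of takes and the exact <em> markup are assumptions made for illustration, not anything the docs prescribe.

```python
from pathlib import Path


def build_jobs(variants: dict[str, str], takes: int = 3) -> list[tuple[str, str]]:
    """Pair each text variant with `takes` output file names."""
    jobs = []
    for label, text in variants.items():
        for take in range(1, takes + 1):
            jobs.append((f"{label}_take{take}.mp3", text))
    return jobs


def render_jobs(jobs: list[tuple[str, str]], out_dir: Path = Path(".")) -> None:
    """Send every job to the speech endpoint (needs OPENAI_API_KEY to be set)."""
    # Imported here so build_jobs can be used without the openai package.
    from openai import OpenAI

    client = OpenAI()
    for file_name, text in jobs:
        response = client.audio.speech.create(
            model="tts-1",
            voice="alloy",
            input=text,
        )
        response.stream_to_file(out_dir / file_name)


# Hypothetical comparison: the same sentence with and without the tags.
variants = {
    "plain": "Are you still using this?",
    "em": "Are you <em>still</em> using this?",
}
jobs = build_jobs(variants, takes=3)
# render_jobs(jobs)  # uncomment to actually call the API
```

If the three "plain" takes differ from each other about as much as they differ from the "em" takes, the tags are probably not doing much.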
From my few attempts so far, I have noticed that writing certain words in CAPS helps, and that putting hyphens ( “-” ) around a word also helps slightly with emphasis. (Damn, that looks like a bird!)
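To try the CAPS and hyphen ideas on the example sentence from the question, a small helper could build the different input strings before they are passed as input to client.audio.speech.create. This is only a sketch: the function name emphasis_variants and the exact hyphen style ("- word -") are my own guesses at what was meant, and given the docs quoted above the results may well be mixed.

```python
import re


def emphasis_variants(text: str, word: str) -> dict[str, str]:
    """Return the plain text plus variants that mark the first `word` for emphasis."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
    return {
        "plain": text,
        # Variant 1: write the word in capitals.
        "caps": pattern.sub(lambda m: m.group(0).upper(), text, count=1),
        # Variant 2: surround the word with hyphens (assumed style).
        "hyphens": pattern.sub(lambda m: f"- {m.group(0)} -", text, count=1),
    }


variants = emphasis_variants("Are you still using this?", "still")
for label, variant in variants.items():
    print(f"{label}: {variant}")
```

Each of the resulting strings can then be sent to the endpoint in place of the input in the sample code, ideally a few times each, since the output varies between runs.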