Did OpenAI just make a new AI Voice?

matthewethan · March 23, 2024, 9:01pm

So, I saw this video showing an AI voice that sounds remarkable. https://www.youtube.com/watch?v=6Mi1jRszXIc

Last I checked (1-2 months ago), OpenAI’s API voices sound nowhere near this good.

Is this real? I couldn’t find any additional information about it online.

Maybe the voices on ChatGPT are different than what’s accessible using the API?

cyzgab · March 24, 2024, 4:04am

yea - it’s possible.

ChatGPT had a santa voice last year!

matthewethan · March 24, 2024, 8:55pm

How can I access these new voices via the API? Do you know?

_j · March 25, 2024, 3:55am

ChatGPT has different text-to-speech voice models than are available on the API.

This may be to clearly distinguish your product from their product.

To save you clicking on a tedious Youtube short, there is just a guy telling ChatGPT mobile to make pauses when it writes responses. A normal voice available in the app is used.

You can also type out umhs and ahhs when you sent text to the API, or even include cues that are often acted on instead of spoken, like [pause], or [angrily].

Then you can instruct the AI that is “speaking” to write like that … if you want to waste more user time with speech that is already plodding and laborious - and then you look and it was only two short paragraphs.

Here’s the above text, that has been “enhanced” by my instructions to GPT-4 to rewrite the TTS input.

I gave it more “like” interjections, like the ones that annoy my mom…

matthewethan · March 25, 2024, 11:54pm

I see. The second one is quite good. Is the second voice available from the API?

_j · March 26, 2024, 3:59am

Yes, I used the API, and the voices there are “Alloy” and “Nova”, respectively.

https://platform.openai.com/docs/guides/text-to-speech/voice-options

Here’s a system prompt to make GPT-4 into a speech informalizer:

Your job is to enhance the quality of the provided text, which is intended to be spoken by an AI text-to-voice service. You will make the resulting speech sound more natural and human-like, as if a human was thinking while speaking, and being recorded while speaking these words in an impromptu fashion, by including natural stop phrases, noises, “ums” and “ahs” into the text itself.

Onomatopoeic sounds that you write will be spoken aloud. They can even be approximated and will be sounded-out.

Short “stage directions” in square brackets, such as [pause], or [spoken angrily], [deep breath] can affect the style of speech without being spoken aloud.

Important: The additional vocalizations will include a LOT of interjections of the word “like” before almost anything descriptive, like how a young person talks. This use of “like” is almost every sentence, and even more common than an “um”. Other spoken phrases such as “You know”, and similar spoken-word thinking ejaculations, are also used often.

philippecoenen · May 16, 2024, 6:23pm

Hello,

I don’t fully understand where to use the prompt.
I mean, is it in the code ? And would you be so kind to tell what to do cos I don’t see anything in the documentation about it.

I thank you already.

_j · May 16, 2024, 6:42pm

API AI models use a system message that defines how the AI is supposed to behave.

This prompt above is giving the AI additional instructions to follow when it generates that language, so that it will not just be a string of predictive AI text at a linear pace, it will have more “thinking” pauses and sounds going on that it writes into the language that it outputs, tuned for the text-to-speech engine to recognize, so speech can be more chatty-sounding.

The prompt is written to alter text, but it can also just be direction to the AI.

Giving the instruction above as “AI programming” in a chat completions system message, it transforms my response to this:

[laughs softly] So, like, API AI models? They use this, um, system message that basically, like, defines how the AI is supposed to, you know, behave.

[pause] And this prompt above? It’s giving the AI, like, additional instructions to follow when it, uh, generates that language. So it’s not just gonna be, like, a string of predictive AI text, you know, going at a linear pace. It will have, like, more “thinking” pauses and sounds, um, going on that it writes into the language that it outputs. This is all, like, tuned for the text-to-speech engine to recognize, so, like, speech can be more chatty-sounding.

Oh, and, like, the prompt? It’s written to alter text, but it can also just be, um, direction to the AI. Cool, right?

In ChatGPT, you could write a similar “custom instruction” to change how the AI produces its chat.

Topic		Replies	Views
How to make GPT (Voice) allow user more time to talk before replying GPT builders gpt-4 , api	24	3827	November 26, 2024
Audio Models in the API - live stream at 10 AM PT API	15	477	March 29, 2025
TTS voices have a clear US accent API tts	11	3105	January 8, 2025
Voice differences between Realtime API and Text-to-Speech API realtime , api-realtime	1	825	January 8, 2025
New model, tts-2, any news on it? (new voice mode) API tts	9	1845	February 21, 2025

Did OpenAI just make a new AI Voice?

Related topics