Voice Engine - Navigating the Challenges and Opportunities of Synthetic Voices

Summary by AI:

• OpenAI introduces Voice Engine, a text-to-speech model that uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker’s voice.

• The technology has shown promise in various applications, including reading assistance, content translation, improving essential service delivery, supporting non-verbal individuals, and helping patients recover their voice.

• OpenAI emphasizes the responsible deployment of synthetic voices due to potential risks, especially in an election year, and engages with diverse partners to incorporate feedback and ensure ethical considerations.

• Usage policies prohibit impersonation without consent, require explicit consent from original speakers, and mandate clear disclosure to audiences about AI-generated voices.

• Voice Engine employs safety measures such as watermarking and proactive monitoring, and OpenAI advocates for voice authentication experiences and a no-go voice list to prevent misuse.

• OpenAI sees Voice Engine as an opportunity to explore the technical frontier and share advancements in AI, aligning with their commitment to AI safety.

4 Likes

Just saw it and was about to post, you beat me to it!

Super cool stuff, almost out of the uncanny valley in my opinion.

I find it very intriguing that they’re doing the same thing here as they did with Sora, basically a “hey regulators and society, hurry up and get ready cause this stuff is coming”.

1 Like

Oh god the Spanish is like listening to someone who doesn’t speak Spanish try to speak it :rofl:

Pretty neat though

3 Likes

I noticed that too.

I think ElevenLabs is a little ahead in this regard, but as I’ve said 1,000 times before, this is the worst it will ever be.

One application for this technology that really excites me is film localization: being able to redub actors in their own voices.

Then it’s a very short jump to re-mapping their mouths to match the new-language dialogue…

2 Likes

Hi, is there any timeframe for when this API will be available to learn and experiment with?

At the same time, we are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.

I highly recommend reading through blog posts like this.

There is no timeframe, and they are being very cautious. The article implies that if the risks are high enough and cannot be resolved, they won’t release it.

Which to me is very strange, considering that there are already both commercial and open-source solutions that can do what they’re offering.
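For anyone wondering what those open-source solutions look like in practice, here is a minimal sketch assuming Coqui’s XTTS v2 as the example library (my choice of example, not one named in this thread); it clones a voice from a short reference clip, which is roughly the capability Voice Engine demonstrates:

```python
import torch
from TTS.api import TTS  # Coqui TTS, used purely as an illustrative open-source example

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the multilingual XTTS v2 model (zero-shot voice cloning)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Clone the voice in reference.wav (a short clip of the target speaker)
# and synthesize new speech with it; file paths here are hypothetical.
tts.tts_to_file(
    text="This sentence is spoken in the reference speaker's voice.",
    speaker_wav="reference.wav",
    language="en",
    file_path="cloned_output.wav",
)
```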

1 Like

I assume you’re referring to the ElevenLabs API, and I agree it is a bit odd.

It’s a very clever way, though, of positioning yourself as the responsible AI company. I highly doubt they’ll release it before the US election.

2 Likes

And also

I’m leaning this way. But there are only so many times one can use the “we’re holding onto it because ours is too dangerous” shtick. As someone who has older brothers, I have been tricked by this tomfoolery before :joy:

In February, artificial intelligence research startup OpenAI announced the creation of GPT-2, an algorithm capable of writing impressively coherent paragraphs of text.

But rather than release the AI in its entirety, the team shared only a smaller model out of fear that people would use the more robust tool maliciously — to produce fake news articles or spam, for example.

1 Like

Just wanted to point out that GPT-2 was the go-to tool for “premium spam”, which is what got me into this whole AI thing.
Back then it was just an expensive tool that needed extra attention, but the results did stand the test of time in many cases.

As for why they are not releasing a TTS model that is seemingly not SOTA: I suppose it’s a rather unnecessary addition to the risk surface.
I mean, we haven’t seen the New York Times sue the whole LLM market even though they are all using NYT articles. They sue OpenAI because that’s where the money is.

1 Like