Voice Engine - Navigating the Challenges and Opportunities of Synthetic Voices

Summary by AI:

• OpenAI introduces Voice Engine, a text-to-speech model that uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker’s voice.

• The technology has shown promise in various applications, including reading assistance, content translation, improving essential service delivery, supporting non-verbal individuals, and helping patients recover their voice.

• OpenAI emphasizes the responsible deployment of synthetic voices due to potential risks, especially in an election year, and engages with diverse partners to incorporate feedback and ensure ethical considerations.

• Usage policies prohibit impersonation without consent, require explicit consent from original speakers, and mandate clear disclosure to audiences about AI-generated voices.

• Voice Engine employs safety measures such as watermarking and proactive monitoring, and OpenAI advocates for voice authentication experiences and a no-go voice list to prevent misuse.

• OpenAI sees Voice Engine as an opportunity to explore the technical frontier and share advancements in AI, aligning with their commitment to AI safety.

4 Likes

Just saw it and was about to post, you beat me to it!

Super cool stuff, almost out of the uncanny valley in my opinion.

I find it very intriguing that they’re doing the same thing here as they did with Sora, basically a “hey regulators and society, hurry up and get ready cause this stuff is coming”.

1 Like

Oh god the Spanish is like listening to someone who doesn’t speak Spanish try to speak it :rofl:

Pretty neat though

3 Likes

I noticed that too.

I think ElevenLabs is a little ahead in this regard, but as I’ve said 1,000 times before, this is the worst it will ever be.

One application for this technology that really excites me is film localization: being able to redub actors in their own voices.

Then it’s a very short jump to re-mapping their mouths to match the new-language dialogue…

2 Likes

Hi, is there any timeframe for when this API will be available to learn and experiment with?

At the same time, we are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.

I highly recommend reading through blog posts like this.

There is no timeframe, and they are being very cautious. The article implies that if the risks are high enough and cannot be resolved, they won’t release it.

Which to me is very strange, considering that there are already both commercial and open-source solutions that can do what they’re offering.
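For anyone wondering what those open-source solutions look like in practice, here is a minimal sketch assuming Coqui’s XTTS v2 as the example library (my choice of example, not one named in this thread); it clones a voice from a short reference clip, which is roughly the capability Voice Engine demonstrates:

```python
import torch
from TTS.api import TTS  # Coqui TTS, used purely as an illustrative open-source example

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the multilingual XTTS v2 model (zero-shot voice cloning)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Clone the voice in reference.wav (a short clip of the target speaker)
# and synthesize new speech with it; file paths here are hypothetical.
tts.tts_to_file(
    text="This sentence is spoken in the reference speaker's voice.",
    speaker_wav="reference.wav",
    language="en",
    file_path="cloned_output.wav",
)
```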

1 Like

I assume you’re referring to the ElevenLabs API, and I agree it is a bit odd.

It’s a very clever way, though, of positioning yourself as the responsible AI company. I highly doubt they’ll release it before the US election.

2 Likes

And also

I’m leaning this way. But there are only so many times one can use the “we’re holding onto it because ours is too dangerous” shtick. As someone who has older brothers, I have been tricked by this tomfoolery before :joy:

In February, artificial intelligence research startup OpenAI announced the creation of GPT-2, an algorithm capable of writing impressively coherent paragraphs of text.

But rather than release the AI in its entirety, the team shared only a smaller model out of fear that people would use the more robust tool maliciously — to produce fake news articles or spam, for example.

1 Like

Just wanted to point out that GPT-2 was the go-to tool for “premium spam”, which is what got me into this whole AI thing.
Back then it was just an expensive tool that needed extra attention, but the results did stand the test of time in many cases.

As for why they are not releasing a TTS model that is seemingly not SOTA: I suppose it’s a rather unnecessary addition to the risk surface.
I mean, we haven’t seen the New York Times sue the whole LLM market even though they are all using NYT articles. They sue OpenAI because that’s where the money is.

1 Like