• OpenAI introduces Voice Engine, a text-to-speech model that, given a single 15-second audio sample, generates natural-sounding speech closely resembling the original speaker’s voice.
• The technology has shown promise in various applications, including reading assistance, content translation, improving essential service delivery, supporting non-verbal individuals, and helping patients recover their voice.
• OpenAI emphasizes the responsible deployment of synthetic voices given the potential risks, especially in an election year, and is engaging with a diverse set of partners to incorporate feedback and address ethical concerns.
• Usage policies prohibit impersonation without consent, require explicit consent from original speakers, and mandate clear disclosure to audiences about AI-generated voices.
• Voice Engine employs safety measures such as watermarking and proactive monitoring (see the watermarking sketch after this list), while OpenAI also advocates for voice authentication experiences and a no-go voice list to prevent misuse.
• OpenAI sees Voice Engine as an opportunity to explore the technical frontier and share advancements in AI, aligning with their commitment to AI safety.
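To make the watermarking point above a bit more concrete, here is a minimal spread-spectrum sketch in Python/NumPy: a pseudorandom sequence derived from a secret key is mixed into the waveform at low amplitude, and later detected by correlating against the same keyed sequence. This is purely illustrative; OpenAI has not published Voice Engine’s actual scheme, and the function names, key, and strength parameter here are hypothetical.

```python
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.01) -> np.ndarray:
    """Add a low-amplitude pseudorandom +/-1 sequence derived from a secret key."""
    rng = np.random.default_rng(key)
    mark = rng.choice([-1.0, 1.0], size=audio.shape)
    return audio + strength * mark

def detection_score(audio: np.ndarray, key: int) -> float:
    """Correlate the audio against the keyed sequence.

    Normalized so unmarked audio scores roughly N(0, 1), while a marked
    clip scores about strength * sqrt(len(audio)) / std(audio).
    """
    rng = np.random.default_rng(key)
    mark = rng.choice([-1.0, 1.0], size=audio.shape)
    return float(mark @ audio) / np.sqrt(audio.size * np.var(audio))

# Demo with synthetic stand-in "speech": 10 s of noise at 16 kHz.
rng = np.random.default_rng(0)
clip = rng.standard_normal(160_000)
marked = embed_watermark(clip, key=1234)
print(detection_score(marked, key=1234))  # ~4: watermark detected
print(detection_score(clip, key=1234))    # ~0: no watermark
```

A production scheme would also have to survive compression, resampling, and re-recording, which is where most of the real engineering effort goes; the correlation idea above is just the simplest starting point.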
Just saw it and was about to post, you beat me to it!
Super cool stuff, almost out of the uncanny valley in my opinion.
I find it very intriguing that they’re doing the same thing here as they did with Sora, basically a “hey regulators and society, hurry up and get ready cause this stuff is coming”.
At the same time, we are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.
I highly recommend reading through blog posts like this.
There is no time-frame and they are being very cautious. The article implies that if the risks are high enough and cannot be resolved they won’t release it.
I’m leaning this way. But there are only so many times one can use the “we’re holding onto it because ours is too dangerous” shtick. As one who has older brothers, I have been tricked by this tomfoolery before.
In February, artificial intelligence research startup OpenAI announced the creation of GPT-2, an algorithm capable of writing impressively coherent paragraphs of text.
But rather than release the AI in its entirety, the team shared only a smaller model out of fear that people would use the more robust tool maliciously — to produce fake news articles or spam, for example.
Just wanted to point out that GPT-2 became the go-to tool for “premium spam”, which is what got me into this whole AI thing.
Back then it was just an expensive tool that needed extra attention, but in many cases the results did stand the test of time.
With regard to why they are not releasing a TTS model that is seemingly not SOTA: I suppose it’s a rather unnecessary addition to the risk surface.
I mean, we haven’t seen the New York Times sue the whole LLM market, even though they are all using NYT articles. They sued OpenAI because that’s where the money is.