Some highlights I found (which are not present in the trademarks of GPT-4/5 or Whisper):
“processing voice commands, and converting between text and speech”
“computer software for creating and generating voice and audio outputs based on natural language prompts, text, speech, visual prompts, images, and/or video”
“computer software for building digital voice assistants”
“computer software for generation of audio and/or voice in response to user prompts”
“computer software for use as an application programming interface (API)”
“computer software development tools for the development of voice service delivery and natural language understanding technology across global computer networks, wireless networks, and electronic communications networks”
I dunno, I think the strategy is to throw enough mud at the wall and see what sticks.
Since the cost is almost negligible for them, they’ll first try a standard character claim, and if that doesn’t go through they’ll just go for a special format claim. Some stuff is bound to make it through, whether on legitimate grounds or through clerical oversight.
I’m really looking forward to this. I have been using the hands-free mode while driving and it’s really nice.
My mother is also legally blind and calls me to help her find the right spice. I would love for her to be able to have an Assistant in her life to help her find things and read back recipes that she’s found.
I know there’s Be My Eyes or whatever but it low-key just does not work.
All the current big names, such as Google Home, are utter trash bordering on abandonware.
I can’t wait to see what they’re cooking. It does feel like OpenAI is going towards the physical side of things soon. Life-like robots when?
My best guess is that it could be for a voice chat completion engine/model. This would drastically reduce the time between the user speaking and the audio response arriving, which right now involves transcription, then chat completion, and finally TTS.
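To make the latency point concrete, here’s a rough sketch of what that three-step round trip looks like today with the OpenAI Python SDK (model names, voice, and file paths are just illustrative assumptions, not anything confirmed about the rumoured product):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: speech -> text (one network round trip)
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2: text -> text (a second round trip, usually the slowest part)
reply = client.chat.completions.create(
    model="gpt-4",  # hypothetical choice for illustration
    messages=[{"role": "user", "content": transcript.text}],
)

# Step 3: text -> speech (a third round trip)
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.stream_to_file("answer.mp3")
```

A single speech-in/speech-out model would collapse these three sequential calls (and the full transcription/generation at each stage) into one, which is where the big latency win would come from.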
A couple of things that continue to worry me about OpenAI:
Ongoing secrecy around the roadmap, and secrecy around the datasets used. I get that they choose to interpret “Open” as not meaning open source, and that’s fine, but what exactly is open here? Almost nothing. They don’t do anything differently from any non-open commercial service. I feel that is bound to lead to problems eventually.
It’s a global service for a small world, but the level of American bias in the products is quite astounding. For example, all the current voices are very American (except for a UK one). Even just within English there are many accents around the world. I really hope that changes in the new voice projects, and that the voice choices are completely customisable and inclusive.