The primary concern is how to incorporate sound into the videos we create. I’m not referring to a soundtrack or background music, but rather scenarios like a dialogue scene or a scene where a character is singing.
Manually syncing voice and audio with lip movements would be a huge headache. Are there any existing applications that can synchronize audio and video?
The way this kind of algorithm works is that the model uses both the prompt and a learned understanding of how labeled objects naturally move to step through the likely progression of a video scene.
It appears superb, but is not unique:
Temporal specificity doesn’t appear to be a target.
If you have the AI depict a singer in a jazz club, you’ll have to write your own song to fit whatever it outputs.
Don’t expect this to be an AI newscaster…
Don’t expect to see this in the wild any time soon.
Maybe… AI music has come a long way in the last two years. OpenAI has, smartly, not delved into music generation (at least not publicly). But I’m sure you could get at least a rendition of a song to lay over a video. It would be absolutely trash right now, though: humans are incredibly sensitive to really minor audio/video sync issues (see https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.1359-0-199802-S!!PDF-E.pdf).
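To put rough numbers on "really minor", here is a minimal Python sketch that classifies an audio/video offset. The thresholds are the figures commonly quoted from ITU-R BT.1359 (about +45 ms / -125 ms for detectability and +90 ms / -185 ms for acceptability, positive meaning the audio leads). I have not re-checked them against this exact revision of the PDF, so treat them as assumptions. The function names are made up for illustration.

```python
# Assumed thresholds in milliseconds, as commonly quoted from ITU-R BT.1359.
# Sign convention: positive = audio leads the video, negative = audio lags.
DETECT_LEAD_MS = 45.0
DETECT_LAG_MS = -125.0
ACCEPT_LEAD_MS = 90.0
ACCEPT_LAG_MS = -185.0


def frames_to_ms(frames: float, fps: float) -> float:
    """Convert an offset measured in video frames to milliseconds."""
    return frames * 1000.0 / fps


def classify_av_offset(offset_ms: float) -> str:
    """Bucket an audio/video offset by how noticeable it is likely to be."""
    if DETECT_LAG_MS <= offset_ms <= DETECT_LEAD_MS:
        return "undetectable"
    if ACCEPT_LAG_MS <= offset_ms <= ACCEPT_LEAD_MS:
        return "detectable"
    return "unacceptable"


if __name__ == "__main__":
    # Audio leading by two frames at 24 fps (about 83 ms).
    print(classify_av_offset(frames_to_ms(2, 24)))
```

Under these assumed numbers, audio that leads by only two frames at 24 fps is already noticeable, which is why a song loosely laid over generated lip movement would look off.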
I absolutely foresee models coming in the future that target video-to-audio and audio-to-video. That’s just a natural progression, and we already have huge training sets available.
Then I imagine it’s only a very short amount of time before someone inevitably pits them against each other with the goal of convergence.
I think you’re right, but I think we need to qualify what soon means in this context.
I think we’ll[1] be able to generate text-to-AV that’s mostly "good enough" within 5 years, and definitely within 10.
But, I also think that, if you don’t need a monolithic model to do it in one go, we can do these things today—with significant post-processing and editing.
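As one example of the "post-processing and editing" route: if you already have a silent video from one tool and an audio track from another, laying one over the other is a standard ffmpeg job. Below is a minimal sketch in Python that only builds the command. The file names and the helper name `build_mux_command` are hypothetical, and it assumes ffmpeg is installed. It does not do any lip sync; it just muxes the two files with an optional fixed audio delay.

```python
import subprocess
from typing import List


def build_mux_command(video_path: str, audio_path: str, out_path: str,
                      audio_offset_s: float = 0.0) -> List[str]:
    """Build an ffmpeg command that lays an audio track over a silent video."""
    cmd = ["ffmpeg", "-y", "-i", video_path]
    if audio_offset_s != 0.0:
        # -itsoffset applies to the next input; a positive value delays the audio.
        cmd += ["-itsoffset", f"{audio_offset_s:.3f}"]
    cmd += [
        "-i", audio_path,
        "-map", "0:v:0",   # video stream from the first input
        "-map", "1:a:0",   # audio stream from the second input
        "-c:v", "copy",    # keep the video as-is, no re-encode
        "-c:a", "aac",
        "-shortest",       # stop at the end of the shorter stream
        out_path,
    ]
    return cmd


def mux(video_path: str, audio_path: str, out_path: str,
        audio_offset_s: float = 0.0) -> None:
    """Run the command; raises if ffmpeg exits with an error."""
    subprocess.run(build_mux_command(video_path, audio_path, out_path,
                                     audio_offset_s), check=True)
```

Something like `mux("scene.mp4", "dialogue.wav", "scene_with_audio.mp4", audio_offset_s=0.12)` would delay the audio by 120 ms. Finding the right offset, or warping the audio to match mouth movement, is the hard part that still needs manual editing or a dedicated model.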
I think D-ID.com does a pretty good job of audio sync with avatars. True, the avatars are usually just heads with small motions, but the lip sync is good. This same tech could probably be incorporated into Sora. Don’t you think?
I wonder if they could build a second model: after the video is rendered, you feed it the video plus a prompt describing the desired soundtrack, and it generates the sound and syncs it to the video, producing a new video with the prompted AI soundtrack in it.
I think Sora is still in its infancy; think of the possibilities in the coming days. I think it will eventually be a complete package, with lip sync and dialogue handled in a single prompt.
Yes, all early days … but think of these scenarios as “prompts”:
Input a video, without sound. Get back the same video with sound.
Option 1 - Infer the soundtrack from the video, no text prompt
Option 2 - Infer the soundtrack from the joint prompt/video information
Input a sound, no video. Get back a video that corresponds to the sound.
Option 1 - Infer the video from the sound, no text prompt
Option 2 - Infer the video from the joint prompt/sound
But think of the different permutations. You could input sounds, get back videos. Input videos, get back sounds. You could use the joint information to sync video/sound together. And control both with a text prompt as well.
Lots of permutations.
Right now, Sora is input text, get a video without sound. Adding sound and syncing it back to the video is a next logical step.
Having these as separate rendering steps would give creators more control. For example, you like the video, but need to iterate on the sound a bunch to dial it in. Or you love the sound, but need to iterate on the video side.
If you are limited to doing both in one pass, you risk losing both since you are changing two variables at the same time.
I understand the excitement, and it seems like a fantastic technology, but I think it’s best not to consider it as something urgent, and I don’t know when it will be generally available.
Currently, access is exclusive to Red Teamers, and there is no waiting list.
We’ll have to be patient for a while.
It may remind you of a leader from a European country who introduced one popular policy after another. Despite being ridiculed as a butterfly or a soap bubble, they only ended up disappointing the people.
Everyone, share your thoughts on Sora and how it can help you!
I can’t wait to use it to generate marketing videos for my clients, automating the whole process for video content creation for business brands! I’m sure a lot of people already have so many ideas on how it can help them! If you are open to sharing, I would love to build and teach you how to use this ai automation in a way that helps you, imagine running it while you sleep… You can find me on all socials with this same name.
I am such an AI enthusiast; I started building AI automations with OpenAI over 2 years ago. And I tell you, the different applications and use cases are so beneficial for businesses of all kinds, even small businesses! A lot of businesses today are so outdated and are not automating their processes… which is very unfortunate… Using AI to automate your business can cut costs a lot, saving your business thousands of dollars! You can then turn around and use that money to grow your company or run paid ads! Business owners do not understand that yet… One day they will. And I am excited to be a part of it!
Great, no need for a sound designer! Guess I’ll just go look for another career that isn’t being encroached on by AI progress?
Or find a way to use it? I don’t even know anymore.