SORA: wait but does it come with sound?!

Bestbubbldev · February 17, 2024, 4:40am

The primary concern is how to incorporate sound into the videos we create. I’m not referring to a soundtrack or background music, but rather scenarios like a dialogue scene or a scene where a character is singing.

Manually syncing voice and audio with lip movements would be a huge headache. Are there any existing applications that can synchronize audio and video?

How do you plan to tackle this problem?

_j · February 17, 2024, 5:12am

This kind of model works by using both the prompt and its learned understanding of how labeled objects naturally move to step through the likely progression of a video scene.

It appears superb, but is not unique:

Temporal specificity doesn’t appear to be a target.

If you have the AI depict a singer in a jazz club, you’ll have to write your own song to match whatever it outputs.

Don’t expect this to be an AI newscaster…

Don’t expect to see this in the wild any time soon.

anon22939549 · February 17, 2024, 6:07am

anon22939549 · February 17, 2024, 6:27am

Maybe… AI music has come a long way in the last two years. OpenAI has smartly stayed out of music generation (at least publicly). But I’m sure you could get at least a rendition of a song to lay over a video. It would be absolute trash right now, though: humans are incredibly sensitive to even minor audio/video sync issues (see ITU-R BT.1359: https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.1359-0-199802-S!!PDF-E.pdf).
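For a sense of how tight those tolerances are, here is a small sketch. It assumes the commonly quoted ITU-R BT.1359 figures (sync errors become detectable at roughly +45 ms when sound leads the picture and -125 ms when sound lags, and stay acceptable up to roughly +90 ms and -185 ms) plus a 24 fps frame rate. Treat the numbers as approximate and check the recommendation itself before relying on them.

```python
# Approximate ITU-R BT.1359 thresholds (assumed values, in milliseconds).
# Sign convention: positive offset = audio arrives BEFORE the matching video.
DETECT_LEAD_MS = 45
DETECT_LAG_MS = 125
ACCEPT_LEAD_MS = 90
ACCEPT_LAG_MS = 185


def classify_av_offset(offset_ms: float) -> str:
    """Classify an audio/video offset against the assumed thresholds."""
    if -DETECT_LAG_MS <= offset_ms <= DETECT_LEAD_MS:
        return "undetectable"
    if -ACCEPT_LAG_MS <= offset_ms <= ACCEPT_LEAD_MS:
        return "detectable but acceptable"
    return "unacceptable"


if __name__ == "__main__":
    fps = 24  # assumed frame rate; one frame is about 41.7 ms
    for frames in (1, 2, 3):
        offset = frames * 1000 / fps
        print(f"audio leads by {frames} frame(s) ({offset:.1f} ms): "
              f"{classify_av_offset(offset)}")
```

With these numbers, a one-frame audio lead at 24 fps sits inside the detectability window, two frames is noticeable but tolerable, and three frames is already outside the acceptability window. Lip sync on generated dialogue would have to land within a frame or two to pass.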

I absolutely foresee models coming in the future that target video-to-audio and audio-to-video. That’s just a natural progression, and we already have huge training sets available.

Then I imagine it’s only a very short time before someone inevitably pits them against each other with the goal of convergence.

I think you’re right, but I think we need to qualify what soon means in this context.

I think we’ll[1] be able to generate text-to-AV that’s mostly “good enough” within 5 years, and definitely within 10.

But I also think that, if you don’t need a monolithic model to do it in one go, we can do these things today, with significant post-processing and editing.

It’s an exciting and scary time…

[1] Meaning humanity.

VeitB · February 17, 2024, 8:53am

Take a look. It’s ‘old’ but it exists.

https://openai.com/research/jukebox

_j · February 17, 2024, 9:01am

[animated GIF]

h.alesso · February 17, 2024, 10:11pm

I think D-ID.com does a pretty good job of audio sync with avatars. True, the avatars are usually just heads with small movements, but the lip sync is good. This same tech could probably be incorporated into Sora. Don’t you think?

curt.kennedy · February 17, 2024, 10:22pm

Yeah exactly … the OpenAI Jukebox.

I’m wondering if they could build another model: after the video is rendered, you feed it the video and a prompt describing the desired soundtrack, and this other model generates the sound, syncs it to the video, and outputs a new video with the prompted AI soundtrack in it.

Agha.khan · February 18, 2024, 5:34am

I think Sora is still in its infancy; think of the possibilities in the coming days. I expect it will become a complete package, with lip sync and dialogue handled in a single prompt.

Agha.khan · February 18, 2024, 5:37am

Thanks for sharing; I don’t know how I missed it.

curt.kennedy · February 18, 2024, 5:54am

Yes, all early days … but think of these scenarios as “prompts”:

- Input a video without sound; get back the same video with sound.
  - Option 1: infer the soundtrack from the video alone, with no text prompt.
  - Option 2: infer the soundtrack from the joint prompt/video information.
- Input a sound with no video; get back a video that corresponds to the sound.
  - Option 1: infer the video from the sound alone, with no text prompt.
  - Option 2: infer the video from the joint prompt/sound information.

But think of the different permutations. You could input sounds, get back videos. Input videos, get back sounds. You could use the joint information to sync video/sound together. And control both with a text prompt as well.

Lots of permutations.

Right now, Sora takes text as input and produces a video without sound. Adding sound and syncing it to the video is the next logical step.

Having these as separate rendering steps would give creators more control. For example, you like the video but need to iterate on the sound a bunch to dial it in. Or you love the sound but need to iterate on the video side.

If you are limited to doing both in one pass, you risk losing both since you are changing two variables at the same time.

videoaierc · February 18, 2024, 8:37am

This would be incredible, plus it will change the world we know. We’ll see what happens from now on. I’m waiting to do something amazing with Sora.

dignity_for_all · February 18, 2024, 9:42am

I understand the excitement, and it seems like a fantastic technology, but I think it’s best not to consider it as something urgent, and I don’t know when it will be generally available.
Currently, access is exclusive to Red Teamers, and there is no waiting list.
We’ll have to be patient for a while.

It may remind you of a leader from a European country who introduced one popular policy after another. Despite being ridiculed as a “butterfly” or a “soap bubble,” they ultimately only disappointed the people.

tradingfhk · February 18, 2024, 9:00pm

I started seeing fake SORA scams on the internet.

N2U · February 18, 2024, 9:04pm

Yeah, there will probably be more of those.

If you see any scams out there, you can send a link to me or one of the other moderators, we do report these on a regular basis.

JRMazarri · February 19, 2024, 12:35am

Everyone, share your thoughts on Sora! How can it help you?
I can’t wait to use it to generate marketing videos for my clients, automating the whole video content creation process for business brands! I’m sure a lot of people already have so many ideas on how it can help them. If you are open to sharing, I would love to build and teach you how to use this AI automation in a way that helps you. Imagine running it while you sleep… You can find me on all socials under this same name.

I am such an AI enthusiast; I started building AI automations with OpenAI over 2 years ago. And I tell you, the different applications and use cases are so beneficial for businesses of all kinds, even small businesses! A lot of businesses today are so outdated and not automating their processes… which is very unfortunate. Using AI to automate your business can cut costs and save your business $1,000s! You can then turn around and use that money to grow your company or run paid ads! Business owners do not understand that yet… One day they will. And I am excited to be a part of it!

Happy AI, everyone!

JRMazarri · February 19, 2024, 1:05am

Wow, I am just so amazed with OpenAI’s Sora video generator! I can’t wait to make marketing content and movies! Who’s with me?!

timeislight · February 19, 2024, 1:22am

Of course @JRMazarri I am with you! ))

timeislight · February 19, 2024, 1:37am

Hello, have you seen this other incredible news? Yet another AI revolution!

minjaben · February 20, 2024, 4:09pm

Great, no need for a sound designer! Guess I’ll just go look for another career that isn’t being encroached on by AI progress?
Or find a way to use it? I don’t even know anymore.
