SORA: wait but does it come with sound?!

The primary concern is how to incorporate sound into the videos we create. I’m not referring to a soundtrack or background music, but rather scenarios like a dialogue scene or a scene where a character is singing.

Manually syncing voice and audio with lip movements would be a huge headache. Are there any existing applications that can synchronize audio and video?

How do you plan to tackle this problem?


This kind of algorithm works by using both the prompt and a learned understanding of how labeled objects naturally move to step through the likely progression of a video scene.

It appears superb, but is not unique:

Temporal specificity doesn’t appear to be a target.

If you have the AI elaborate on a singer in a jazz club, you’ll have to write your own song to match whatever is output.

Don’t expect this to be an AI newscaster…

Don’t expect to see this in the wild any time soon.



Maybe… AI music has come a long way in the last two years. OpenAI smartly hasn’t delved into music generation (at least not publicly). But I’m sure you could get at least a rendition of a song to lay over a video. It would be absolutely crap right now, since humans are incredibly sensitive to really minor audio/video sync issues.

I absolutely foresee models coming in the future that target video-to-audio and audio-to-video. That’s just a natural progression, and we already have huge training sets available.

Then I imagine it’s only a very short amount of time before someone inevitably pits them against each other with the goal of convergence.

I think you’re right, but I think we need to qualify what soon means in this context.

I think we’ll[1] be able to generate text-to-AV that’s mostly “good enough” within 5 years, but definitely within 10.

But, I also think that, if you don’t need a monolithic model to do it in one go, we can do these things today—with significant post-processing and editing.

It’s an exciting and scary time…

  1. Meaning humanity. ↩︎

Take a look. It’s ‘old’ but it exists.




I think does a pretty good job of audio sync with avatars. True, the avatars are usually just heads with small motions, but the lip sync is good. This same tech could probably be incorporated into Sora. Don’t you think?


Yeah exactly … the OpenAI Jukebox.

I wonder if they could build another model for this. After the video is rendered, you would feed it the video plus a prompt describing the desired soundtrack, and it would generate the sound and sync it to the video, producing a new video with the prompted AI soundtrack in it.


I think Sora is still in its infancy, so think of the possibilities in the coming days. I think it will become a complete package, with lip sync and dialogue driven from a single prompt.

Thanks for the share, I don’t know how I missed it :slight_smile:

Yes, all early days … but think of these scenarios as “prompts”:

  1. Input a video, without sound. Get back the same video with sound.
  • Option 1 - Infer the soundtrack from the video, no text prompt
  • Option 2 - Infer the soundtrack from the joint prompt/video information
  2. Input a sound, no video. Get back a video that corresponds to the sound.
  • Option 1 - Infer the video from the sound, no text prompt
  • Option 2 - Infer the video from the joint prompt/sound

But think of the different permutations. You could input sounds, get back videos. Input videos, get back sounds. You could use the joint information to sync video/sound together. And control both with a text prompt as well.

Lots of permutations.
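To make those permutations a bit more concrete, here is a rough Python sketch that treats text, video, and audio as optional inputs and reports which track would have to be generated. It is only an illustration: `GenerationRequest` and `plan` are made-up names, and nothing here corresponds to a real Sora or OpenAI API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request type: any of the three inputs may be omitted."""
    prompt: Optional[str] = None      # text description
    video_path: Optional[str] = None  # existing video without sound
    audio_path: Optional[str] = None  # existing sound without video


def plan(req: GenerationRequest) -> str:
    """Describe what would be generated and which inputs condition it."""
    has_prompt = req.prompt is not None
    has_video = req.video_path is not None
    has_audio = req.audio_path is not None

    if has_video and has_audio:
        # Both tracks exist, so the only job left is aligning them.
        return "sync video and audio" + (" guided by prompt" if has_prompt else "")
    if has_video:
        return "generate audio from " + ("prompt+video" if has_prompt else "video only")
    if has_audio:
        return "generate video from " + ("prompt+audio" if has_prompt else "audio only")
    if has_prompt:
        # Text in, silent video out.
        return "generate video from prompt only"
    raise ValueError("need at least one of prompt, video, or audio")


print(plan(GenerationRequest(video_path="clip.mp4")))
print(plan(GenerationRequest(prompt="jazz club", audio_path="song.wav")))
```

The point of the sketch is just that the prompt is an optional extra condition in every branch, which is where the "Option 1 / Option 2" split in the list comes from.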

Right now, Sora is input text, get a video without sound. Adding sound and syncing it back to the video is a next logical step.

Having these as separate rendering steps would give creators more control. For example, you like the video, but need to iterate on the sound a bunch to dial it in. Or you love the sound, but need to iterate on the video side.

If you are limited to doing both in one pass, you risk losing both since you are changing two variables at the same time.
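Here is a minimal sketch of what that separation could look like, assuming two independent stages. The functions `render_video`, `render_audio`, and `mux` are hypothetical placeholders that only build string ids; they stand in for whatever real models or tools would do the work.

```python
def render_video(video_prompt: str, seed: int) -> str:
    """Placeholder text-to-video step; returns a fake asset id."""
    return f"video[{video_prompt}#{seed}]"


def render_audio(video_id: str, audio_prompt: str, seed: int) -> str:
    """Placeholder video-to-audio step, conditioned on a finished video."""
    return f"audio[{audio_prompt}#{seed} for {video_id}]"


def mux(video_id: str, audio_id: str) -> str:
    """Placeholder for combining the two tracks into one file."""
    return f"{video_id} + {audio_id}"


# Lock in a video you like, then vary only the sound.
video = render_video("singer in a jazz club", seed=1)
takes = [
    mux(video, render_audio(video, "slow ballad, brushed drums", seed=s))
    for s in range(3)
]

for take in takes:
    print(take)
```

Because the video id never changes across the three takes, only one variable moves per iteration. The same pattern works the other way around if you hold the audio fixed and re-render the video.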


This would be incredible, plus it will change the world we know. We’ll see what happens from now on. I’m waiting to do something amazing with Sora.

I understand the excitement, and it seems like a fantastic technology, but I think it’s best not to consider it as something urgent, and I don’t know when it will be generally available.
Currently, access is exclusive to Red Teamers, and there is no waiting list.
We’ll have to be patient for a while.

It may remind you of a leader from a European country who introduced one popular policy after another. Despite being ridiculed as a butterfly or a soap bubble, they only ended up disappointing the people.


I started seeing fake SORA scams on the internet.

Yeah there will probably be more of those :sweat_smile:

If you see any scams out there, you can send a link to me or one of the other moderators, we do report these on a regular basis. :heart:


Everyone, share your thoughts on Sora and how it can help you!
Wow, I am just so amazed by OpenAI and the Sora video generator! I can’t wait to make marketing content and movies! Who’s with me?!

Of course @JRMazarri I am with you! ))

Hello, do you know this other incredible news? Yet another AI revolution! :

Great, no need for a sound designer! Guess I’ll just go look for another career that isn’t being encroached on by AI progress?
Or find a way to use it? I don’t even know anymore.

