Text-to-video generation using TTS for audio and a 3D avatar

Hi, I’m working on a similar project with one difference: mine generates video from text. The pipeline takes a text input, generates audio from it with Text-to-Speech (TTS), and then uses that audio to drive a lifelike 3D avatar that speaks the text. So far, I’ve implemented the TTS step with Coqui TTS, which produces natural-sounding audio. However, I’m having trouble getting the avatar’s lip movements to sync with the audio naturally and accurately.
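
To make the sync step concrete, here is a rough sketch of one common way to frame it: convert per-phoneme timings into viseme (mouth shape) keyframes on the video's frame grid. Everything here is an assumption for illustration. The phoneme timings are hand-written stand-ins for what a forced aligner might produce from the TTS audio, the `PHONEME_TO_VISEME` table and viseme names are hypothetical (a real avatar rig defines its own blendshapes), and `phonemes_to_keyframes` is not part of Coqui TTS or any avatar library.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme table; a real rig will use its own
# blendshape names and a much fuller phoneme set.
PHONEME_TO_VISEME = {
    "sil": "rest",
    "HH": "open", "AH": "open", "EH": "open",
    "L": "tongue",
    "OW": "round", "UW": "round",
    "M": "closed", "B": "closed", "P": "closed",
    "F": "teeth", "V": "teeth",
}


@dataclass
class Keyframe:
    frame: int    # video frame index at which the mouth shape starts
    viseme: str   # name of the mouth shape to show from that frame on


def phonemes_to_keyframes(phonemes, fps=30):
    """Turn (phoneme, start_sec, end_sec) tuples into viseme keyframes.

    Consecutive phonemes that map to the same viseme are merged so the
    avatar does not receive redundant keyframes.
    """
    keyframes = []
    for phoneme, start_sec, _end_sec in phonemes:
        # Unknown phonemes fall back to the neutral mouth shape.
        viseme = PHONEME_TO_VISEME.get(phoneme, "rest")
        if keyframes and keyframes[-1].viseme == viseme:
            continue
        keyframes.append(Keyframe(frame=round(start_sec * fps), viseme=viseme))
    return keyframes


# Assumed timings for the word "hello"; in practice these would come from
# aligning the TTS audio against the input text.
hello = [
    ("sil", 0.00, 0.10),
    ("HH", 0.10, 0.18),
    ("AH", 0.18, 0.30),
    ("L", 0.30, 0.38),
    ("OW", 0.38, 0.60),
    ("sil", 0.60, 0.80),
]

for kf in phonemes_to_keyframes(hello, fps=30):
    print(kf.frame, kf.viseme)
```

Keying the mouth shapes to the audio's own timestamps, rather than to the text alone, is what keeps the lips from drifting relative to the speech. Blending between keyframes instead of snapping from one to the next would be a natural next step for smoother motion.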
