Hi, I’m working on a similar project with one difference: mine generates video from text. The pipeline takes a text input, generates audio from it with text-to-speech (TTS), and then uses that audio to drive a lifelike 3D avatar that speaks the text. So far I’ve implemented the TTS step with Coqui TTS, which produces very natural-sounding audio. However, I’m having trouble syncing the avatar’s lip movements with the audio naturally and accurately.
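A common baseline for this is amplitude-driven lip sync. You split the TTS audio into one chunk per video frame, compute each chunk's loudness (RMS), and use the normalized value as the avatar's mouth-open weight for that frame. The minimal sketch below assumes the TTS output has been saved as a 16-bit mono WAV file; check what your TTS setup actually writes. The function names `read_wav_mono16` and `mouth_open_curve` and the `fps`/`smoothing` parameters are hypothetical. They are not part of Coqui TTS or any avatar SDK.

```python
import array
import math
import sys
import wave


def read_wav_mono16(path):
    """Read a 16-bit mono WAV (path or file object); return (samples, sample_rate).

    Assumption: the TTS output is 16-bit PCM mono. Other formats are rejected.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono WAV")
        samples = array.array("h", wf.readframes(wf.getnframes()))
        rate = wf.getframerate()
    # WAV data is little-endian; array("h") uses the machine's native byte order.
    if sys.byteorder == "big":
        samples.byteswap()
    return samples, rate


def mouth_open_curve(samples, sample_rate, fps=30, smoothing=0.5):
    """Return one mouth-opening value in [0, 1] per video frame.

    smoothing is an exponential-moving-average factor in [0, 1):
    0 means no smoothing, values closer to 1 give slower mouth motion.
    """
    if len(samples) == 0:
        return []
    samples_per_frame = sample_rate / fps
    n_frames = math.ceil(len(samples) / samples_per_frame)

    # RMS loudness of the audio chunk that belongs to each video frame.
    rms = []
    for i in range(n_frames):
        start = int(round(i * samples_per_frame))
        end = int(round((i + 1) * samples_per_frame))
        chunk = samples[start:end]
        if len(chunk) == 0:
            rms.append(0.0)
            continue
        rms.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))

    peak = max(rms)
    if peak == 0:
        return [0.0] * n_frames  # silent clip: mouth stays closed

    # Normalize to [0, 1] and smooth; the curve starts from a closed mouth.
    curve = []
    prev = 0.0
    for value in rms:
        prev = smoothing * prev + (1 - smoothing) * (value / peak)
        curve.append(prev)
    return curve
```

Usage would be `samples, rate = read_wav_mono16("speech.wav")` followed by `mouth_open_curve(samples, rate, fps=30)`, mapping value `i` to the avatar's mouth-open (jaw) blendshape on video frame `i`. This only opens and closes the mouth with loudness. It does not produce distinct mouth shapes for different sounds, and that is where accuracy falls short. Per-sound mouth shapes (visemes) require phoneme timings for the audio.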