I want to create a human-like chatbot that listens to users' questions and answers them, with the avatar's lips synced to the spoken answer.
In other words, a real-time, voice-based conversational avatar that users can talk to.
I will be using GPT for the natural-language conversation.
But I can't find a way to implement real-time lip sync for the avatar. What tools or models do I need to achieve this?
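One common way to drive lip sync is to turn the speech into a timed sequence of phonemes and map each phoneme to a viseme (a mouth shape) that the avatar's rig can display. The sketch below assumes you already have phoneme timings, for example from a forced aligner or from a TTS engine that reports them. The viseme names and the grouping in `PHONEME_TO_VISEME` are a simplified, hypothetical set, not any particular engine's standard.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical, simplified grouping of ARPAbet phonemes (stress digits stripped)
# into a handful of mouth shapes. Real avatar rigs define their own viseme sets.
PHONEME_TO_VISEME = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "teeth_lip", "V": "teeth_lip",
    "AA": "open", "AE": "open", "AH": "open", "AY": "open", "AW": "open",
    "IY": "wide", "IH": "wide", "EH": "wide", "EY": "wide",
    "UW": "round", "UH": "round", "OW": "round", "OY": "round", "W": "round", "AO": "round",
}
DEFAULT_VISEME = "neutral"  # fallback for phonemes not listed above

Segment = Tuple[str, float, float]  # (label, start_seconds, end_seconds)


def to_viseme_track(phonemes: List[Segment]) -> List[Segment]:
    """Map timed phonemes to visemes, merging adjacent identical shapes."""
    track: List[Segment] = []
    for phoneme, start, end in phonemes:
        base = phoneme.rstrip("012")  # drop ARPAbet stress markers, e.g. "AH0" -> "AH"
        viseme = PHONEME_TO_VISEME.get(base, DEFAULT_VISEME)
        if track and track[-1][0] == viseme and abs(track[-1][2] - start) < 1e-6:
            # Extend the previous segment instead of emitting a duplicate shape.
            track[-1] = (viseme, track[-1][1], end)
        else:
            track.append((viseme, start, end))
    return track


def viseme_at(track: List[Segment], t: float) -> str:
    """Return the viseme the avatar should show at playback time t (seconds)."""
    starts = [start for _, start, _ in track]
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < track[i][2]:
        return track[i][0]
    return DEFAULT_VISEME  # before the first or after the last segment
```

At render time you would call `viseme_at` with the audio player's current position on every frame, so the mouth stays locked to the audio clock rather than to wall-clock time.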
Hey, we built a hackathon project before using a real-time lip-sync solution.
The neat part is that it can be integrated as a React component, with no complicated game-engine setup:
“github dot com /BennyKok/leaked-zoom”
You can ask any questions in their Discord: “discord dot gg / ZXKaZq4gMR”
Hi, I’m working on a similar project with one difference: mine generates video from text. The process takes a text input, generates audio from it with text-to-speech (TTS), and then uses that audio to make a lifelike 3D avatar speak the text. So far I’ve implemented TTS with Coqui TTS, which produces very natural-sounding audio. However, I’m having trouble syncing the avatar’s lip movements to the audio naturally and accurately. I tried Wav2Lip, but since I don’t have a GPU, performance is poor. Could you please advise me on how to improve this?
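Since Wav2Lip is slow without a GPU, a much cheaper (if less accurate) fallback is to drive the avatar's jaw or mouth-open blendshape directly from the loudness of the TTS audio. This is a different, simpler technique than Wav2Lip: it only tracks how open the mouth is, not its shape. The sketch below assumes a 16-bit mono WAV file and a 30 fps animation rate; the `noise_floor`, `full_open_rms` and `smoothing` defaults are guesses to tune, not established values.

```python
import array
import math
import sys
import wave
from typing import List, Tuple


def read_wav_mono16(path: str) -> Tuple[List[float], int]:
    """Load a 16-bit mono WAV file as floats in [-1, 1] plus its sample rate."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        raw = array.array("h", wav.readframes(wav.getnframes()))
        rate = wav.getframerate()
    if sys.byteorder == "big":
        raw.byteswap()  # WAV samples are stored little-endian
    return [s / 32768.0 for s in raw], rate


def mouth_open_curve(samples: List[float], sample_rate: int, fps: int = 30,
                     noise_floor: float = 0.02, full_open_rms: float = 0.3,
                     smoothing: float = 0.5) -> List[float]:
    """Return one mouth-open value in [0, 1] per animation frame.

    noise_floor: RMS at or below this counts as silence (assumed value).
    full_open_rms: RMS at or above this counts as fully open (assumed value).
    smoothing: 0 = no smoothing; closer to 1 = slower, less jittery mouth.
    """
    hop = max(1, sample_rate // fps)  # audio samples per animation frame
    curve: List[float] = []
    previous = 0.0
    for start in range(0, len(samples), hop):
        window = samples[start:start + hop]
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        # Linearly map RMS between the two thresholds onto 0..1, clamped.
        level = min(1.0, max(0.0, (rms - noise_floor) / (full_open_rms - noise_floor)))
        # Exponential smoothing so the jaw does not flicker between frames.
        previous = smoothing * previous + (1.0 - smoothing) * level
        curve.append(previous)
    return curve
```

For example, `curve = mouth_open_curve(*read_wav_mono16("reply.wav"))` (the file name is a placeholder) yields one value per frame that can be keyed onto the avatar's mouth-open control while the same file plays. It runs in pure Python on a CPU, at the cost of the mouth shapes all looking alike.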
You would need to know some Python, but I’ve done this with NVIDIA’s Audio2Face. It comes with a sample extension that takes text, sends it to one of NVIDIA’s TTS systems (Riva, I believe) and streams the returned audio into Audio2Face, which animates the face of a model. I changed it to take the text you type, send it to ChatGPT or a local model, pass the reply to a local XTTS2 model (you can use any TTS, preferably one that can stream), and stream that audio into Audio2Face. I made a 3D model of a guy I know and cloned his voice with XTTS2, so you could chat with him and he would speak back to you.
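What makes a chain like this feel real-time is that each stage streams into the next instead of waiting for the whole reply. A minimal sketch of that chaining with Python threads and queues is below. The `synthesize` and `play_and_animate` callables are hypothetical stand-ins for the XTTS2 call and the push into Audio2Face, and `tokens` stands in for the streamed ChatGPT or local-model output; none of these are those projects' actual APIs. Splitting on sentence boundaries is also an assumption: it lets TTS start on the first sentence while the LLM is still generating the rest.

```python
import queue
import re
import threading
from typing import Callable, Iterable, Iterator

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
_DONE = object()  # sentinel marking the end of a stream


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed LLM tokens into sentences as soon as each one closes."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the LLM stops


def run_pipeline(tokens: Iterable[str],
                 synthesize: Callable[[str], bytes],
                 play_and_animate: Callable[[bytes], None]) -> None:
    """LLM tokens -> sentences -> TTS audio -> player/animator, stages overlapping.

    Error handling is omitted: if a callable raises, the other thread blocks.
    """
    text_q: queue.Queue = queue.Queue()
    audio_q: queue.Queue = queue.Queue()

    def tts_worker() -> None:
        while (item := text_q.get()) is not _DONE:
            audio_q.put(synthesize(item))  # hypothetical TTS call
        audio_q.put(_DONE)

    def output_worker() -> None:
        while (item := audio_q.get()) is not _DONE:
            play_and_animate(item)  # hypothetical hand-off to the face animator

    workers = [threading.Thread(target=tts_worker), threading.Thread(target=output_worker)]
    for worker in workers:
        worker.start()
    for sentence in sentences_from_tokens(tokens):
        text_q.put(sentence)
    text_q.put(_DONE)
    for worker in workers:
        worker.join()
```

With one worker per stage, the audio reaches the animator in the same order as the sentences, while the next sentence can be synthesized during playback of the current one.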
Hi, what you’ve done sounds interesting. Any chance you could share the source code so I can learn from it? I’m new to this and still trying to get my head around it. Thanks in advance!
Hi, we are working on a similar project and have had significant success developing a model that gives real-time responses with lip sync.
If anyone is interested in collaborating or in using this solution, please let us know.