Hi, I’m working on a similar project but with a slight difference: mine involves generating video from text. The process takes a text input, generates audio from it using Text-to-Speech (TTS), and then uses that audio to drive a lifelike 3D avatar that speaks the text. So far, I’ve successfully implemented TTS using Coqui TTS, which is amazing for generating natural-sounding audio. However, I’m having trouble syncing the audio with the avatar’s lip movements naturally and accurately.
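For what it’s worth, the simplest starting point for that sync step is an amplitude-based "lip flap": drive a single mouth-open parameter from the audio’s loudness envelope, one value per video frame. It won’t give phoneme-accurate visemes, but it’s a useful baseline. Here’s a minimal sketch (the function name, sample rate, and frame rate are my own assumptions, not from any particular avatar toolkit):

```python
# Minimal sketch: map an audio signal's RMS loudness envelope to a
# mouth-open value in [0, 1] per video frame. This is a simple
# "lip flap" baseline, NOT true phoneme-level lip sync.
import numpy as np

def mouth_open_curve(samples, sr=16000, fps=30):
    """Return one normalized mouth-open value per video frame."""
    hop = sr // fps                        # audio samples per video frame
    n_frames = len(samples) // hop
    values = []
    for i in range(n_frames):
        frame = samples[i * hop:(i + 1) * hop]
        values.append(np.sqrt(np.mean(frame ** 2)))  # RMS loudness
    values = np.array(values)
    peak = values.max()
    return values / peak if peak > 0 else values

# Demo on a synthetic one-second 220 Hz tone that fades in,
# standing in for the WAV output of the TTS step.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.sin(2 * np.pi * 220 * t) * t    # amplitude ramps 0 -> 1
curve = mouth_open_curve(audio, sr=sr, fps=30)
print(len(curve))    # 30 values, one per frame of a 1 s, 30 fps clip
print(curve[-1])     # loudest frame normalizes to 1.0
```

Each value could then be keyed onto the avatar’s jaw-open blendshape (or morph target) for the corresponding frame. For accurate results you’d want phoneme-to-viseme mapping from the TTS engine’s alignment data instead, but this gets something moving quickly.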
Hi @alilaptop393. I’m also exploring ways to visualise TTS from OpenAI. Did you have any success with this? Have you found any 3D avatars, mascots, or similar that you can recommend?
Thank you
I hope you are doing well.
Regarding your question, I haven’t had any success in visualizing OpenAI’s TTS yet. However, I am also exploring different approaches. Could you clarify what kind of 3D avatar, mascot, or visualization method you are looking for? Are you interested in real-time lip-syncing, animated characters, or a specific software/tool for integration?
Looking forward to your response.
Best regards,
Mohsin Ali
Thanks for your reply.
Well, I’m looking to use HeyGen. The technology is available and affordable. The docs are there, but they’re over the head of a no-coder like me.