Virtual assistant with video and audio

I’m developing a virtual assistant using Flutter. The main idea is that the user opens the app and starts practicing English with other users. I would like to implement a feature where the assistant interacts through both voice and video, using an avatar. I’m looking for advice on which technologies to use for this, especially for synchronizing voice and video with the avatar in real time. An example of what I’m aiming for is Hablo. I’d appreciate any suggestions or guidance on how to implement this.
Thanks