Hi everyone!
I’ve been exploring avatar and video generation tools like HeyGen, D-ID, etc. They’re great for single-character monologues, but there’s a big gap:
What if I want to upload a single image with multiple characters, label them (say Character A and Character B), give each a script line, and have the AI generate a video where they talk back-and-forth naturally in the same scene?
Right now the only workaround is:
- Generate Character A’s video separately.
- Generate Character B’s video separately.
- Stitch them manually in an editor.
This works, but it’s clunky and breaks immersion.
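In the meantime, the stitching step can at least be scripted instead of done by hand in an editor. Here’s a minimal sketch using moviepy (1.x import style); the filenames are hypothetical placeholders for clips exported from an avatar tool:

```python
from moviepy.editor import VideoFileClip, concatenate_videoclips

# Hypothetical filenames: one clip per character line, exported separately.
clip_a = VideoFileClip("character_a_line1.mp4")
clip_b = VideoFileClip("character_b_line1.mp4")

# Play the lines back-to-back; method="compose" tolerates different clip sizes.
dialogue = concatenate_videoclips([clip_a, clip_b], method="compose")
dialogue.write_videofile("dialogue.mp4")
```

Even scripted, the characters stay in their separate frames, so the shared-scene feel is still missing.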
Why it matters
- Storytelling / skits (short, funny dialogues for YouTube/TikTok).
- Educational explainers (role-plays, Q&A format).
- Corporate training (simulated conversations).
Community context
I’ve seen related discussions here in the forum:
- People asking how to keep consistent characters across DALL·E generations.
- People asking about multi-character prompting (DALL·E often merges features).
- People asking about multi-image outputs in one call.
All of these point to the same need: better identity handling + multiple characters in one scene.
The idea (simple version)
- Upload one image → AI detects multiple faces.
- Label the faces as Character A / Character B.
- Provide a script (dialogue lines per character).
- AI generates a video where A and B speak in turn, with lip-sync + natural pauses.
It feels like a natural extension of existing avatar tools — just applied to multi-character conversations.
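To make steps 1-3 concrete, here’s a rough sketch using OpenCV’s stock face detector. Everything in it is an assumption for illustration (the filenames, the left-to-right labeling rule, the script format); step 4 is the piece that needs new generation capability:

```python
import cv2  # OpenCV, used here for off-the-shelf face detection

# Hypothetical input: one still image containing both characters.
image = cv2.imread("scene.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# A stock Haar cascade is enough to find frontal faces in a still.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Label detected faces left-to-right as Character A, Character B, ...
faces = sorted(faces, key=lambda box: box[0])  # sort by x coordinate
characters = {
    f"Character {chr(ord('A') + i)}": (x, y, w, h)
    for i, (x, y, w, h) in enumerate(faces)
}

# A per-line script that a future multi-character video endpoint could consume.
script = [
    ("Character A", "Have you tried generating a two-person scene?"),
    ("Character B", "Yes, but today's tools only animate one face at a time."),
]
```

The open problem is step 4: animating both labeled faces in the same frame, in turn, with lip-sync.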
Question
Is OpenAI (or anyone here) exploring multi-character video with dialogue from a single image?
Would love to hear if this is on the research roadmap or if there are known technical blockers.
Thanks!
Note: This idea is mine, but I used ChatGPT to help refine the wording and structure so it’s easier to follow.