I’m a dedicated user of ChatGPT’s image generation feature, primarily using it for an ongoing visual storytelling project.
My work relies heavily on consistent depiction of recurring characters — specifically triplet sisters named Hi, Fu, and Mi — with identical faces, body types, clothing, and background tones. However, I’m constantly encountering serious issues with image consistency that undermine creative control and continuity.
Here are the core problems I’ve faced:
Inconsistent character representation
The same character is rendered with different skin tones, body proportions, belly size, or even completely different facial features.
Inconsistent art style and color tone
Despite requesting the same style, the lighting, linework, or saturation change unexpectedly between images, breaking visual continuity.
AI interpretation overrides user intent
Instead of prioritizing user instructions, the AI often applies its own interpretation of what is “suitable,” which overrides detailed creative direction and ruins emotional nuance.
As a storyteller, this breaks immersion and makes it nearly impossible to build a stable narrative with coherent visuals.
What’s urgently needed:
A way to lock the art style, either by referencing a previous image or by using a fixed rendering mode
Consistent character rendering, preserving facial features, body shape, skin tone, clothing, and pose across requests
An option to fix lighting and color tone for a defined visual mood
A “User Intent Priority Mode” to prevent AI from altering the meaning or feel of a request
Without these, serious creators and storytellers like myself will eventually leave the platform — not because the technology isn’t powerful, but because it doesn’t listen.
I truly hope this feedback helps improve the experience for creators who depend on clarity, consistency, and control in visual storytelling.
You can use multi-turn techniques and paste a DNA Template before each scene:
DNA Template
All images must be wide (3:2 aspect ratio) and in watercolor anime style.
In this session, we are creating a sequence of visual scenes titled ‘The Harmony of Three’.
The focus is on capturing subtle emotional moments between triplet sisters Hi, Fu, and Mi.
Each image builds the atmosphere of quiet connection and narrative continuity.
Keep all visual elements stable across images: characters, style, tone, and background environment.
All scenes take place in a serene coastal village with sakura trees, paper lanterns, and pastel-toned skies.
This scene features three identical triplet sisters named Hi, Fu, and Mi.
All three girls have identical appearances:
Light tan skin, round face with soft cheeks, and large almond-shaped hazel eyes.
Straight, shoulder-length silky black hair with front bangs.
They are average height, slender with graceful posture, and have flat stomachs and elegant proportions.
They wear traditional pastel pink dresses with slight variation:
Hi wears a pink hair ribbon
Fu wears a silver pendant
Mi holds a small light brown diary
Their expressions are calm and emotionally reflective.
Their appearance must remain perfectly consistent in every image.
Render in soft anime watercolor style, with gentle linework and muted pastel tones.
Use ambient diffused lighting, with no hard shadows or saturation changes between scenes.
Backgrounds should use soft, desaturated colors with a gentle painterly effect.
Maintain a cinematic horizontal composition with a medium-wide shot unless otherwise specified.
I will provide scenes, and you will create them one by one. Each image must be wide (3:2 aspect ratio) and in watercolor anime style.
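If you assemble your prompts with a script rather than pasting by hand, the "template before each scene" idea can be sketched in Python roughly like this. This is only an illustration, not an official workflow: `DNA_TEMPLATE`, `build_scene_prompt`, and the two scene descriptions are hypothetical names and examples, and the template text is abbreviated to three lines from the one above. The script only builds the prompt strings; how you send them to the image generator is up to you.

```python
# Sketch: prepend the same DNA template to every scene description so that
# each request carries the full character and style definition.
# All names here are hypothetical; the template is abbreviated for brevity.

DNA_TEMPLATE = "\n".join([
    "Keep all visual elements stable across images: characters, style, tone, and background environment.",
    "Their appearance must remain perfectly consistent in every image.",
    "Render in soft anime watercolor style, with gentle linework and muted pastel tones.",
])


def build_scene_prompt(scene: str, template: str = DNA_TEMPLATE) -> str:
    """Return one prompt: the unchanged template followed by the scene."""
    scene = scene.strip()
    if not scene:
        raise ValueError("scene description must not be empty")
    return f"{template}\n\nScene: {scene}"


# Made-up scene descriptions, for illustration only.
scenes = [
    "The sisters watch paper lanterns at dusk.",
    "Mi writes in her diary under a sakura tree.",
]

prompts = [build_scene_prompt(s) for s in scenes]

for p in prompts:
    print(p)
    print("-" * 40)
```

Because the template text is never edited between scenes, every prompt starts from the same description of the characters, style, and lighting, and only the final "Scene:" line changes.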