Web-based image generation: Persistent Scene Elements Across Separate Image Generations Despite Explicit Reset Instructions

Problem Summary:
When generating a sequence of related but distinct cartoon panels within a single chat session, we observed that elements from earlier images (like tables, laptops, chairs) persist in later images — even when those elements are no longer requested and should not exist according to the new prompt.

To work around this, we added an explicit directive at the start of each new prompt:
“NEW SCENE — RESET CONTEXT. Start with a blank canvas; ignore everything from previous images.”

However, despite this clear instruction, the system still “remembers” parts of previous scenes and unintentionally includes them.


Steps to Reproduce:

  1. Generate an office scene (with desks, laptops, employees).
  2. Generate a coworking café scene (small tables, laptops, emoji hearts above heads).
  3. Attempt a completely fresh subway scene with the RESET CONTEXT directive — but previous elements like tables, chairs, laptops still appear.
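
To make the leak easier to report, the steps above can be paired with a small check that lists which props from earlier scenes are never mentioned in the new prompt; any of these showing up in the new image would count as a leak. This is only a sketch: the helper name `leak_candidates` and the prop list are illustrative assumptions, and the matching is plain substring search on the prompt text, not an inspection of the generated image.

```python
# Hypothetical helper for documenting leaks; not part of any image tool.

def leak_candidates(earlier_props, new_prompt):
    """Return props from earlier scenes that the new prompt never mentions."""
    text = new_prompt.lower()
    # Substring match, so "laptop" also matches "laptops" in the prompt.
    return sorted(p for p in set(earlier_props) if p.lower() not in text)


# Props named in this report as persisting from the earlier scenes.
earlier_props = ["table", "laptop", "chair"]

# Opening line of the subway prompt, used here as the new scene description.
subway_prompt = "A wide cartoon panel inside a crowded subway car."

print(leak_candidates(earlier_props, subway_prompt))
```

The returned list can be pasted into the report next to the generated image so that each unwanted element is named explicitly.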

Expected Behavior:
When instructed with “NEW SCENE — RESET CONTEXT,” the model should entirely clear previous imagery and generate a new scene based purely on the current description.

Actual Behavior:
Unwanted props, layouts, and elements from prior scenes continue to leak into new images, even when they conflict with the new prompt.


Impact:
This affects sequential art/storyboard workflows and prevents creating properly isolated scenes within the same project. It also complicates prompt design and reduces generation precision.


Full prompts used:


Prompt 1 (coworking café scene):

NEW SCENE — RESET CONTEXT. Start with a blank canvas; ignore everything from previous images.
A wide cartoon panel inside a trendy coworking cafe.
Foreground: men and women sit at small tables, typing desperately on laptops while exaggerated coffee spills cascade over keyboards.
Midground: chaotic mix of digital dating profiles open on screens, hearts and "likes" floating as emoji ghosts above their heads.
Background: exposed brick walls with tacky inspirational signs like "Love = Upload Faster".
Objects:
- Primary (3): people with coffee-soaked laptops, floating emoji hearts, dating app screens
- Secondary (8): toppled coffee cups, frayed laptop chargers, neon Wi-Fi signs, broken chairs, gig posters for "Emo Tech Nights", potted succulents, crumpled resumes, snoring dog under a table
- Environmental (indoor): warm ambient lighting, light haze from coffee steam
Camera: mid-range shot, flat angle emphasizing chaos on tables.

Prompt 2 (isolated users in same café):

Same coworking cafe setup.
Foreground: same people, now visibly isolated, each trapped inside their own bubble of light from their laptop screens — disconnected from each other.
Midground: emojis faded and ghostly, dating app screens show "No Matches Today".
Background unchanged.
Camera: slight pull-back to highlight the isolation between the figures.
Keep environment, characters, and objects consistent.

Prompt 3 (subway scene — NEW SCENE RESET CONTEXT):

NEW SCENE — RESET CONTEXT. Start with a blank canvas; ignore everything from previous images.
A wide cartoon panel inside a crowded subway car.
Foreground: exaggerated stick-figure humans packed tightly, all holding smartphones, coffees, and backpacks poking others.
Midground: random absurd elements like a man balancing a yoga ball, a woman clutching five smartphones, a couple arguing over selfie poses.
Background: dirty subway walls plastered with absurd ads like "Find Love Faster! Download NOW!"
Objects:
- Primary (3): packed crowd, subway poles overloaded with hands, dangling broken ads
- Secondary (8): spilled coffee puddles, abandoned metro maps, squished sandwiches, dropped phones, scratched-up backpacks, worn-out sneakers, graffiti-tagged walls, twitching subway rats
- Environmental (indoor): harsh fluorescent lights, swaying train motion lines
Camera: cramped mid-shot from within the car, slightly tilted to emphasize discomfort.