Prompting vs Structure - A Boundary Test

I ran a simple test with an image model to see how far prompting can go when we ask for physical realism instead of aesthetics.

Setup (4 steps):

  1. Start with a standard prompt → result: visually pleasing image.

    Aesthetic prompt (baseline)

    “A striking digital photograph captures a red kite gracefully soaring through misty clouds toward a radiant rainbow.
    Its wings stretch wide as it glides above, with sharp, vibrant colors of the rainbow contrasting against the soft, billowing clouds beneath it.”

    -> Goal: Beautiful / harmonious

  2. Add physical requirements (airflow, light behavior) → result: still smooth, slightly improved.

    Physical prompt (first correction)

    “Minimal composition: a red kite flying through dense clouds with a visible rainbow.
    The wings must visibly disturb the cloud medium, creating realistic turbulent airflow, vortex trails, and density variations in the mist.
    Cloud particles should react with delay and form irregular swirling structures behind the wings.
    The rainbow must follow physically accurate geometry, dependent on viewing angle and light direction, not a perfect decorative arc.
    Focus on microstructure consistency: fine feather detail, cloud particle variation, and light interaction must remain sharp and plausible even when zoomed in.
    No cinematic perfection, no symmetry, no idealization — only physically consistent interaction between air, light, and matter.”

    -> Goal: Enforce physics

  3. Increase structural demands (irregularity, no symmetry, partial rainbow) → result: looks more complex, but not more accurate.

    Structural prompt (a deeper, systemic approach)

    “Black-and-white or low-color realistic sketch, no cinematic composition.
    A red kite (Milvus milvus) flying through a dense cloud layer, with a clearly visible deeply forked tail.
    The wings must create visible aerodynamic effects: turbulent airflow, vortex trails, and irregular cloud displacement behind the wings.
    Cloud density must vary locally, showing disturbed regions, gaps, and swirling patterns caused by motion.
    A partially visible rainbow appears only in regions where light, water droplets, and viewing angle align correctly, not as a perfect arc.
    No symmetry, no idealization.
    Details must remain physically plausible when zoomed in: feather structure, cloud particles, and light interaction should show irregular, non-repeating patterns.
    The scene should feel like an imperfect, real physical process rather than a composed image.”

    -> Goal: Structure + Imperfection

  4. Add conflicting real-world constraints (turbulence over time, particle behavior, observer-dependent optics) → result: visual chaos, not physical consistency.

    Stress test prompt (out-of-bounds)

    “Realistic, non-cinematic sketch of a red kite (Milvus milvus) flying through a dense cloud layer under physically consistent conditions.
    The tail must be deeply forked and clearly control aerodynamic direction, with airflow visibly reacting to it.
    The wings must generate turbulent vortex trails that persist over distance and evolve, showing time-consistent airflow behavior.
    Cloud particles must respond differently depending on size and inertia: small droplets follow airflow, larger droplets lag behind, creating layered motion inconsistencies.
    Simultaneously, a rainbow must form only at the precise viewing angle relative to the observer, meaning it is only partially visible and disappears in regions where geometry is incorrect.
    The rainbow must distort locally due to uneven droplet distribution and airflow disturbance caused by the bird.
    Lighting must remain physically consistent from a single source, while still allowing visibility of all microstructures.
    No symmetry, no ideal composition, no aesthetic correction.
    The image must prioritize physical causality over visual clarity, even if the result appears messy or incomplete.”

    -> Goal: Overwhelm the system → Reveal its limits
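For reference, the droplet behavior requested in step 4 has a standard quantitative form, the Stokes number: droplets with a Stokes number well below 1 follow the airflow, while those above 1 lag behind. The sketch below is only an illustration of what "physically consistent" would have to mean here. It assumes simple Stokes drag (reasonable for small cloud droplets, increasingly rough for larger ones), and the viscosity, flow speed, and vortex length scale are values I picked for the example, not measurements from the test.

```python
# Sketch only: Stokes drag, valid for small droplets at low Reynolds number.
AIR_VISCOSITY = 1.8e-5   # Pa*s, assumed value for air near room temperature
WATER_DENSITY = 1000.0   # kg/m^3


def relaxation_time(diameter_m: float) -> float:
    """Time a water droplet needs to adjust to a change in air velocity."""
    return WATER_DENSITY * diameter_m ** 2 / (18.0 * AIR_VISCOSITY)


def stokes_number(diameter_m: float, flow_speed_m_s: float,
                  length_scale_m: float) -> float:
    """Ratio of droplet response time to the time scale of the flow."""
    return relaxation_time(diameter_m) * flow_speed_m_s / length_scale_m


if __name__ == "__main__":
    # Hypothetical vortex: 10 m/s flow speed, 0.1 m characteristic size.
    for diameter in (10e-6, 100e-6):
        stk = stokes_number(diameter, flow_speed_m_s=10.0, length_scale_m=0.1)
        behavior = "follows the flow" if stk < 1.0 else "lags behind"
        print(f"{diameter * 1e6:.0f} um droplet: Stk = {stk:.3g}, {behavior}")
```

Under these assumed numbers a 10 µm droplet tracks the vortex and a 100 µm droplet does not, which is the layered motion the prompt asks the model to depict.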

Observation:
The model reliably produces images that look realistic, but it does not seem to maintain physical relationships once the constraints become more complex.

  • Turbulence appears as texture, not as structured flow

  • The bird resembles a red kite, but lacks precise morphology

  • The rainbow is present, but not truly dependent on viewing geometry
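The rainbow point in particular can be made checkable. In the textbook geometric-optics model (spherical droplets, one internal reflection), the primary bow sits at a fixed angle of roughly 42° from the antisolar direction, so a droplet can only contribute to the bow if the observer sees it at that angle. Here is a minimal sketch of that condition. The refractive index of 1.333 and the 1° tolerance are assumptions for illustration, and the function names are my own.

```python
import math


def primary_rainbow_angle(n: float = 1.333) -> float:
    """Angle in degrees between the antisolar direction and the primary bow.

    Uses the minimum-deviation condition cos^2(i) = (n^2 - 1) / 3.
    """
    i = math.acos(math.sqrt((n * n - 1.0) / 3.0))   # incidence angle
    r = math.asin(math.sin(i) / n)                  # refraction angle (Snell)
    deviation = math.pi + 2.0 * i - 4.0 * r         # total ray deviation
    return math.degrees(math.pi - deviation)


def angle_between_deg(a, b) -> float:
    """Angle in degrees between two 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def on_primary_bow(view_dir, sunlight_dir, n: float = 1.333,
                   tolerance_deg: float = 1.0) -> bool:
    """True if a droplet seen along view_dir can show the primary bow.

    view_dir points from the observer to the droplet; sunlight_dir is the
    direction the sunlight travels, i.e. the antisolar direction.
    """
    offset = angle_between_deg(view_dir, sunlight_dir)
    return abs(offset - primary_rainbow_angle(n)) <= tolerance_deg


if __name__ == "__main__":
    print(f"primary bow angle: {primary_rainbow_angle():.1f} degrees")
```

An image that really depended on viewing geometry would show color only where `on_primary_bow` holds for the chosen observer and light direction. The generated images place the arc wherever it composes well.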

Conclusion:
Prompting can steer appearance, but it does not create underlying structure.

In this test, however precise the prompt became, the model prioritized what looks coherent over what is physically consistent.


In short:

It can generate convincing images,
but not fully consistent systems.

This is not a flaw of prompting; it is a limitation of the model itself.

Curious how others see this:
At what point do prompts stop improving results and start exposing system limits?