I would like to share an observation that emerged from comparing multiple image generations with very similar prompts, but different degrees of physical realism.
You can see the relevant image posts under the links listed in the current forum gallery:
- The Official ImageGen, 4o and Dall-E Megathread - #1593 by Tina_ChiKa
- The Official ImageGen, 4o and Dall-E Megathread - #1594 by LarisaHaster
- The Official ImageGen, 4o and Dall-E Megathread - #1617 by Tina_ChiKa
- The Official ImageGen, 4o and Dall-E Megathread - #1622 by Tina_ChiKa
- https://community.openai.com/t/the-official-imagegen-4o-and-dall-e-megathread/1230134/1596
- The Official ImageGen, 4o and Dall-E Megathread - #1565 by polepole
- https://community.openai.com/t/the-official-imagegen-4o-and-dall-e-megathread/1230134/1567?u=tina_chika
Thank you to all the contributors I have mentioned here – you are doing wonderful work and laying foundations on which knowledge can grow!
Observation (empirical, not anecdotal)
Two visually similar images were posted side by side:

- Image 1
  – closer to real-world physics
  – asymmetric fragmentation
  – transitional state (process visible, not a clean end state)
  – visually “messier”
- Image 2
  – more cinematic / iconic
  – high symmetry
  – instantaneous-looking explosion
  – clear silhouette and contrast
Despite Image 1 being physically more plausible, Image 2 consistently received significantly more positive reactions (likes / attention).
This pattern repeats across similar examples (glass breakage, impact dynamics, fluid–solid interaction).
Why this matters in ML terms
From a classical bias–variance trade-off perspective (especially in regression-like settings):

- we already accept a moderate increase in bias
- in order to achieve a strong reduction in variance
- with the goal that unrealistic noise is suppressed while informative structure remains

This is standard and expected.
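As a toy illustration of this classical trade-off (my own sketch, not from the posts above): fitting a high-degree polynomial with and without ridge regularization over many resampled training sets shows the regularized model accepting higher squared bias in exchange for a large drop in variance. The degree, noise level, and penalty strength are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Ground-truth function the noisy data is drawn from
    return np.sin(2 * np.pi * x)

n_train, n_trials = 20, 500
x_test = np.linspace(0, 1, 50)

def design(x, degree=9):
    # Polynomial feature matrix; the high degree makes plain OLS high-variance
    return np.vander(x, degree + 1, increasing=True)

def bias_variance(lam):
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        x = rng.uniform(0, 1, n_train)
        y = f(x) + rng.normal(0, 0.3, n_train)
        X = design(x)
        # Ridge solution: (X^T X + lam * I)^-1 X^T y
        w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        preds[t] = design(x_test) @ w
    bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)
    var = preds.var(axis=0).mean()
    return bias2, var

b_ols, v_ols = bias_variance(lam=1e-8)    # essentially unregularized
b_ridge, v_ridge = bias_variance(lam=10.0)  # strongly regularized
print(f"OLS:   bias^2={b_ols:.3f}  variance={v_ols:.3f}")
print(f"Ridge: bias^2={b_ridge:.3f}  variance={v_ridge:.3f}")
```

The regularized fit reports a larger bias term but a far smaller variance term, which is exactly the accepted trade described above.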
However, when RLHF is added, something subtly but fundamentally different happens.
What RLHF changes (key point)
RLHF does not regularize variance with respect to ground truth or physical consistency.
Instead, it regularizes variance with respect to human reward signals, which tend to correlate with:
- immediate visual clarity
- symmetry
- iconic end states
- cinematic contrast
- fast recognizability
Formally, the optimization objective becomes something like:

$$
\max_{\theta}\; \mathbb{E}_{x \sim \pi_\theta}\big[\, R_{\text{human}}(x) \,\big] \;-\; \beta\, D_{\mathrm{KL}}\big( \pi_\theta \,\|\, \pi_{\text{ref}} \big)
$$

where \( R_{\text{human}} \) is not aligned with physical correctness or process realism.
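To make this concrete, here is a minimal REINFORCE-style sketch (the two "modes" and the reward numbers are invented for illustration): a one-parameter policy chooses between a physically faithful output mode and a cinematic one, and because the simulated human reward simply prefers the cinematic look, maximizing expected reward pushes the policy toward it regardless of physics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical output modes: 0 = physically faithful, 1 = cinematic/iconic.
# Hypothetical human reward: prefers the cinematic mode, ignores physics.
reward = np.array([0.3, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = 0.0   # single logit; p(cinematic) = sigmoid(theta)
lr = 0.1

for step in range(2000):
    p = sigmoid(theta)
    a = rng.random() < p                  # sample an action from the policy
    r = reward[int(a)]
    # REINFORCE: r * d log p(a) / d theta for a Bernoulli policy
    grad = r * ((1.0 if a else 0.0) - p)
    theta += lr * grad

print(f"p(cinematic) after training: {sigmoid(theta):.2f}")
```

The policy drifts toward near-certain selection of the cinematic mode; nothing in the objective ever penalizes the loss of physical fidelity.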
The resulting failure mode
The critical issue is which variance gets reduced.
In theory:

- variance ≈ random, unrealistic deviations
- signal ≈ rare but correct physical edge cases

In practice with RLHF:

- variance ≈ anything that deviates from the reward-optimal aesthetic, including:
  – asymmetric transitions
  – intermediate process states
  – physically correct but visually “uncomfortable” frames

As a result:

- physically realistic variation is suppressed
- cinematic but incorrect patterns are reinforced
- the model converges toward an iconic mean, not a physical one
This is not overfitting and not underfitting.
It is a reward-induced bias shift, orthogonal to the classical bias–variance axis.
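One way to see this "iconic mean" drift numerically is a reward-weighting sketch (the 1-D feature and all numbers are invented for illustration): exponentially tilting a base distribution of outputs by a human-style reward, which is a common idealization of what reward-maximizing fine-tuning converges toward, shifts the mean away from the physical value and toward the reward peak.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 1-D "output feature": 0.0 = physically correct, 1.0 = maximally iconic.
# The base model samples near the physical value, with some spread.
physical_truth = 0.0
samples = rng.normal(loc=0.2, scale=0.5, size=100_000)

# Hypothetical human reward peaks at the iconic value, not the physical one.
iconic_peak = 1.0
reward = np.exp(-((samples - iconic_peak) ** 2) / 0.1)

# Reward-weighted (exponentially tilted) distribution: an idealization of the
# policy that reward-maximizing fine-tuning converges toward.
weights = reward / reward.sum()
tilted_mean = np.sum(weights * samples)

print(f"base mean:   {samples.mean():+.2f}  (near physical truth {physical_truth})")
print(f"tilted mean: {tilted_mean:+.2f}  (pulled toward iconic peak {iconic_peak})")
```

The bias shift appears even though the base model was centered near the physical value: it is induced entirely by the shape of the reward, which is the orthogonal axis the text describes.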
Why the screenshots are relevant
The like-count difference is not just social noise — it is a proxy for the same signal RLHF optimizes.
In other words, the same human preferences that drive likes are implicitly shaping the reward surface during RLHF.

This makes RLHF especially effective at:

- eliminating “rough” but correct outputs
- preserving “clean” but incorrect ones
Core takeaway
RLHF reduces variance along the perceptual reward dimension,
not along the ground-truth or physical-consistency dimension.
As a consequence:
- unrealistic cinematic effects remain
- physically correct edge cases are smoothed out
- models increasingly optimize for plausibility rather than process correctness
Why this is important
This effect is subtle, cumulative, and easy to miss —
but it directly impacts domains where process realism matters:
- physics-informed generation
- scientific visualization
- biomechanics
- material interaction
- high-speed dynamics
In short:
Likes are not ground truth — but RLHF implicitly treats them as such.
Thanks for reading!
I’m curious whether others have observed similar reward-induced bias shifts in multimodal models.
