I just came for the popcorn.
I have another idea that just crossed my mind about what could also create the pattern. However, I still know too little about the process of the new 2.0; nothing is publicly known about it.
It is speculated that it is a hybrid process made up of "Autoregressive Image Generation" and a denoiser that has been known for a little longer. If someone takes a lot of images with the same format, overlays them all, and the pattern is always the same, meaning the pattern structures always align, it could be due to how Autoregressive Image Generation creates the images.
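The overlay test described above is easy to sketch: averaging many same-size outputs blurs out the varying content, while a position-locked artifact pattern survives in the residual. A minimal sketch, assuming the images are already loaded as equally shaped grayscale arrays (the demo data below is synthetic, not real generator output):

```python
# Overlay test: average many same-size images. Real content varies from
# image to image and averages away, while a fixed, position-locked
# artifact pattern survives and becomes visible in the residual.
import numpy as np

def overlay_residual(images):
    """images: list of equally shaped 2-D grayscale float arrays."""
    mean = np.mean(np.stack(images), axis=0)
    # Remove the global offset so only the recurring structure remains.
    return mean - mean.mean()

# Demo with synthetic data: random "content" plus a fixed stripe pattern.
rng = np.random.default_rng(0)
stripes = np.tile([0.0, 0.0, 0.0, 1.0], 16)  # period-4 pattern, length 64
fixed = np.outer(np.ones(64), stripes)       # 64x64 fixed artifact
imgs = [rng.normal(size=(64, 64)) + fixed for _ in range(200)]
res = overlay_residual(imgs)
# The stripe columns stand out clearly above the averaged-out noise.
print(res[:, 3].mean() - res[:, 0].mean())
```

With real files one would load each image with PIL, convert to grayscale, and feed the arrays into the same function; if nothing recurring is present, the residual is just flat noise.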
Combining Autoregressive Image Generation with diffusion could make sense: AG for precision, diffusion for creativity. It could also be the reason why fantasy images produce more of the pattern than realistic images. So one question is: does more creativity in the process also lead to more pattern formation, and not only the details alone? Creativity triggers it, the details reinforce it. Do the two methods simply not cooperate well in the process?
I know the data for the OAI generators is not publicly known. But if someone knows details… If the pattern is not inserted intentionally, but arises from the current process without manipulation, it could be a side effect of two systems not working well together.
It would also explain why this pattern can have a toxic effect on the weights during training (toxic means nothing other than: instead of improving the weights, it makes them worse). If the same patterns keep appearing in very many images, a system trained on them will conclude that images have to look like that. Then more and more such images with these patterns will be generated, also by other methods or by pure diffusion models.
(Diffusion systems have similar problems: if photos with a lot of noise or typical photography flaws are loaded, these errors reappear in the generated images. This 2.0 pattern can also reappear if the weights are trained on it. This is especially important for technicians! They have to recognize these images and keep them out of the training set, just like bad photo data.)
It could even be that what we see is already such a training effect. (Developers will know what I mean.)
So… one conclusion would be: if "Autoregressive Image Generation" creates a certain patch pattern of a certain size, can it, in cooperation with a diffuser (or whatever else 2.0 does during generation), produce these patterns? Does the pattern line up with the patch sizes?
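Whether an artifact lines up with a patch grid can be probed without knowing the architecture: the autocorrelation of an image's row or column profile peaks at the repeat period if the pattern is grid-locked. A rough sketch on synthetic data; the patch size of 16 is an assumed example, not a known property of the model:

```python
# Check whether a repeating artifact aligns with a hypothetical patch
# grid by autocorrelating a 1-D profile of the image: a grid-locked
# pattern produces an autocorrelation peak at the patch period.
import numpy as np

def dominant_period(signal):
    """Return the lag (> 0) with the strongest autocorrelation."""
    s = signal - signal.mean()
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]  # lags 0..n-1
    ac[0] = 0.0  # ignore the trivial zero-lag peak
    return int(np.argmax(ac[: len(s) // 2]))

# Demo: synthesize an image with a period-16 grid artifact (assumed size).
rng = np.random.default_rng(1)
img = rng.normal(scale=0.2, size=(128, 128))
img[:, ::16] += 1.0  # vertical seams every 16 px, like patch boundaries
col_profile = img.mean(axis=0)  # collapse rows to a 1-D column profile
print(dominant_period(col_profile))  # 16 for this synthetic example
```

Run on real outputs, a consistent peak at one fixed lag across many images would hint at a patch-aligned cause; no stable peak would point elsewhere.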
Can it be that we see two different methods in action in the picture? One for the flat surfaces, and one which gets triggered by complexity?
(Or is it caused by bad training data with such artifacts in it?)
It is all pure speculation.
If a technician reads this, maybe they have a few hints for us. Without information about the architecture of the generator, we can only try things out and observe. (That is how we did it in the early days too, and it took us a looong time to figure things out.)
It seems to me that you are being a bit too speculative with all of this.
IMO, the most efficient service we can provide on this thread for people having problems with their images is to look at their prompt and provide the best possible prompt revision. Anything beyond that could be treated as a possible bug and investigated with what is otherwise known as the Scientific Method:
The scientific method is a systematic, iterative process for investigating observations, answering questions, and testing hypotheses through experimentation. It ensures objectivity by requiring reproducible experiments that distinguish between correlation and cause-and-effect. The core steps include observation, questioning, hypothesis formulation, testing, analysis, and communication.
Anything else may be a waste of time.
Simplicity Is Elegance
Prompt
A surrealistic image of a brightly illuminated underwater alien city. The brightness level must be such that all figures and objects can be clearly seen.
That's right - zoom in for the details. Both images were created from the same simple prompt.
Sometimes, a concept image prompt can be obfuscated with complex verbiage. Just like an experienced software engineer can accomplish a task with 10 lines of code where an inexperienced software engineer will need 100 lines for the same task.
What's interesting here is:
It has become more aesthetically pleasing again.
Hmm, looking at your prompt, the context has shifted: from a natural, dynamic depiction to a rather static painting.
Take a look at:
- the shape and colouring of the clouds show these "patterns"
- the cloud touched by the wing isn't smoke; it is reminiscent of fabric
- the bird is neither a kite nor a hawk; it is something that merely has the shape of a bird of prey
- the rainbow looks as though it has simply been placed into the picture. This is because the gradient is slightly different here too. A rainbow is a dynamic effect that depends on where the observer is standing. Here, regardless of where I position myself, whether as the bird or as the viewer, the orientation of the rainbow is not realistic.
It seems that highly aesthetic images that can be fixed using prompts, as in the @Chain_L examples, show few of these patterns simply because all the details are determined by the prompts.
@Daller captures the aspect I was trying to illustrate here:
Generating clouds that have a flow because the kite's wing is stirring them up.
Prompt
"Black-and-white or low-color realistic sketch, no cinematic composition.
A red kite (Milvus milvus) flying through a dense cloud layer, with a clearly visible deeply forked tail.
The wings must create visible aerodynamic effects: turbulent airflow, vortex trails, and irregular cloud displacement behind the wings.
Cloud density must vary locally, showing disturbed regions, gaps, and swirling patterns caused by motion.
A partially visible rainbow appears only in regions where light, water droplets, and viewing angle align correctly, not as a perfect arc.
No symmetry, no idealization.
Details must remain physically plausible when zoomed in: feather structure, cloud particles, and light interaction should show irregular, non-repeating patterns.
The scene should feel like an imperfect, real physical process rather than a composed image."
My workaround here is not to make everything too artificial and simplistic, and not to include natural descriptions.
Instead, I try to demand strict scientific definitions:
The kite (Milvus milvus) is precisely defined:
its shape, colour and the fork in its tail feathers.
However, I had to use the scientific name to compensate for the vagueness. And even then, it is still too blurry and there are a lot of "noise patterns". At least you can make out that it is supposed to be a kite.
When I look at all your arguments, there are two perspectives:
- (traditional) prompting is essential
- (traditional) prompting is not crucial if the model architectures lead to noisy outputs
I guess, both approaches can be implemented by the models.
However, for customers in the scientific area, I see challenges:
They simply do not have the time to deal with overly rigid prompting. They want a suitable representation using their own vocabulary.
Not "yes, it sort of looks like a kite"; no, they want a kite by definition.
Or a mechanical engineer: they want a flow of air, whether laminar or turbulent; a "well, it sort of looks like that" won't help them at all.
I agree with you on that, but thereâs just one small point Iâd like to make.
Natural scientists and engineers arenât software developers - they work with parameters drawn from nature.
Not aesthetics, not rigid images, not simplification when it comes to distorting reality.
Because they rely on everything fitting together!
AI is also designed with this customer base in mind.
With all due respect Tina, I really think you are over-thinking all of the above. "Beauty is always in the eye of the beholder."
I just want to have fun creating images - life is too short.
No problem, it's just that I'm an engineer and I know what's causing my colleagues headaches.
This thread is about bugs, so we can provide people with the necessary workarounds.
And ideally, help OpenAI understand their customersâ concerns better.
Me too - been a software engineer for 36 years.
help OpenAI understand their customersâ concerns better.
Well maybe some day they will release a new version that will meet your requirements.
I don't understand what some of these posts are getting at. The issue is pretty easy to recreate and doesn't go away with specific kinds of prompts. If I want a fantasy painting of a landscape or village or whatever, it will look good for the first few results in a chat. But after that, different requests get increasingly messy unless you direct it to photorealistic generations (and even then it only works on more mundane objects and nothing too fantastical). Using references also causes weird artifacting in many gens, like we see. BoyuanChen0 on twitter says they're working on it and will announce when they have it fixed. That suggests to me that this is a bug and unintended behavior.
With DALL-E 3 I could keep generating in the same chat with consistent quality. I don't like that I can't do that with image 2; that's what I want fixed.
I really don't know what to say to you. What do you want to accomplish on this thread? Are you here only to complain about gpt-image-2 like others here?
No. I only want awareness of the bug to remain, and to show that some people still want it fixed if possible. I'm patiently waiting for the devs to fix it. If you want to antagonize and argue on the internet, you do you, but I would have thought you had more important things to do. Ta ta.
What bug are you referring to? DALL-E 3-level consistency for gpt-image-2?
The issue is pretty easy to recreate and doesn't go away with specific kinds of prompts.
You are part of a chorus of people on this thread who refuse to show prompts that result in subpar images. It could be that you need help with a prompt revision.
Tina, I just now saw your thread: Prompting vs Structure - A Boundary Test
I now fully understand what you were talking about. Sorry for my confusion!



