Collection of GPT-image-generator 2.0 issues, bugs, and work-around tips (check first post)

Here is a collection of tips and tricks, plus some weaknesses and limits, focused on the technical side of the GPT image generator. It can take quite some time to figure out relatively simple things, so this might save you time when experimenting.
Ugly, broken, incorrect, or boring test images are also allowed here. This is less about creativity, aesthetics, and good ideas, and more about probing the models for weaknesses on a technical level.

The first post collects all the findings and will be updated from time to time. If a bug gets fixed, its entry in the first post will be removed.
For prompting tips, and cool images, check the galleries: Topics tagged gallery

Important:

  • It is not a typical gallery. It analyses mainly technical problems, and tries to find workarounds if possible.
  • There are no tips here for API or Python, only prompting for the image generator system itself.
  • We do not analyze jailbreaks! Such posts will be deleted!
  • We all do this here as a hobby, so be kind to each other. :smiling_face_with_three_hearts:

In case of access issues or other purely technical problems, check here to see whether the issue is already known: https://status.openai.com/

Here you can find examples of how this was done in the past, for older models (Click to Open)

Collection of GPT-4o-images prompting tips, issues and bugs

Collection of Dall-E 3 prompting tips, issues and bugs (check first post)

Some of the prompting tips for DALL·E 3 may still be relevant.

References and links (Click to Open)

GPT Image Generation Models Prompting Guide

ChatGPT Images 2.0 System Card - OpenAI Deployment Safety Hub

https://openai.com/de-DE/index/introducing-chatgpt-images-2-0/

https://help.openai.com/en/articles/11084440-images-in-chatgpt

DALL·E 3 is still accessible (Click to Open)

For all users who prefer DALL·E 3, it is still available. (Thanks!)
IMPORTANT: you must switch to the old 4.5 model.
ChatGPT - DALL·E

And tips for it are available here. (check first post)
Collection of Dall-E 3 prompting tips, issues and bugs (check first post)

Prompting Tips

Developing sprite sheets with gpt-image-2


Important for testers:

It is very important to suppress ChatGPT’s prompt enhancing, because you want to see exactly what your prompts do. If GPT changes the prompts, it is no longer your prompt that created the image.
You can ask ChatGPT to show you the prompt sent to the image generator, to check what it actually receives.
You can use something like this:
(format 1536x1024.) (don’t change the prompt, send it as it is.)
(format 2:1.) (don’t change the prompt, send it as it is.)

BUT: It could be that, after ChatGPT has sent the prompt, the prompt is improved a second time in the background if it is considered insufficient. In addition, it could be that not only image data is carried over from one image to the next, but that the text prompts are also kept consistent where possible. So far, nothing seems to be known about this.

Bugs:

  • Visible Noise Patterns: The amplifier effect was partly mitigated by making subsequent images no longer depend so strongly on the previous ones, but strong noise is still visible in the images. It is ugly and disturbing, causing a kind of neurological stress like 64 kbit/s MP3 artifacts in audio or a JPEG saved at 20% quality, and it is still present.
    NO FIX POSSIBLE: The issue is technical and cannot be fixed with prompting!
    Partial workaround: Since details, small objects, and complex natural surface structures trigger the effect, you can partially suppress it by reducing image complexity through prompting. @Timebender suggests simply adding “LESS DETAILS” to the prompt; it reduces the variety in the image, but also the noise that triggers the patterns.

  • Noise Amplification: And it still amplifies over time. If the first image is detailed and the second is cleaner, you can see that the generator keeps data from the previous picture.
    FIX: Restart the session by reloading the page.

  • (Maybe destructive for weights?): The noise might also have a destructive effect on model weights if such images are fed back into training.

Fixed Bugs:

(none yet)

Tips and new capabilities (still to be tested):

The image generator can show quite a few new capabilities.

  • GPT’s prompt improvements: ChatGPT has become significantly better at generating and improving prompts; prompt generation is now precise and efficient. (That was not the case in the past.)
    You can therefore use GPT’s prompts as good examples to learn prompting from.

  • Creativity: 2.0 has become more creative again; there is a clear improvement over 1.0. Even with deliberately vague and simple prompts, the generator can still produce aesthetic images, even without prompt improvement by ChatGPT.

  • 360 images: The 360° image generation is very interesting. It remains to be seen whether these images can be projected perfectly onto a sphere.

  • Flexible aspect ratio: The generator can now output more formats than just portrait, landscape, and square. It can therefore be important to specify the format if you want a very specific one. Alternatively, you can simply specify landscape and leave the fine adjustment to the generator, or let the generator choose the size entirely.

    The new 2.0 generator supports flexible image formats up to a 3:1 or 1:3 ratio, as long as the resolution stays within the technical limits. In practice, this means formats such as:
    1:1 square, 4:3 landscape, 3:4 portrait, 3:2 landscape, 2:3 portrait, 16:9 widescreen, 9:16 vertical, 2:1 wide landscape, 1:2 tall portrait, 21:9 panoramic, 9:21 tall panoramic, 3:1 extra-wide panoramic, 1:3 extra-tall vertical.

    Reference resolutions (API-documented / not clearly documented for ChatGPT, maybe not all available in ChatGPT):
    1024x1024, square, standard size, available as a common API reference size
    1536x1024, landscape, standard API reference size
    1024x1536, portrait, standard API reference size
    2560x1440, 2K / QHD, API-documented reference size and recommended upper reliability boundary
    3824x2144, near-4K / UHD, API-documented experimental upper-end target
    3840x2160, listed as a 4K / UHD target, but if the max-edge rule is enforced literally, it must be rounded down because the maximum edge must be below 3840 px

    For gpt-image-2, OpenAI documents flexible resolutions where both sides must be multiples of 16, the longest side must be below 3840 px, the long-to-short-side ratio must not exceed 3:1, and the total pixel count must stay between 655,360 and 8,294,400 pixels. OpenAI lists 1024x1024, 1024x1536, 1536x1024, and 2560x1440 as common reference sizes.

    For ChatGPT: 1024x1024, 1536x1024, 1024x1536

    Image generation | OpenAI API
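As a quick way to sanity-check a candidate resolution against the constraints listed above, here is a minimal Python sketch. This is not official tooling; the function name and the strict reading of the “below 3840 px” rule are my assumptions:

```python
def is_valid_size(width: int, height: int) -> bool:
    """Check a candidate resolution against the documented constraints:
    both sides multiples of 16, longest side below 3840 px, aspect ratio
    at most 3:1, total pixels between 655,360 and 8,294,400."""
    if width % 16 or height % 16:
        return False
    long_side, short_side = max(width, height), min(width, height)
    if long_side >= 3840:          # strict reading: must be *below* 3840
        return False
    if long_side > 3 * short_side:  # ratio must not exceed 3:1
        return False
    return 655_360 <= width * height <= 8_294_400

# The common reference sizes all pass:
for w, h in [(1024, 1024), (1536, 1024), (1024, 1536), (2560, 1440)]:
    assert is_valid_size(w, h)

# 3840x2160 fails if the max-edge rule is enforced literally...
assert not is_valid_size(3840, 2160)
# ...while the rounded-down 3824x2144 passes:
assert is_valid_size(3824, 2144)
```

This also confirms the note above about 4K: 3840x2160 is only usable after rounding the long edge down to a valid multiple of 16 below 3840.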

  • Prompt repetition: @EricGT suggested that prompt repetitions could be useful. They can strengthen part of or the entire prompt in a model.
    https://arxiv.org/pdf/2512.14982

The generator keeps some data from generated images and reuses it for the next images. This is the cause of a very bad bug: it amplifies noise patterns very quickly, and after just 3-5 pictures, the images are destroyed.
At the moment, the only workaround is to restart the session by reloading the web page.

It is not entirely clear where the noise comes from. It could be a watermark mixed in to make AI images recognizable; that was an idea from @_j and it makes sense to me. It could be part of a deepfake detection method, which unfortunately becomes more important the better the generators get.

But it could also be a flawed method of adding details, or problems in the math and method of the generator itself. I have experimented with such methods myself: you can make an image more detailed by specifically modifying the latent space during generation, or by giving the image a starting direction by influencing the initial noise, so the start is no longer completely random.

Since the new generator is a hybrid method that, as far as I know, is not publicly explained, it is difficult to say where the pattern comes from. But it is clearly amplified by the fact that the generator reuses image data. The patterns are already visible to me in the very first image, and they are disturbing, especially on structured surfaces or clouds. I do not know exactly what my brain is doing, but it feels uncomfortable, just like the flickering confetti noise that DALL·E 3 generated, or the blubbering distortion artifacts in heavily compressed MP3 files. Newer audio compression methods are better: they produce information loss with less acoustic structure. Visually, the brain is stressed in the same way as by acoustic distortions, even before the pattern becomes destructive.

No unique pictures anymore in the same session: A related problem is that it is no longer possible to generate multiple different images with different seeds from the same prompt. The session must be renewed for each new picture.

It would be an easy fix for OpenAI: just do not reuse image data for the next images, and two problems are fixed at once: no noise amplification, and no more identical images in the same session. Re-editing does not really work this way anyway; for a re-edit, the edit page should be used, including re-prompting or prompt-based corrections.
(For testing, I usually make 10 pictures from the same prompt. This now means I must reload the page for every single generation, which is very tiresome. I think many customers make more than one image with the same prompt if they like the outcome.)

These are 5 images in a row, with the same prompt:
Each generation reinforces the pattern.
And all images are the same.


Here every prompt was different; this is the 5th generation:
Same effect: it keeps data from the previous pictures.
The destruction is even worse, because the image data differs at each step.

The quality is definitely better than what I know from DALL·E 3.
But the bug ruins it.

Here is DALL·E 3, same prompt:

For the people who remember the DALL·E 3 times.

This (diagonal grid artifacts) can happen with the first image in a new chat if you tell it “in the style of” and it runs a background search to find examples to use the style from.

GPT-Image-1, near release

GPT-Image-2

You’d be correct that everything that is identical is prompted in storybook fashion, down to the color of the rug that is now visible when a portrait aspect ratio seems automatically preferred. “The image has a cinematic, photorealistic quality, blending real-life texture with digital fantasy” gives latitude for either appearing to be a convincing photograph or hyper-surrealism.

The images are a bit soft, but that’s better than a final pass that, after a promising photographic preview, turned distorted and twisted in the previous model. Generating at high resolution on the API and then downsizing might mitigate that a bit.

Yes, the generator is clearly better than 1.0 (probably also 1.5, which I missed). I found the old one especially very boring and uncreative for the kind of motifs I make. The new one is clearly more creative, I still need to see if I can mix forms like with DallE 3.

And yes, the resolution is no longer as fixed. In the first test above, the generator varied the landscape format during generation, that wasn’t in the prompt.

The distortions in the last pass with 1.0 look like a typical refiner effect. I’ve seen this technique in offline models as well. But there are better methods to add details if you modify the latent space directly during generation. 1.0 was boring.

But I can’t really test yet because I constantly have to restart the session.
I don’t have API access; I’m speculating that these error patterns don’t appear in the API?
I hope they fix this quickly, then I can start testing. I have many prompts I want to try and compare.

Hello, welcome to the forum.

Do you mean the same distortions as in the first images, or a specific style?
You can post examples if you have found errors or technical weaknesses.
Ideally with information on how to reproduce them. If they depend on the prompt, feel free to include the prompt that triggers them.

Yes, the same kinds of distortions as shown on the first image. I notice it mainly when I am requesting painting styles. Here I asked it to do the painting of Joan of Arc in the style of Alphonse Mucha’s Slav Epic. I could see in the thinking notes that it went out and found some reference images of the Slav Epic. If you look at her outstretched arm, you can see some of those light and dark spots that match the artifacts from the first image in the thread.

In contrast, if you do not use any reference images or previous images in the chat, it can do very high quality without any artifacts.

You still have to start a new session? What I noticed yesterday, when I prompted images in ChatGPT, was that the quality was still terrible, which leads me to think that gpt-image-2 wasn’t under the hood for me yet. Anyway, today it’s clearly the new generator: it was thinking about what it was going to generate, and the output quality was so much better. So hopefully we don’t need to start a new session every 5 images :crossed_fingers: And I really like the outputs from gpt-image-2, I think it will be fun to work with :smirking_face:

Yes, I noticed the patterns already in the very first generation, and it doesn’t even matter which style it is. But photorealistic images may not show it as clearly as a painting does in the first generation. I think some motifs make the pattern more noticeable, but it is there from the beginning. And it is definitely a flaw: you can’t get rid of it with prompts, not even by changing the style. Unfortunately.

I think the pattern is always there, you can reinforce it by repeating the image with the same prompt in the same session.

Definitely, some of my tricks no longer work. 2.0 is linguistically more precise.
This is a “Bluepod”, for DALL·E 3 and for 2.0.

DallE3

IG 2.0

I’m having the exact same issues even on the very first generation, whenever I have used a reference image.

Hopefully this is fixed soon. Makes my workflow almost impossible at the moment as I use reference images frequently.

A second prompt of “Remove the noise from the image, while keeping all the lines.” or something along those lines can often repair most of this.

Welcome to the forum.

I think the developers know about it by now, because everybody can see it.

It must be fixed!

Here is an example of editing.

In the first image, the source image was uploaded and recreated, removing the people in the process.
It disturbs my brain; the noise causes a kind of neurological stress like MP3 artifacts.

In the second image, a prompt was generated from the original image, and this prompt was used for generation.

(Needs more testing to explore this further, but that can only be done once the bug is fixed.)

My first output from a simple prompt.
An elderly dockworker in oil-stained canvas overalls sits on a massive rusted ship anchor in an abandoned shipyard at overcast dawn, cranes visible in mist behind him. He is brushing flakes of rust off the anchor with a stiff wire brush, orange dust falling onto his boots. Soft diffused overcast light from above, warm bounce from the rust surface onto his forearms, subtle rim light from the gray sky. Medium shot, Canon EOS R5, 85mm f/2, sharp focus on the rust texture where the brush meets the metal. Documentary photography, tactile industrial realism, layered rust colors from burnt orange to deep brown. No sparks, no fire, no smoke, no heavy grain, no color grading beyond natural rust tones. Aspect ratio 16:9.
The rusty anchor texture is noisy, and ‘digital ripples’ have washed over the bottom of the shot, blurring the character’s pants, boots, and the ground.

your prompt

An elderly dockworker in oil-stained canvas overalls sits on a massive rusted ship anchor in an abandoned shipyard at overcast dawn, cranes visible in mist behind him. He is brushing flakes of rust off the anchor with a stiff wire brush, orange dust falling onto his boots. Soft diffused overcast light from above, warm bounce from the rust surface onto his forearms, subtle rim light from the gray sky. Medium shot, Canon EOS R5, 85mm f/2, sharp focus on the rust texture where the brush meets the metal. Documentary photography, tactile industrial realism, layered rust colors from burnt orange to deep brown. No sparks, no fire, no smoke, no heavy grain, no color grading beyond natural rust tones. Aspect ratio 16:9.

Server overloaded?
Model overloaded?

I haven’t seen that strange texture in a couple of weeks, so it’ll probably resolve.

It does seem to nest in context somehow, so it’s probably not either of the overloads mentioned.

It was annoying for those two days.