Here is a collection of tips and tricks for the GPT image generator, along with some of its weaknesses and limits. It took me quite some time to realize some relatively simple things, so this might save you some time when experimenting.
The first post includes all the findings and will be updated from time to time.
There are no tips for the API or Python here, only prompting for the image generator system itself.
References and links:
For API users:
Generate images with GPT Image | OpenAI Cookbook
Also check this link for the old DallE 3 model, in case you still have access to it:
Collection of Dall-E 3 prompting tips, issues and bugs
General:
- Too dark images: The images in DallE 3 were generally too bright and contained light sources that couldn’t be turned off. That was probably an attempt to solve the “too dark” problem seen in earlier versions. Now the images are often too dark, especially when darkness is part of the motif.
You can try to describe your own light sources in the prompt to better control the mood, but this often doesn’t work well, or you end up with an image that you would like to keep and that is otherwise fine but still too dark.
Overly dark images can be manually corrected afterward using something like gamma 1.5 in an image editing program. However, this reduces the number of distinct tonal levels in the image (banding), which can become an issue during further editing.
So it’s better to try adjusting the lighting through the prompt itself.
  - For a safer correction (though the result is still not the same as a well-lit image):
    - Convert 8-bit to 16-bit.
    - Apply a debanding algorithm (to smooth the steps).
    - Apply all corrections (for example, gamma 1.5).
    - Convert back to 8-bit.
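To illustrate why the detour through 16-bit helps, here is a minimal sketch in plain Python. It is not what any particular image editor does internally. It assumes the common convention that "gamma 1.5" means out = in^(1/1.5), it works on single channel values instead of a real image, and it leaves out the debanding step entirely. The function names are made up for this example. It applies the gamma correction and then undoes it again, once with rounding to 8 bits after every step and once in 16-bit with a single conversion back at the end.

```python
GAMMA = 1.5  # the example value from the text


def apply_gamma(value, gamma, max_value):
    """Brighten one channel value: out = max * (in / max) ** (1 / gamma)."""
    return round(max_value * (value / max_value) ** (1.0 / gamma))


def round_trip_8bit(value):
    """Apply gamma and then undo it, rounding to 8 bits after each step."""
    brightened = apply_gamma(value, GAMMA, 255)
    return apply_gamma(brightened, 1.0 / GAMMA, 255)


def round_trip_16bit(value):
    """Same two edits, done in 16-bit and converted back only at the end."""
    wide = value * 257  # 8-bit -> 16-bit (255 * 257 == 65535)
    brightened = apply_gamma(wide, GAMMA, 65535)
    restored = apply_gamma(brightened, 1.0 / GAMMA, 65535)
    return round(restored / 257)  # 16-bit -> 8-bit


levels = range(256)
# How many different output values survive a single 8-bit gamma correction.
distinct_after_gamma = len({apply_gamma(v, GAMMA, 255) for v in levels})
# How many input levels do not come back unchanged after the round trip.
lost_8bit = sum(1 for v in levels if round_trip_8bit(v) != v)
lost_16bit = sum(1 for v in levels if round_trip_16bit(v) != v)

print("distinct 8-bit levels after gamma:", distinct_after_gamma)
print("levels not restored (8-bit):", lost_8bit)
print("levels not restored (16-bit):", lost_16bit)
```

In 8-bit, several bright input levels collapse into the same output level, so they cannot be separated again by later edits. In 16-bit the intermediate values keep enough precision that every level survives the round trip. The final conversion back to 8-bit still quantizes the result, so this only protects the editing steps in between.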
- Distortions caused by a poor post-processing effect: Currently, a low-quality post-processing generator is being used, apparently because the images, unlike those from DallE 3, are often quite flat and lacking in detail. However, this effect is poor and creates unnatural textures, oversharpening, and additional darkening. Worst of all, it introduces distortions that partially alter the structures of the image. You can recognize it as the final effect that appears to “add details.”
Since this effect is built into the system, it cannot be disabled.
- More consistency: GPT-4o-Image is more consistent than the old DallE 3 system was. Just like the memory of past images, this consistency can be both helpful and limiting, because more consistency also limits creativity. Previously, a well-crafted prompt could lead to very different, creative images. With the new system, you often get more or less the same thing repeatedly.
- Memory over generations: GPT-4o-Image remembers previously created images in order to maintain consistency across image generations. This is useful for telling consistent stories, but the system currently does not recognize when a completely new theme is being created, or when each image is meant to be independent. This leads to elements from earlier images being reused, sometimes in ways that are unwanted.
The workaround is to open a new browser window every time (this annoyed me so much that gaining insights here will probably take a while).
- Photo-Realism: GPT-4o-Image is more photo-realistic. It can now generate realistic-looking people, which DallE 3 never quite managed. This is mostly because the training clearly included many real-world photo datasets.
But where there’s an advantage, there’s also a downside. Compared to DallE 3, GPT-4o-Image is far less creative, especially on its own. It needs more detailed prompting for creativity, and often still doesn’t reach the imaginative quality that DallE 3 had. Depending on the type of image you want to generate, one or the other system might be more useful. More realism comes at the cost of less fantasy.
- Fallback from photorealism to painting: When generating more fantastical images, DallE 3 tends to fall back to a style reminiscent of airbrush art, which is often actually quite desirable. (This is probably because many of the images in its training data came from creative sources, and those tend to have that kind of style when they’re really well done.)
GPT-4o-Image, on the other hand, falls back to a painterly style. For me, that’s quite undesirable, but for people who enjoy painted art, it might be just what they want.
However, these images don’t reach the same level of creativity or detail as DallE 3, at least not right away. How much of that can be compensated for through prompting remains to be seen.
- Photo weaknesses: Because of all the photo data, the weaknesses of such material have also made their way into the training. The images are grainy, have unnaturally sharpened edges, and show many typical flaws known from digital camera content.
How much of this can be corrected remains to be seen. These image flaws cannot currently be influenced via prompting, because they stem from the training data.
- Comparing DallE 3 and 4o: At the moment, GPT-4o-Image cannot replace DallE 3 for me. The two systems complement each other, but they are not the same. In fact, it would actually be desirable to have multiple systems with different specializations. Trying to pack all capabilities into a single system would require something that doesn’t currently exist, and maybe doesn’t even need to.
This clearly shows how often a strength on one side is at the same time a weakness on the other. You cannot have all the strengths at once, because they often exclude each other.
There’s no reason why we shouldn’t use different systems for different tasks. Personally, I prefer the DallE 3 system because of the types of images I usually generate, and I really hope it won’t just be deleted. Ideally, it would continue to exist as open source if OpenAI decides to retire it.
For me, GPT-4o-Image cannot (yet) replace the DallE 3 system.
Technical:
- PNG Format: GPT-4o-Image supports PNG, which allows for lossless compression and transparent backgrounds.
(I’ve done some tests with AVIF, and it currently seems to offer the best compression. Unfortunately, it’s still not widely supported. However, it compresses about 20 to 30% better in lossless mode, and with almost no visible quality loss it can achieve compression ratios of 4x or more. If you’re looking to save space for archiving purposes, this open format is worth considering. Just note that in the software I used for compression, metadata wasn’t preserved.)