Here is a collection of tips and tricks, and some weaknesses and limits too, for the GPT image generator. It took me quite some time to realize relatively simple things, so this might help save some time when experimenting.
The first post includes all the findings and will be updated from time to time.
There are no tips here for the API or Python, only prompting for the image generator system itself.
Too dark images: The images in DallE 3 were generally too bright and contained light sources that couldn’t be turned off. That was probably an attempt to solve the “too dark” problem seen in earlier versions. Now the images are often too dark, especially when darkness is part of the motif.
You can try to describe your own light sources in the prompt to better control the mood, but this often doesn’t work well, or you end up with an image that is technically fine and worth keeping, but still too dark. Overly dark images can be manually corrected afterward using something like gamma 1.5 in an image editing program. However, this reduces the color space (banding effect), which can become an issue during further editing.
So it’s better to try adjusting the lighting through the prompt itself.
For a safer correction (though this is still not the same as a well-lit image):
1. Convert 8-bit to 16-bit.
2. Apply a debanding algorithm (smooth the steps).
3. Apply all corrections (for example gamma 1.5).
4. Convert back to 8-bit.
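The steps above can be sketched in Python with NumPy (a minimal sketch: the function name and the simple bit-depth scaling are my own choices, and a real debanding pass would need an actual smoothing or dithering algorithm, which is omitted here):

```python
import numpy as np

def safe_gamma_correction(img8, gamma=1.5):
    """Brighten an 8-bit RGB image with less banding by doing the
    math in higher precision (the 8 -> 16 -> 8 bit round trip)."""
    # 1. Convert 8-bit to 16-bit (0..255 maps exactly onto 0..65535).
    img16 = img8.astype(np.uint16) * 257
    # 2. A debanding pass would go here (e.g. a slight blur or
    #    dithering); omitted in this sketch.
    # 3. Apply all corrections, e.g. gamma 1.5, in floating point.
    x = img16 / 65535.0
    x = x ** (1.0 / gamma)  # gamma > 1 brightens the image
    img16 = np.round(x * 65535.0).astype(np.uint16)
    # 4. Convert back to 8-bit.
    return (img16 // 257).astype(np.uint8)
```

The point is that intermediate rounding happens at 16-bit precision, so the final quantization back to 8-bit introduces fewer visible steps than applying the same curve directly on 8-bit data.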
Distortions caused by a poor post-processing effect: Currently, a low-quality post-processing step is being used, because the images, unlike those from DallE 3, are often quite flat and lacking in detail. However, this effect is poor and creates unnatural textures, oversharpening, and additional darkening. Worst of all, it introduces distortions that partially alter the structures of the image. You can recognize it by the final pass that appears to “add details.”
Since this effect is system-based, it cannot be disabled.
More consistency: GPT-4o-Image is more consistent than the old DallE 3 system was. And just like the memory of past images, this consistency can be both helpful and limiting: more consistency also limits creativity. Previously, a well-crafted prompt could lead to very different, creative images. With the new system, you often get more or less the same thing repeatedly.
Memory over generations: GPT-4o-Image remembers previously created images, in order to maintain consistency across image generations. This is useful for telling consistent stories, but the system currently does not recognize when a completely new theme is being created, or when each image is meant to be independent. This leads to elements from earlier images being reused, sometimes in ways that are unwanted.
The solution is to open a new browser window every time (this annoyed me so much that gaining insights here will probably take a while).
Photo-Realism: GPT-4o-Image is more photo-realistic. It can now generate realistic-looking people, which DallE 3 never quite managed. This is mostly because the training clearly included many real-world photo datasets.
But where there’s an advantage, there’s also a downside. Compared to DallE 3, GPT-4o-Image is far less creative, especially on its own. It needs more detailed prompting for creativity, and often still doesn’t reach the imaginative quality that DallE 3 had. Depending on the type of image you want to generate, one or the other system might be more useful. More realism comes at the cost of less fantasy.
Fallback from photorealism to painting: When generating more fantastical images, DallE 3 tends to fall back to a style reminiscent of airbrush art, which is often actually quite desirable. (This is probably because many of the images in its training data came from creative sources, and those tend to have that kind of style when they’re really well done.)
GPT-4o-Image, on the other hand, falls back to a painterly style. For me, that’s quite undesirable, but for people who enjoy painted art, it might be just what they want.
However, these images don’t reach the same level of creativity or detail as DallE 3, at least not right away. How much of that can be compensated for through prompting remains to be seen.
Photo weaknesses: Because of all the photo data, the weaknesses of such material have also made their way into the training. The images are grainy, have unnaturally sharpened edges, and show many typical flaws known from digital camera content.
How much of this can be corrected remains to be seen. These image flaws cannot currently be influenced via prompting; they come from the training data.
Comparing Dalle 3 and 4o: At the moment, GPT-4o-Image cannot replace DallE 3 for me. The two systems complement each other, but they are not the same. In fact, it would actually be desirable to have multiple systems with different specializations. Trying to pack all capabilities into a single system would require something that doesn’t currently exist, and maybe doesn’t even need to.
And it shows clearly how it often is: what is a strength on one side is at the same time a weakness on the other. You cannot have all the strengths at once, because they often exclude each other.
There’s no reason why we shouldn’t use different systems for different tasks. Personally, I prefer the DallE 3 system because of the types of images I usually generate, and I really hope it won’t just be deleted. Ideally, it would continue to exist as open source if OpenAI decides to retire it.
For me, GPT-4o-Image cannot (yet) replace the DallE 3 system.
Technical:
PNG Format: GPT-4o-Image supports PNG, which allows for lossless compression and transparent backgrounds.
(I’ve done some tests with AVIF, and it currently seems to offer the best compression. Unfortunately, it’s still not widely supported. However, it compresses about 20-30% better in lossless mode, and with almost no visible quality loss, it can achieve compression ratios up to 4x or more. If you’re looking to save space for archiving purposes, this open format is worth considering. Just note that in the software I used for compression, metadata wasn’t preserved.)
Here are two examples of typical digital camera quality in the training data:
This is for users to understand: it is not fixable by prompting, at least not as such, because it depends on the training data. The only way is to try to trigger other data and hope for the best.
A grainy image that has been enhanced by some poor algorithm.
The new system is not a continuation of DALL·E 3 but a reboot. It produces a completely different style that resembles generators like Flux more than the old DALL·E 3 model.
It also seems that the developers consider the images too flat and have added something like a structure enhancer in the final phase. However, this often worsens the results. The images may become sharper and seemingly more detailed, but at the same time even darker than they already were, and above all, the changes lead to distortions and visual artifacts. After this phase, the images also appear less realistic.
Here are two examples:
In the tiger’s eye you can clearly see the distortion,
as well as in the fennec’s snout.
That pictures are often too dark can be balanced out by simply describing light and light sources in the prompt. But the distortion effect cannot be switched off by prompting.
The extra phase looks like a bad photo-edit job; most of the pictures would be better without it. It would be better if such functions were optional. (They still have no artist on the team.)
Wow, some of the distortions are horrible, especially if the face is small…
I would say 98-99% of the images lose quality with this effect. You cannot add details this way if the pictures are flat.
I would like to switch this effect off.
Another thing is, DallE 3 was generally too light, and now it is too dark. Sometimes it is even difficult to get extra light into the scene if it competes with darkness in the prompt. And this final effect makes everything even darker.
(What is GPT-4o-Image? Is it a small Flux with this extra step, or even less? Did they reduce the parameters? Is it an effect in all multimodal models to generate flat results? It is now good for ads but not so much for fantasy. Time will show.)
Found a strange effect (in the ChatGPT version).
After 5 to 6 images the generator degenerates. It keeps data over the generation process, and this is the result. The effect becomes stronger with each generation.
(I have not done detailed tests, but this was the first quick try. Anybody else?)
The reason for the effect is the reuse of denoising stages in the next generation, to reduce computation costs and maybe to correct already-made images in the process. This causes amplification of patterns over time. But this is a very bad idea. Without trigger words or an option, this should never be the default state. Restarting a session for each image makes no sense to me.
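The amplification idea can be illustrated with a toy feedback loop (purely illustrative: this is not the actual generator internals, and the carry and bias values are made up). If each generation starts from a carried-over fraction of the previous result plus a small fresh artifact, and the carry-over is strong enough, the artifact compounds from image to image:

```python
def artifact_drift(carry=1.1, bias=0.02, steps=6):
    """Toy model of pattern amplification across generations:
    each step keeps `carry` times the previous artifact level
    and adds a fresh `bias`. With carry >= 1 the level only grows."""
    level = 0.0
    history = []
    for _ in range(steps):
        level = carry * level + bias
        history.append(round(level, 6))
    return history
```

After 5 to 6 steps the accumulated level is several times the per-step bias, which matches the observation that the effect gets stronger with each generation; with no carry-over (carry = 0) every image would start clean.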
Using the same prompt, you can’t compare DALL-E 3 with ImageGen 1.5 cuz DALL-E 3 is a diffusion model - totally different. Dunno if that explains what you are talking about.
BTW, OAI is retiring both DALLE models in a few weeks
Yes, I know that the two models are not comparable on a technological level. It’s just interesting to see how things develop. I’ve stayed with DALL·E 3 because I didn’t like the results from Image Generator 1 and now also 1.5. And the smaller models that can run on regular hardware are catching up slowly.
I find it interesting to compare the models a little (I’m doing this right now with offline models from Huggy).
What the technicians definitely should disable is the transfer of data from image to image.
If DALL·E 3 is discontinued, I hope they then have a better solution. At the moment, the image generator isn’t usable for me.
I think that will make many customers unhappy…
Hope the new generator is then better.
I’ve been doing some digging, and there’s nothing official yet from OAI about the new image generator, but what I found is that the new generator is supposed to be better at generating realistic images. But we will have to wait (not too long) and see what the new model is about.
I can only hope they have learned something and will not make the same mistakes again.
I have hardly used the current generator at all. And DALL·E has probably just been shut down; at least I no longer have access through my usual way in ChatGPT. A few days ago I was still creating images with DALL·E 3. (DALL·E was a pure diffusion system; the new one is a hybrid, as far as I have read.)
The worst bug right now is that image data is carried over from one generation to the next. That is a real bug! I will not use the system like this, and probably many other customers will not either.
The images have improved in 1.5. But testing is too time-consuming.
Yesterday I prompted a couple of images in ChatGPT. While it was generating, it behaved a bit differently, and the quality of my images was totally different than normal… the style. So this is just a theory, but maybe I got a little taste of the new image generator.
You were maybe not here at the time when I built my knowledge for DallE-3. So now it’s too late to try, I think…
This page was planned to be the continuation of that idea. But the new generator never reached me; I never liked it.
I will continue here again, or create a new one with the same idea, but they must come up with something…
(And clean the bugs.)
Yeah I was not here back then when you started this thread. I’ve used DALL•E and love it, but more and more it has been 1.5.
I have been thinking about one thing, and this is something that I’ve never tested myself, but would it be possible to test using Dall•E images as a style reference for the current image generator, to see if the current generator’s outputs can hold the Dall•E style? Or am I out of my mind thinking this… that’s just something I’ve thought about.