Check this link too:
General:
- More consistency: GPT-4o-Image is more consistent than the old DallE 3 system was. Like its memory of past images, this consistency can be both helpful and limiting, because more consistency also means less creative variety. Previously, a well-crafted prompt could lead to very different, creative images. With the new system, you often get more or less the same thing repeatedly.
- Memory over generations: GPT-4o-Image remembers previously created images in order to maintain consistency across image generations. This is useful for telling consistent stories, but the system currently does not recognize when a completely new theme is being started, or when each image is meant to be independent. As a result, elements from earlier images get reused, sometimes in unwanted ways.
The workaround is to open a new browser window every time (this annoyed me so much that gaining insights here will probably take a while).
- Photo-realism: GPT-4o-Image is more photo-realistic. It can now generate realistic-looking people, which DallE 3 never quite managed. This is most likely because the training data included many real-world photos.
But where there’s an advantage, there’s also a downside. Compared to DallE 3, GPT-4o-Image is far less creative, especially on its own. It needs more detailed prompting to be creative, and often still doesn’t reach the imaginative quality that DallE 3 had. Depending on the type of image you want to generate, one or the other system might be more useful. More realism comes at the cost of less fantasy.
- Fallback from photorealism to painting: When generating more fantastical images, DallE 3 tends to fall back to a style reminiscent of airbrush art, which is often actually quite desirable. (This is probably because many of the images in its training data came from creative sources, and those tend to have that kind of style when they’re really well done.)
GPT-4o-Image, on the other hand, falls back to a painterly style. For me, that’s quite undesirable, but for people who enjoy painted art, it might be just what they want.
However, these images don’t reach the same level of creativity or detail as DallE 3, at least not right away. How much of that can be compensated for through prompting remains to be seen.
- Photo weaknesses: Because of all the photo data, the weaknesses of such material have also made their way into the training. The images are grainy, have unnaturally sharpened edges, and show many typical flaws known from digital camera content.
These flaws come from the training data and cannot currently be influenced via prompting. How much of this can be corrected remains to be seen.
- Comparing DallE 3 and 4o: At the moment, GPT-4o-Image cannot replace DallE 3 for me. The two systems complement each other, but they are not the same. In fact, it would be desirable to have multiple systems with different specializations. Packing all capabilities into a single system would require something that doesn’t currently exist, and maybe doesn’t even need to.
This shows clearly how often a strength on one side is, at the same time, a weakness on the other. You cannot have all the strengths at once, because they often exclude each other.
There’s no reason why we shouldn’t use different systems for different tasks. Personally, I prefer the DallE 3 system because of the types of images I usually generate, and I really hope it won’t just be deleted. Ideally, it would continue to exist as open source if OpenAI decides to retire it.
For me, GPT-4o-Image cannot (yet) replace the DallE 3 system.
Technical:
- PNG Format: GPT-4o-Image supports PNG, which allows for lossless compression and transparent backgrounds.
(I’ve done some tests with AVIF, and it currently seems to offer the best compression. Unfortunately, it’s still not widely supported. However, it compresses about 20-30% better in lossless mode, and in lossy mode, with almost no visible quality loss, it can achieve compression ratios of 4x or more. If you’re looking to save space for archiving purposes, this open format is worth considering. Just note that in the software I used for compression, metadata wasn’t preserved.)
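
If you want to check these numbers on your own files, here is a minimal sketch in Python using only the standard library. It does not do the AVIF conversion itself (the software used above isn't named, so that step is left to whatever tool you prefer). It only lists which metadata chunks a PNG contains, so you know what could get lost, and compares file sizes before and after. The function names `png_metadata_chunks` and `size_report` are made up for this example.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG chunk types that carry text or EXIF metadata.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf"}


def png_metadata_chunks(path):
    """Return the names of metadata chunks found in a PNG file, in file order."""
    found = []
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError(f"{path} is not a PNG file")
        while True:
            header = f.read(8)  # 4-byte big-endian length + 4-byte chunk type
            if len(header) < 8:
                break
            length, chunk_type = struct.unpack(">I4s", header)
            if chunk_type in METADATA_CHUNKS:
                found.append(chunk_type.decode("ascii"))
            if chunk_type == b"IEND":
                break
            f.seek(length + 4, os.SEEK_CUR)  # skip chunk data and 4-byte CRC
    return found


def size_report(original_path, converted_path):
    """Compare two files on disk; returns (percent saved, compression ratio)."""
    original = os.path.getsize(original_path)
    converted = os.path.getsize(converted_path)
    saved_percent = 100.0 * (original - converted) / original
    ratio = original / converted
    return saved_percent, ratio
```

Run `png_metadata_chunks("image.png")` before converting: an empty list means there is no text or EXIF metadata to lose. After converting, `size_report("image.png", "image.avif")` gives the saving in percent and the ratio, so a result of `(75.0, 4.0)` would correspond to the 4x case mentioned above. The file names here are placeholders.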