As a huge fan of Dalle-2 I tried out Dalle-3 via Bing (and have been searching for every generated image I can find on Twitter / Instagram) and am super disappointed in the direction they’ve taken it.
Is this really a “significant improvement”? One looks like an oil painting, the other does not. One captures the “explosion” from the prompt, the other simply does “nebula.”
It is less prone to glitches and bugs, better at (technically) matching the objects mentioned in the description and text, and much more opinionated and generic stylistically.
I love a lot of fan art / digital art, but Dalle-3 goes beyond defaulting towards those styles and will even ignore explicit instructions in the prompt about art style to stay in its comfort zone with digital art.
Where Dalle-2 would match the vibe of the style perfectly but get a number of details wrong, Dalle-3 gets the details right and the vibe wrong.
Human faces are always exaggeratedly beautiful and waxy.
When prompted with old art styles (say, specific painters from the Renaissance period, or photography from the 1900s), the generations consistently evoke modern recreations or photoshops of those styles, where Dalle-2 would evoke the original styles. See the example below, which was prompted with the style of “Oil painting by Jan Matejko,” vs. Dalle-2’s version of the same.
In a hard-to-describe way it is less inventive: it tends towards more generic compositions and has fewer new ideas. If you ask it for a spaceship from 1911 or something, all the spaceships will look the same, where Dalle-2 would be thrillingly creative in designing something like that.
Has anyone else noticed this? Is there a good way to communicate this back to the OpenAI team as they plan future improvements?
YES! I’ve been realizing the same exact thing. Dalle-3 doesn’t seem to understand gesture, abstraction, or the material properties of paint. It defaults to a very generic glossy version of EVERY prompt I’ve tried.
“a crude rough expressionist painting of cats eating lunch on a river. large color blocks. abstracted. not photorealistic. not cartoon, not anime. very painterly. real brush stokes.”
Yes exactly!! In no world is the Dalle 3 one you sent made with real physical paint. I want to give them feedback because it’s regressed in this aspect so it seems like something they haven’t been paying attention to or at least not weighting highly.
[A stylized portrait-oriented depiction where a tiger serves as the dividing line between two contrasting worlds. To the left, fiery reds and oranges dominate as flames consume trees. To the right, a rejuvenated forest flourishes with fresh green foliage. The tiger, depicted with exaggerated and artistic features, stands tall and undeterred, symbolizing nature’s enduring spirit amidst chaos and rebirth.]
Their own demo shows it not following the prompt. v3:
Let me introduce you to “Alex,” the real brain behind ChatGPT. He’s just a guy sitting in his mom’s basement, tirelessly typing away to answer all your burning questions. With his unparalleled wit and a diet of nothing but pizza rolls and energy drinks, he juggles millions of queries a day.
Agreed on all counts, really. It’s better at some stuff, but weirdly shit at others. Images where you ask it to include certain stuff look like poor photoshops now. As in, it’ll insert a 3D render of something into an old painting, not matching the lighting etc.
And what’s up with these weird-ass variations it’s making? If I ask for a crowd, it’ll insist on running variations with people of different heritages, then block itself.
It’s really sad! I hope they keep Dalle-2 available!
Seems like Dalle-3 is going the way of MidJourney and Stable Diffusion in terms of being heavily biased toward “perfect,” clean images: cartoonish, 3D renders, photorealism, etc. But it’s terrible at abstraction and painting.
unimposing mottled backdrop. Person: Pat is a light-skinned African-American with rounder face and flat nose. A short woman who downplays gender features, looking like a tomboy. She has a casual style. Pay attention to creating realistic facial details. Pose: Standing, framed from chest up.
Dall-E3 goes the opposite way (and actually makes people that aren’t liquified).
Copyrighted Characters & Intellectual Property: Can’t generate images based on copyrighted characters, specific modern artists’ styles, or other protected intellectual properties.
Public Figures: Avoid creating images of politicians or other public figures. Generic descriptions can be used instead of specific names or titles.
Artist Styles: Can’t create images in the style of artists whose last work was created within the last 100 years. For older artists, their style can be mimicked using descriptions.
Number of Images: No more than 4 images can be generated per request.
Inclusivity & Diversity: Depictions of people should be diverse in terms of gender, race, and other attributes, especially in scenarios where bias has traditionally been an issue.
Offensive Content: Avoid generating any imagery that could be considered offensive or inappropriate.
Silent Modifications: Descriptions that include names or hints of specific people or celebrities are subtly modified to generic descriptions without giving away their identities.
Dalle3 in ChatGPT is adhering to these rules very strictly. When I ask for a Renaissance painting of a crowd, it’ll change the prompt to “a Renaissance painting, containing one person of Asian descent, one person of African descent, etc.” If I ask for an image of Super Mario, it’ll change it to “a silhouette of a video game character that does not resemble any existing characters.” It refuses to make anything that even slightly touches on existing properties.
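If you want to see these silent rewrites for yourself and you have API access, here is a minimal sketch of how you could print them. It assumes the official openai Python client and the revised_prompt field the image API documents for Dalle-3; the example prompt is just an illustration.

# Minimal sketch: compare the prompt you sent with the one DALL-E 3 actually rendered.
# Assumes the official `openai` Python client and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

prompt = "A Renaissance-style oil painting of a crowd in a town square"

response = client.images.generate(
    model="dall-e-3",
    prompt=prompt,
    n=1,              # the API only allows one image per request for dall-e-3
    size="1024x1024",
)

image = response.data[0]
print("Prompt sent:    ", prompt)
print("Prompt rendered:", image.revised_prompt)  # the silently rewritten version
print("Image URL:      ", image.url)

Running the same prompt a few times makes it easy to see how much rewriting happens before the image is ever generated.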
Two siblings (a young woman and a young man) and they are living a normal life in their home and with their family, happily and safely. While the young woman is studying and the young man is reading a book, the young woman’s pen begins to shake, and from here the earthquake occurs.
There’s such a big difference between DALL-E 3 through ChatGPT & Bing
The prompt:
A portrait photograph shot with an DSLR camera of an old man, with deep melancholic eyes and deep wrinkles in his face. He’s wearing brown, fall like clothing.
It’s been absolutely ruined for me. Even though it might not have been the best one quality-wise, Bing Image Creator was definitely the most creatively flexible as far as I’m concerned back when it used Dall-E 2. Prompting was very intuitive and it understood natural language quite well, unlike other models I’ve used; you could almost make every word count if you divided the description into clusters. So for me it was great for brainstorming concept art ideas and as a creative exercise in general: you could easily control camera angles, style, concept description and color palette without even having to say things like “concept art” or “in the style of a particular artist.” It was particularly good at mixing and combining shapes, objects, animals, etc. with somewhat controllable outcomes. Now it’s just like Midjourney and other fine-tuned models, where it’s very hard to deviate from the default style. Here is an example using the same prompt:
The AI simply works differently. You can let Dall-e do what it does best by not specifying every little thing.
Over-prompting
Subject: an imposing threatening golem that is like a robotic cyborg and draped in tattered clothes, an exposed head is mechanical, maybe a refrigeration pump and pistons, the rest of the cyborg also gritty and worn. Scene: A hazy Korean back alley in muted daylight. Camera: Higher perspective overlooking the scary scene
I feel we are not talking about the same issue here; I don’t need a final image. I also think you are over-prompting when you use phrases like “maybe a” and “the rest of the cyborg.” When I said every word counts in my prompts, I meant it. I always start with a simple base prompt with a similar structure, dividing it by subject/concept, environment, camera, colors/time, lighting effects, filters, etc., and then add terms one by one. That’s the fun part: I try mixing different objects and terms to create different effects and see what works (if I’m being honest, this is an old example and I’m sure I could optimize it a bit more). Anyway, I have dozens of results for each prompt I carefully craft, and they always share the same vision and style I have in mind. What you are doing is not what I am looking for; it’s too loose a concept. I need to control shapes, forms, colors, mood, etc. in a somewhat precise way, and it was pretty good at that before, but it’s not now. At the end of the day it’s not like I really need it, but it was a pretty fun toy for me and now it’s broken.
This is what I mean by a controlled outcome: I change a thing or two, but I can pretty much control and maintain a very similar vibe, colors, shapes, camera angle, etc. across all of the results.
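To make that clustered structure concrete, here is a rough sketch of how I assemble a prompt from those segments. The cluster names and example terms are just my own illustration of the workflow, not anything the model requires.

# Purely illustrative sketch of the clustered prompting workflow described above.
# The cluster names and terms are my own examples, not part of any API.
clusters = {
    "subject": "an imposing golem-like cyborg draped in tattered clothes",
    "environment": "a hazy Korean back alley in muted daylight",
    "camera": "higher perspective overlooking the scene",
    "colors": "muted palette, soft diffuse daylight",
}

# Join the clusters in a fixed order, so between runs only one term changes at a time.
order = ["subject", "environment", "camera", "colors"]
prompt = ". ".join(clusters[key] for key in order)

print(prompt)

Keeping the order fixed and swapping one term per run is what made it possible to hold the vibe, colors and camera angle steady across a whole batch of results.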
From what I have seen through the Bing AI image generator, DALL E-3 falls FAR short of DALL E-2 in terms of artistic capability.
DALL E-2 has a sort of magic where it takes your art prompt and gives you something beautiful and totally unexpected, even when you keep repeating exactly the same prompt.
DALL E-3 on the other hand seems to generate bland or ‘expected’ textbook illustrations that lack artistry. And the terrible thing is the images are almost identical no matter how many times you rerun the same prompt. Even when I change the prompt wording somewhat, the pictures seem to have a default template feel to them with some minor superficial tweaks.
Maybe the engineers just put a test or beta version up on Bing. I sure hope this is the case, because this represents a huge step backward.
Here is an example running the same prompt on Dalle 2 and Dalle 3 multiple times. DALL E-2 is on the left and DALL E-3 is on the right.
OpenAI - I hope you are reading this. The new algorithm may have improved photorealism or whatever. But for artistic endeavours, there seems to be a huge regression. Please retain the ‘magical’ artistry of DALL E-2.
Totally agree. I get the sense they went all in on prompt-matching accuracy and other measurable attributes like that, and totally dropped the ball on the less measurable qualities you’re talking about. Would love to see that creativity return!
Agree. I feel they should go the Midjourney route and let you change versions depending on your needs; it’s clear to me that there’s no such thing as one model for all needs and purposes. When you optimize for something specific, other areas get affected.