GPT Image is so dull and uninspiring vs Dall-E3

Hi,

I have been a long-time user of Dall-E3 and was really looking forward to migrating to the new GPT Image API. However, in my subjective experience, the images generated with GPT Image API are really dull and uninspiring compared to the equivalent prompt used with Dall-E3.

To demonstrate this, I have taken a few prompts from the original Dall-E3 release blog.


Tiny potato kings wearing majestic crowns, sitting on thrones, overseeing their vast potato kingdom filled with potato subjects and potato castles.

Dall-E3

GPT Image


A 3D render of a coffee mug placed on a window sill during a stormy day. The storm outside the window is reflected in the coffee, with miniature lightning bolts and turbulent waves seen inside the mug. The room is dimly lit, adding to the dramatic atmosphere.

Dall-E3

GPT Image


Close-up photograph of a hermit crab nestled in wet sand, with sea foam nearby and the details of its shell and texture of the sand accentuated.

Dall-E3

GPT Image


A photo of an ancient shipwreck nestled on the ocean floor. Marine plants have claimed the wooden structure, and fish swim in and out of its hollow spaces. Sunken treasures and old cannons are scattered around, providing a glimpse into the past.

Dall-E3

GPT Image


And an example from my app:

(Styling instructions omitted due to proprietary nature).

Helpful background:
The subject is a Caucasian man aged 40-49, with light skin, long wavy brown hair, brown eyes, and a short beard.

Create a visual representation of the described scene.

A tranquil, softly lit evening setting focused on the subject sitting comfortably in a cozy room designed for relaxation and reflection. He holds a modern vaping device in one hand, with a subtle trail of vapor gently curling upward, symbolizing his casual evening vaping habit. The atmosphere conveys introspection and mindful consideration, with elements like a small meditation cushion, a book about mindfulness, and art supplies scattered nearby to represent alternative relaxation methods such as meditation and hobbies. A faint morning light coming through a window hints at the lingering discomfort and reflection on vaping’s effects. The environment is peaceful yet contemplative, emphasizing the subject’s internal struggle with balancing relaxation, health, and mindful consumption. His expression is thoughtful and pensive, embodying the seriousness of his self-reflection and commitment to making healthier lifestyle choices.

Dall-E3

GPT Image


While GPT Image has superior instruction following, Dall-E3 consistently outperforms it in terms of creativity, artistry, inspiration and general visual story telling. As I am looking for expressive and highly creative visuals, Dall-E3 outperforms GPT Image every single time.

It’s as if GPT Image is much more grounded and biased for realism whereas Dall-E3 is much easier to instruct to create fantastical or dreamy images.

GPT Image takes a lot more explicit instruction to try and create similar to Dall-E3 images, but even then I have not been able to replicate Dall-E3’s artistry and visual expression despite many attempts.

For my use case, what would be ideal is a model that has the instruction following discipline of GPT Image, but creates the creative and expressive images that Dall-E3 does.

For now, I am sticking to Dall-E3 in my application.

Anyone else having similar experience or thoughts?


  • Dall-E3 images generated in Vivid mode with HD quality.
  • GPT Image images generated in High quality mode.
5 Likes

My first impressions were exactly like yours. However, after playing around with gpt-image-1, I came to realize that it is much more versatile than DALL-E-3. You can do stuff with gpt-image-1 that is impossible with DALL-E-3 - and vise-versa to some extent… Ideally, it would be nice if OpenAI could somehow combine the models, standardize the parameters.

Here are some examples I made:

Made with gpt-image-1. Looks like an old Disney (Walter Lantz) cartoon scene:

Here, I used two DALL-E-3 images and merged them with gpt-image-1.

“The Transformation of ManBearPig”

If OpenAI cannot combine the models, they must keep DALL-E-3.

The prompts OpenAI used were very “random happenstance” and open to interpretation. Photo or cartoon, art or no: not specified. Did potato kings get their thrones in the blog? No. Were plural “kings” delivered there? No.

DALL-E 3 is good at filling in the gaps or going in a random direction, but ChatGPT now needs to know what it is you want. With DALL-E, you could send a collection of maths expressions and get a wild fantasy image; with ChatGPT’s gpt-4o image maker, you might get the formula put in a plain picture.

I have ChatGPT itself write several paragraphs of describing all that needs to be depicted before it invokes the image creation tool. You can tell it to hold that back until you like its own “creativity” add.

Here’s two more sentences added to “potato kingdom”, emulating the style of what the blog showed. And with everything described in the prompt there now, with just open interpretation of “overseeing” a potato kingdom.

So this forum topic can be taken more as advice: neither AI image maker will read your mind to give you what you actually want, nor are they perfect at delivering.

Perhaps the prompting will have to be changed? I liked the result I got.

Thanks, can you share the updated prompt that got you the better result with GPT Image?

I broadly agree with the points you are making, but I have not find a reliable way to emulate Dall-E3’s “Vivid” style in GPT Image yet. Any tips are welcome!

1 Like

Was this the same prompt as before or did you tweak it for GPT Image?

1 Like

The image’s hover text here is the first input to the ChatGPT session.

Custom instructions compels the AI to write multiple paragraphs about the ideal depiction embracing and fulfilling the idea, before then automatically sending to the image_gen recipient.

Does anybody know where I can try creating Dall-E3 images? when I search for Dall-E3 the page says ‘Try in ChatGPT’. If that’s the case how do I differentiate between Dall-E3 images and ChatGPT images? Thx.