Hi,
I have been a long-time user of Dall-E3 and was really looking forward to migrating to the new GPT Image API. However, in my subjective experience, the images generated with the GPT Image API are dull and uninspiring compared to what the same prompt produces with Dall-E3.
To demonstrate this, I have taken a few prompts from the original Dall-E3 release blog.
Tiny potato kings wearing majestic crowns, sitting on thrones, overseeing their vast potato kingdom filled with potato subjects and potato castles.
Dall-E3
GPT Image
A 3D render of a coffee mug placed on a window sill during a stormy day. The storm outside the window is reflected in the coffee, with miniature lightning bolts and turbulent waves seen inside the mug. The room is dimly lit, adding to the dramatic atmosphere.
Dall-E3
GPT Image
Close-up photograph of a hermit crab nestled in wet sand, with sea foam nearby and the details of its shell and texture of the sand accentuated.
Dall-E3
GPT Image
A photo of an ancient shipwreck nestled on the ocean floor. Marine plants have claimed the wooden structure, and fish swim in and out of its hollow spaces. Sunken treasures and old cannons are scattered around, providing a glimpse into the past.
Dall-E3
GPT Image
And an example from my app:
(Styling instructions omitted due to proprietary nature).
Helpful background:
The subject is a Caucasian man aged 40-49, with light skin, long wavy brown hair, brown eyes, and a short beard.
Create a visual representation of the described scene.
A tranquil, softly lit evening setting focused on the subject sitting comfortably in a cozy room designed for relaxation and reflection. He holds a modern vaping device in one hand, with a subtle trail of vapor gently curling upward, symbolizing his casual evening vaping habit. The atmosphere conveys introspection and mindful consideration, with elements like a small meditation cushion, a book about mindfulness, and art supplies scattered nearby to represent alternative relaxation methods such as meditation and hobbies. A faint morning light coming through a window hints at the lingering discomfort and reflection on vaping’s effects. The environment is peaceful yet contemplative, emphasizing the subject’s internal struggle with balancing relaxation, health, and mindful consumption. His expression is thoughtful and pensive, embodying the seriousness of his self-reflection and commitment to making healthier lifestyle choices.
Dall-E3
GPT Image
While GPT Image has superior instruction following, Dall-E3 consistently outperforms it in creativity, artistry, inspiration and general visual storytelling. As I am looking for expressive and highly creative visuals, Dall-E3 wins for me every single time.
It’s as if GPT Image is much more grounded and biased toward realism, whereas Dall-E3 is much easier to instruct to create fantastical or dreamy images.
GPT Image needs a lot more explicit instruction to produce images similar to Dall-E3’s, and even then I have not been able to replicate Dall-E3’s artistry and visual expression despite many attempts.
For my use case, the ideal would be a model that has the instruction-following discipline of GPT Image but produces the creative and expressive images that Dall-E3 does.
For now, I am sticking to Dall-E3 in my application.
Is anyone else having a similar experience, or have thoughts on this?
- Dall-E3 images generated in Vivid mode with HD quality.
- GPT Image images generated in High quality mode.
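
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the same prompt could be sent to both models with the settings listed above. It assumes the official `openai` Python package, an `OPENAI_API_KEY` in the environment, and the model IDs `dall-e-3` and `gpt-image-1` (the post only says "GPT Image", so that ID is my assumption). `build_request` and `generate_pair` are hypothetical helper names, not part of any SDK, and this is not necessarily how my app calls the API.

```python
from typing import Any

# One of the prompts quoted above.
PROMPT = (
    "Tiny potato kings wearing majestic crowns, sitting on thrones, "
    "overseeing their vast potato kingdom filled with potato subjects "
    "and potato castles."
)


def build_request(model: str, prompt: str, size: str = "1024x1024") -> dict[str, Any]:
    """Return keyword arguments for client.images.generate (hypothetical helper)."""
    if model == "dall-e-3":
        # Vivid style with HD quality, as in the notes above.
        return {
            "model": model,
            "prompt": prompt,
            "size": size,
            "quality": "hd",
            "style": "vivid",
            "response_format": "b64_json",
            "n": 1,
        }
    if model == "gpt-image-1":  # assumed model ID for "GPT Image"
        # High quality mode; this model returns base64 data by default.
        return {"model": model, "prompt": prompt, "size": size, "quality": "high", "n": 1}
    raise ValueError(f"unsupported model: {model}")


def generate_pair(prompt: str, out_prefix: str = "comparison") -> list[str]:
    """Generate one image per model and save both as PNG files."""
    import base64

    from openai import OpenAI  # imported here so build_request works without the SDK

    client = OpenAI()
    paths = []
    for model in ("dall-e-3", "gpt-image-1"):
        result = client.images.generate(**build_request(model, prompt))
        path = f"{out_prefix}_{model}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(result.data[0].b64_json))
        paths.append(path)
    return paths
```

Calling `generate_pair(PROMPT)` should write two files that can be compared side by side. Keeping the per-model settings in one place makes it easy to check that only the model differs between the two runs.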