Collection of Dall-E 3 prompting tips, issues and bugs

Hi @Daller

Yes, of course!

I tried describing it four times with my custom GPT, but not “over and over and over again”.

What finally passed, on the fourth try, was its prompt with LAKE changed to SEA.
I posted it in my previous post above.

The following steps show how I got its prompt.


Now I modified it a little: I used the word “CLOSE-UP”, changed it back to LAKE, and made the surface OILY. However, although I said UNSMOKED, it still displayed smoke.


I changed the phrase in the first sentence from ‘resembling an unsmoked’ to ‘resembling an no burning effect’, and it worked better.

2 Likes

Thanks a lot!

Yes, negations don’t work. I try to avoid them, but sometimes that’s very difficult, because purely positive descriptions don’t always work well either.

The original prompts actually tried to create a flame made out of water, “burning” in the water. So far that hasn’t worked, but the flame in the darkness is a nice result.

Sometimes what helps is saying ‘everything surrounded by pitch-black fog,’ but DALL-E is very stubborn. I don’t have experience with other image generators yet, but I can imagine that this is a DALL-E-specific problem. (At some point, I want to try Midjourney and Stable Diffusion.) I’ve generated many images now and am still learning. To fix the issues I’ve collected in this post, I built myself a GPT, which has definitely helped. I don’t write the texts in English, and before that GPT often “corrected” them wrongly and made them worse.

But the template effect is clear. Unfortunately, we don’t have any information from the developers, but I think it’s either overtraining or correction instructions or data reduction that cause this template effect in some situations. It feels like a preset.

3 Likes

Something positive: DALL-E is really good at generating beautiful landscapes. I feel a bit limited in possibilities at the moment, so I often focus on what DALL-E does well.

Anyone who has ever tried to render such an image in a 3D program knows how extremely difficult it is. Without very deep knowledge and a lot of working time, it is almost impossible, and the render time for such images in ray tracing programs is enormous.


4 Likes

Here is an example of the face-blur effect. If faces are small in a scene, they blur out and show distortions. Asking for more detail or accuracy has no effect, because this is how a diffusion-based system works.
It would need special treatment of the important parts of an image, without using a higher resolution for the entire image. This is not among DALL-E 3’s skills at the moment.

Image is scaled 400%
face blur

Here is what can happen if you mention “a scene” in a prompt: it sometimes literally creates a scene on a stage or, as here, in a movie.
The same can happen if you mention “create”; sometimes drawing tools appear in the image. The same goes for “close to the camera”.

And using “Alien” does not lead to a creative non-human creature, but mostly to a Giger or Roswell stereotype.

Keep in mind that EVERY word can literally appear in an image, and every word can affect the entire scene through a scattering effect.
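A quick way to act on this is to scan a prompt for the phrases reported above before sending it. The following is only a minimal sketch: the watch list `LITERAL_RISK` and the function `find_risky_phrases` are hypothetical names made up for illustration, and the listed side effects are just the observations from this thread, not documented DALL-E behavior.

```python
import re

# Hypothetical watch list built from the observations in this thread.
# It is an illustration, not an official or complete list.
LITERAL_RISK = {
    "scene": "may be rendered as a stage or a movie set",
    "create": "may add drawing tools to the image",
    "close to the camera": "may be taken literally, e.g. a visible camera",
    "alien": "tends toward a Giger or Roswell stereotype",
}


def find_risky_phrases(prompt: str) -> list[tuple[str, str]]:
    """Return (phrase, reported side effect) pairs found in the prompt."""
    lowered = prompt.lower()
    hits = []
    for phrase, effect in LITERAL_RISK.items():
        # Word boundaries keep "scene" from matching inside "scenery".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            hits.append((phrase, effect))
    return hits


if __name__ == "__main__":
    for phrase, effect in find_risky_phrases("Create a scene close to the camera"):
        print(f"'{phrase}': {effect}")
```

The list is meant to grow with your own findings; a hit is only a hint to rephrase, not proof that the word will show up in the picture.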

2 Likes

Wow… what a collection of tips. I wish I had found these a couple of weeks ago. I was really struggling to get Dall-E to generate useful images so I decided to give Stable Diffusion a try last week and I’m not sure I can go back now. They have so many nice features and an experience that’s optimized for image generation. I don’t see how Dall-E can compete without a larger investment from OpenAI across the board.

2 Likes

I noticed that if you use “draw” in the prompt, it sometimes makes an active drawing, i.e. paper on a table with a hand drawing the picture in pencil :pencil2:

1 Like

Yes, any “create” instruction is useless anyway; DALL-E will create the picture regardless, so you can leave it out of the prompt. Use the DALL-E input mask in the browser so there is no confusion with the text-writing GPT.

1 Like

Here is an idea of the point at which faces get blurry.
DALL-E often refuses to show a character at full body size. This could be one reason: the bigger the face, the clearer it is.

1 Like

If I get this Christmas card I am not going to be happy!

I could list the issues but will leave it at Santa’s sleigh needs a fire extinguisher and those poor reindeer…

Prompt
A photorealistic Christmas-themed illustration in A5 size. Snowflakes falling, crimson ribbons, mistletoe, angels watching from golden skies, choirs singing, guiding stars, frosted branches bowing with grace, children dreaming with Father Christmas, sleigh bells ringing, and a glowing winter night. The scene evokes warmth, peace, love, and the magic of the holiday season, presented in a vertical format suitable for A5 paper, with high detail and photorealistic elements.

2 Likes

Thanks for sharing problems (and tips) everyone.

Including prompts is helpful to see if we can replicate the problems and fix them, etc.

For now I hope to find some tricks to work around them temporarily. But of course, the engineers should work on fixing them.

(Oh please, the mouthy first, and undistorted faces; this hits us humans psychologically… :pleading_face:)

(Where does the nonsense-text bug come from? It looks like an engine test that was never deactivated. Where does DALL-E get the idea to put prompt fragments into the pictures when it does not know how to visualize them…?)

Here is a test of prompt complexity, intentionally using a prompt close to the limit of “graphic tokens”; with any more, the picture gets strange, things go missing, etc. One prompt is long and detailed, the other reduced to a minimum. The result is essentially the same.

So I speculate that DALL-E discards everything that cannot be understood graphically. Maybe that is why it is difficult to place objects precisely (and maybe it contributes a bit to the scatter effect too).

Maybe we can stop writing poems; keywords are enough(?).
More testing is needed…

Prompt long
{
  "prompt": "A creature from an unknown species with an extraordinary, almost supernatural appearance bathes in a crystal-clear lake, surrounded by a mystical night landscape. The being has large, ethereal wings that gently glow in the water, and translucent skin with a shimmering yellowish aura that illuminates the dark surroundings. Its eyes shine with a subtle golden glow, while its wings reflect the moonlight, causing the water around the being to glow. The lake is fed by a multi-tiered cascading waterfall, where crystal-clear water tumbles over shimmering rocks, creating a fine, colorful mist that refracts the light into soft rainbow hues. The trees around the lake have deep purple leaves, with branches arching over the water and waterfall, reflecting the being's shimmering light. Some trees even extend beyond the waterfall, and their foliage glows lightly in the darkness. In the foreground, orange plants and yellow flowers enhance the magical atmosphere. The sky is dark blue, streaked with light clouds that gently reflect the being's light. The contrast between the purple trees and the yellowish glow of the being creates a mysterious yet peaceful atmosphere. The scene feels otherworldly, filled with mystical and ethereal elements. Photo style.",
  "size": "1792x1024",
  "n": 1
}

Prompt short
{
  "prompt": "Unknown species, supernatural. Large, ethereal wings glowing in the water. Translucent skin with a yellowish aura. Golden eyes reflecting moonlight. Glowing water. Multi-tiered waterfall with shimmering rocks, colorful mist. Violet trees with overhanging branches reflecting light. Orange plants, yellow flowers. Dark blue sky with light clouds. Violet-yellow contrast. Photo style.",
  "size": "1792x1024",
  "n": 1
}

… Another interesting question is how many graphic elements we can use in DALL-E.
There is a point where the picture quality drops, and a point where elements go missing or are ignored.

(Sadly, I am not aware of any documentation from the devs.)

The sent prompt is truncated at 256 tokens by the model. Your “long” prompt is 244, in cl100k_base.
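To see what would fall past that cutoff, here is a minimal sketch that splits a prompt at a token limit. Everything in it is an assumption for illustration: `split_at_limit`, `word_encode` and `word_decode` are made-up names, the 256 default is simply the figure quoted in this post, and the stand-in tokenizer counts one token per word, which is not how cl100k_base counts.

```python
from typing import Callable, Sequence

# 256 is the limit quoted in this thread, not a figure from official docs.
TOKEN_LIMIT = 256


def split_at_limit(
    prompt: str,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    limit: int = TOKEN_LIMIT,
) -> tuple[int, str, str]:
    """Return (token count, kept text, dropped text) for the given tokenizer."""
    tokens = encode(prompt)
    kept, dropped = tokens[:limit], tokens[limit:]
    return len(tokens), decode(kept), decode(dropped)


# Stand-in tokenizer so the sketch runs without extra packages:
# one token per whitespace-separated word (NOT a real BPE tokenizer).
def word_encode(text: str) -> list[str]:
    return text.split()


def word_decode(tokens: Sequence) -> str:
    return " ".join(tokens)


if __name__ == "__main__":
    count, kept, dropped = split_at_limit(
        "apple bag ball basket battery", word_encode, word_decode, limit=3
    )
    print(count, "|", kept, "|", dropped)
```

For real counts, the `tiktoken` package can be plugged in: with `enc = tiktoken.get_encoding("cl100k_base")`, pass `enc.encode` and `enc.decode` instead of the word-based stand-ins.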

If you just say “apple”, you’ll probably get an apple.

How does it look with 166 tokens? (each depictable noun is a single token)

“prompt”: “Each item is in the image: apple bag ball basket battery bed belt bench bicycle binders blanket blinds board book boot bottle bowl box brush bulletin bus button cabinet cable calculator calendar camera candle car carpet case cat chair chalk charger clips clock closet coat coffee coin comb computer controller couch cup curtain desk dog door drawer dress dresser drill drum envelope eraser fan flute folder fork frame glass glasses globe glove guitar hammer hanger hat headphones jacket jeans key keyboard knife lamp laptop light magazine map marker mattress microphone microwave mirror mop mouse nail newspaper notebook oven pants paper pen pencil phone piano picture pillow pizza plant plate plug poster radio razor refrigerator remote ring rug ruler sandals sandwich saw scarf scissors screw shampoo shirt shoe shoes shorts sink skirt soap sofa speaker sponge spoon stool stove suit suitcase sweater table tablet tape television tie toilet towel tray tree tripod trophy umbrella vacuum vase violin wall wallet watch water window wrench”

A few apples. A few apple-inspired oddities. A pineapple. And a game of find the errors.

By the time you get to “window” - a computer with Windows?

I had to verify that gpt-4o sent the list without a screwup, as it did when sending it to Python for the sort and for a deduplication (down from 260 tokens), which shows its attention in producing the words.

3 Likes

Oh great, thanks a lot for the info!
“256 tokens”, that makes sense.

What I still don’t fully get is how DALL-E interprets the language and connects it with the graphic weights. My current speculation is that we can leave out poetic and overly descriptive language, especially storytelling, because there is no training data connecting that information with an image (if only one or two images fit, that is too little data in the weights to have an effect). I think we should use very simple language like “Tree with red leaves” or “Tree with red leaves on top of mountain”, and sum up all the atmospheric information in one phrase like “romantic magic mood” or “gloomy night mood”.

We still speak to the AI as if it were a human, writing a poem for it, but what it needs is graphic tokens.

And we should not use all 256 tokens for graphic elements, because of the scattering effect. It is like mixing colors: if you mix ALL colors together, you get shit-brown. The better the system can place information correctly, the more graphic tokens can be used to describe a precise setting. But location and placement are still an issue for DALL-E.

I sometimes get the most beautiful images with very simple prompts.
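The keyword style suggested here (short element phrases plus a single mood phrase) could be assembled like this. It is just a sketch of the idea as I understand it: `build_keyword_prompt` is a hypothetical helper, and the default `"Photo style"` tag is borrowed from the short test prompt earlier in the thread.

```python
def _clean(phrase: str) -> str:
    """Trim whitespace and trailing periods, and capitalize the first letter."""
    phrase = phrase.strip().rstrip(".")
    return phrase[:1].upper() + phrase[1:]


def build_keyword_prompt(
    elements: list[str], mood: str, style: str = "Photo style"
) -> str:
    """Join element phrases, one mood phrase and a style tag into a terse prompt."""
    parts = [_clean(p) for p in [*elements, mood, style]]
    # Drop anything that ended up empty so no stray periods appear.
    parts = [p for p in parts if p]
    return ". ".join(parts) + "."


if __name__ == "__main__":
    print(build_keyword_prompt(
        ["Tree with red leaves on top of mountain", "Multi-tiered waterfall"],
        "gloomy night mood",
    ))
```

Keeping elements in a list also makes it easy to remove one at a time and compare results, although without seeds each comparison still includes random variation.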

If you have a line to the developers, please ask them to let us use seeds! If there is a concern that pictures will differ as soon as the weights are updated, just inform users that there is no guarantee of getting the same image, or give access to all weight versions. Being able to use seeds would make it so much easier to see the effect of individual words in a prompt, and that would speed up learning enormously.

2 Likes

A lot of us have been asking for this for a while. I’ve noticed slight improvements, and I wonder if we’re on a DALLE3.5 model or something. There’s been no official word, so it could just be me “seeing things”…

OpenAI does have a bit of room to catch up with the others… Being better with negative prompts is another thing that needs work… Overall, getting much better slowly!

2 Likes

:pleading_face: Seed and negative prompts… Please… Yes…

I will test MJ and SD soon. I have even looked at the standalone open SD setups, but my machine is just not strong enough; you need some strong GPUs.
The days simply don’t have enough hours…

As you know, I think they had a weights infection recently, and I think they updated it again afterwards, because the system was down for some hours.
…“seeing things”: yes, me too. I have redone some of my favorites but could not see much change; if it is there, it is very subtle. It could also be that the update affects some styles more than others. Maybe not 3.5 but 3.05.

It’s sad that the devs don’t communicate with users.

2 Likes

From what I know, the core DALLE team is really small, but they do a lot. They may have grown since I had more contact with some of them, but they seem to be “heads-down” and busy, I hope!

1 Like

I think there have been some updates to DALL-E.

Now DALL-E checks its results if we point out its mistake.

3 Likes