And now I modified it a little: I used the word “CLOSE-UP”, turned it back to LAKE, and the surface is OILY. However, although I said UNSMOKED, it still displayed smoke.
Yes, negations don’t work. I try to avoid using them, but sometimes it’s very difficult, because positive descriptions alone don’t always work well.
The original prompts actually tried to create a flame made out of water, “burning” in the water. So far, that hasn’t worked, but the flame in the darkness is a nice result.
Sometimes what helps is saying ‘everything surrounded by pitch-black fog,’ but DALL-E is very stubborn. I don’t have experience with other image generators yet, but I can imagine this is a DALL-E-specific problem. (At some point, I want to try Midjourney and Stable Diffusion.) I’ve generated many images now and am still learning. To fix the issues I’ve collected in this post, I built myself a GPT, which has definitely helped. I don’t write my texts in English, and before, GPT often corrected them incorrectly and made them worse.
But the template effect is clear. Unfortunately, we don’t have any information from the developers, but I think it’s either overtraining, correction instructions, or data reduction that causes this template effect in some situations. It feels like a preset.
Something positive: DALL-E is really good at generating beautiful landscapes. I feel a bit limited in possibilities at the moment, so I often focus on what DALL-E does well.
Anyone who has ever tried to render such an image in a 3D program knows how extremely difficult it is. Without very deep knowledge and a lot of working time, it is almost impossible, and the render time for such images in ray tracing programs is enormous.
Here is an example of the face-blur effect. If faces are small in a scene, they blur out and show distortions. Asking for more detail accuracy has no effect, because that is simply how a stable-diffusion-style system works.
It would need special treatment of the important parts of an image, without using a higher resolution for the entire image. This is not part of DALL-E 3’s skills right now.
Here is what can happen if you mention “a scene” in a prompt: it sometimes literally creates a scene on a stage, or, as here, in a movie.
The same can happen if you mention “create”: sometimes drawing tools appear in the image. The same goes for “close to the camera”.
And using “alien” does not lead to a creative non-human creature, but mostly to a Giger or Roswell stereotype.
Keep in mind that EVERY word can literally appear in an image, and every word can affect the entire scene through a scattering effect.
Wow… what a collection of tips. I wish I had found these a couple of weeks ago. I was really struggling to get DALL-E to generate useful images, so I decided to give Stable Diffusion a try last week, and I’m not sure I can go back now. They have so many nice features and an experience that’s optimized for image generation. I don’t see how DALL-E can compete without a larger investment from OpenAI across the board.
Yes, any “create” instruction is useless anyway; DALL-E will create the picture regardless, so you can leave it out of the prompt. Use the DALL-E input mask in the browser, so there is no confusion with the text-writing GPT.
Here is an idea of the point at which faces get blurry.
DALL-E often refuses to show a character at full body size. That could be a reason: the bigger the face, the clearer it is.
If I get this Christmas card I am not going to be happy!
I could list the issues, but I’ll leave it at this: Santa’s sleigh needs a fire extinguisher, and those poor reindeer…
Prompt:

```
A photorealistic Christmas-themed illustration in A5 size. Snowflakes falling, crimson ribbons, mistletoe, angels watching from golden skies, choirs singing, guiding stars, frosted branches bowing with grace, children dreaming with Father Christmas, sleigh bells ringing, and a glowing winter night. The scene evokes warmth, peace, love, and the magic of the holiday season, presented in a vertical format suitable for A5 paper, with high detail and photorealistic elements.
```
For now, I hope to find some tricks to work around them temporarily. But of course, the technicians should work on fixing them.
(Oh please, the mouths first, and non-distorted faces; this hits us humans psychologically…)
(Where does the nonsense-text bug come from? It looks like an engine test that wasn’t deactivated. Where does DALL-E get the idea to put prompt fragments into the pictures if it doesn’t know how to visualize them…?)
Here is a test of prompt complexity, intentionally with a prompt a bit at the limit of “graphic tokens” (any more and the picture gets strange, things go missing, etc.). One version is long and detailed, the other reduced to a minimum. In principle, the same result.
So I speculate that DALL-E throws out everything that cannot be understood graphically. Maybe that is the reason why it is difficult to place objects precisely (and maybe it contributes a bit to the scatter effect too).
Maybe we can stop writing poems; catchwords are enough(?).
There is more testing needed…
Long prompt:
{
"prompt": "A creature from an unknown species with an extraordinary, almost supernatural appearance bathes in a crystal-clear lake, surrounded by a mystical night landscape. The being has large, ethereal wings that gently glow in the water, and translucent skin with a shimmering yellowish aura that illuminates the dark surroundings. Its eyes shine with a subtle golden glow, while its wings reflect the moonlight, causing the water around the being to glow. The lake is fed by a multi-tiered cascading waterfall, where crystal-clear water tumbles over shimmering rocks, creating a fine, colorful mist that refracts the light into soft rainbow hues. The trees around the lake have deep purple leaves, with branches arching over the water and waterfall, reflecting the being's shimmering light. Some trees even extend beyond the waterfall, and their foliage glows lightly in the darkness. In the foreground, orange plants and yellow flowers enhance the magical atmosphere. The sky is dark blue, streaked with light clouds that gently reflect the being's light. The contrast between the purple trees and the yellowish glow of the being creates a mysterious yet peaceful atmosphere. The scene feels otherworldly, filled with mystical and ethereal elements. Photo style.",
"size": "1792x1024",
"n": 1
}
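For anyone who wants to rerun this long-versus-short comparison, the payload above roughly maps onto the Images API. Here is a minimal sketch, assuming the openai-python v1 client; the short prompt is just an illustrative reduction of mine, and printing revised_prompt also shows what the system itself keeps or rewrites:

```python
# A minimal sketch for A/B testing prompt complexity against DALL-E 3.
# Assumes the openai-python v1 client and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def generate(prompt: str) -> None:
    """Send one prompt to DALL-E 3 and print the image URL and revised prompt."""
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1792x1024",  # same landscape size as the payload above
        n=1,               # dall-e-3 accepts only n=1 per request
    )
    print(response.data[0].url)
    # revised_prompt reveals how the system rewrote (and possibly trimmed) the input
    print(response.data[0].revised_prompt)

long_prompt = "A creature from an unknown species ... Photo style."  # full text above
short_prompt = (  # illustrative minimal version, not the exact one I used
    "Glowing winged creature bathing in a night lake, waterfall, "
    "purple trees, orange plants. Mystical night mood. Photo style."
)

for p in (long_prompt, short_prompt):
    generate(p)
```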
… Another interesting question is how many graphic elements we can use in DALL-E.
There is a point where the picture quality drops, and a point where elements go missing and are ignored.
(Sadly, I am not aware of any documentation from the devs.)
A few apples. A few apple-inspired oddities. A pineapple. And a game of find the errors.
By the time you get to “window” - a computer with Windows?
I had to verify that gpt-4o sent the list without a screwup, like it did when sending it to Python for the sort. And a deduplication from 260 tokens, showing attention in producing the words.
Oh great, thanks much for the info!
“256 tokens” — this makes sense.
What I still don’t fully get is how DALL-E interprets the language and connects it with the graphic weights. I now speculate that we can leave out poetic and overly descriptive language, especially storytelling, because there is no training data connecting that kind of information with an image (if there are one or two images where it fits, that is too little data in the weights to have an effect). I think we should use very simple language like “tree with red leaves” or “tree with red leaves on top of a mountain”, and sum up all the atmospheric information in one sentence like “romantic magic mood” or “gloomy night mood”.
We still speak to the AI as we would to a human, writing a poem for it, but what it needs is graphic tokens.
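As a toy illustration of that catchword idea (the helper and the example words are made up, not a tested recipe), a prompt can be assembled from bare graphic elements plus one summary mood sentence:

```python
# A toy sketch: build a prompt from plain "graphic tokens" plus one mood sentence.
# The helper name and example words are illustrative only.
def build_prompt(elements, mood, style="Photo style"):
    """Join catchword elements, one atmospheric sentence, and a style tag."""
    return ". ".join([", ".join(elements).capitalize(), mood, style]) + "."

print(build_prompt(
    ["tree with red leaves", "on top of a mountain", "night lake below"],
    "Gloomy night mood",
))
# -> Tree with red leaves, on top of a mountain, night lake below. Gloomy night mood. Photo style.
```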
And we should not use all 256 tokens for graphic elements, because of the scattering effect. It is like mixing colors: if you mix ALL colors together, you get shit-brown. The better the system can place information correctly, the more graphic tokens can be used to describe a precise setting. But location and placement are still an issue for DALL-E.
I sometimes get the most beautiful images with very simple prompts.
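If you want a rough check of how close a prompt is to that reported 256-token budget before sending it, a tokenizer gives an estimate. A minimal sketch with tiktoken, assuming cl100k_base as a stand-in (DALL-E 3’s actual tokenizer isn’t documented, so treat the number as approximate):

```python
# A rough sketch: estimate prompt length in tokens before sending it.
# Assumption: cl100k_base is only a proxy; DALL-E 3's real tokenizer is not public.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def token_count(prompt: str) -> int:
    """Return an approximate token count for a prompt."""
    return len(enc.encode(prompt))

prompt = "Tree with red leaves on top of a mountain. Gloomy night mood. Photo style."
print(token_count(prompt), "tokens (budget reportedly ~256)")
```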
If you have a link to the developers, please let them know to let us use seeds! If there is a concern that pictures will be different as soon as the weights are updated, just inform users that there is no guarantee of getting the same image, or give access to all weight versions. Being able to use seeds would make it so much easier to see the effects of words in a prompt, and this would speed up learning enormously.
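For comparison, seeded generation is already normal in the open Stable Diffusion stack. A minimal sketch with Hugging Face diffusers (the checkpoint ID is just an example; a CUDA GPU is assumed), where a fixed seed makes a one-word prompt change directly observable:

```python
# A minimal sketch of seeded generation with Stable Diffusion via diffusers.
# Assumptions: a CUDA GPU and the example checkpoint below; swap in any SD model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def render(prompt: str, seed: int = 42):
    # The fixed seed makes runs repeatable, so a one-word prompt change
    # shows the effect of that word alone.
    generator = torch.Generator("cuda").manual_seed(seed)
    return pipe(prompt, generator=generator).images[0]

render("lake at night, oily surface").save("a.png")
render("lake at night, misty surface").save("b.png")  # only one word changed
```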
A lot of us have been asking for this for a while. I’ve noticed slight improvements, and I wonder if we’re on a DALLE3.5 model or something. There’s been no official word, so it could just be me “seeing things”…
OpenAI does have a bit of room to catch up with the others… Being better with negative prompts is another thing that needs work… Overall, getting much better slowly!
I will test MJ and SD soon; I have even checked out the standalone open SD setups. My machine is just not strong enough; you need some strong GPUs.
The days simply don’t have enough hours…
As you know, I think they had a weights infection recently, and afterwards I think they updated it again, because the system was down for some hours.
…“seeing things”, yes, me too. I have redone some of my favorites, but I could not see much change; if it is there, it is very subtle. And it could be that the update affects some styles more than others. Maybe not 3.5 but 3.05.
From what I know, the core DALLE team is really small, but they do a lot. They may have grown since I had more contact with some of them, but they seem to be “heads-down” and busy, I hope!