Collection of Dall-E 3 prompting bugs, issues and tips

Daller · October 27, 2024, 2:07am

I normally use 400-800 character prompts for testing; otherwise it is too much chaos to see any changes from picture to picture.
And I use a sort of blabla guide to give it an overall direction, and even this has fooled me sometimes.

… and they still have garbage in the training data. The photo editing software nonsense, I get that sometimes too.

mitchell_d00 · October 27, 2024, 2:09am

I hate when this happens lol. Our methods, I mean all of ours, seem to work well together. We all have a separate focus in Dalle prompting, but as a whole the picture is growing.

mitchell_d00 · October 27, 2024, 2:14am

Flame on water: I described it as if it and I were in an opaque box to stop the “moonlight”, but it still gives off light from the sides.

We are in a dark pitch black world a single fire on calm dark water seen from my perspective with only our dark opaque world wide image

omni72 · October 27, 2024, 4:11pm

so this is kind of a couple things. part issues. part tips. part idk i guess mostly those two parts

Anyway and I’m just including one image to kind of capture the different parts:

Mission: Closeup of a woman’s left ear after she came in out of a rainstorm

TL;DR - it was way easier to consistently get a pic of a pretty wet blond lady than any left ear

Enhanced prompt

Closeup of a blonde female woman’s left ear as the woman sits in her car at a fast food drive-through. The view focuses on the woman’s left ear. The woman just came in from the rain, so the woman’s hair is still wet, and there are droplets on the woman’s skin and ear. The woman is driving an American-made car, with her driver’s side window down, making the woman’s left ear clearly visible. The image should capture the texture of the woman’s wet hair and skin, emphasizing the freshness of rain droplets, with a blurred background to keep the focus on the woman’s left ear.

sometimes you may be better off to put an unusual amount of focus on some element or feature so that you have a better shot at getting the perspective you want for the broader image. so in this case i’m more interested in getting this sort of angle and view of the female than the ear itself.
left and right seem pretty straightforward but Dall-E disagreed. A lot. I tried emphasizing left but no real change. i tried reverse psychology by asking it to focus on it being her right ear. still got a right ear. then i tried giving it guidance based on things that i would describe as on the left side. Like the driver-side door of an American-made car. Voila.
i also got kind of hit or miss imagery when i said the lady was ‘driving’ so i changed that to her being in her car in line at a fast-food place. that seemed to help it do a more consistent job of putting our lady in a more stationary place
hey guess what? it took a few tries to get something that basically depicted a wet blonde woman. from the outset i framed and worded it in a way to convey i wasn’t looking for the winner of a wet t-shirt contest but still had to sort out ways to work with what invisible rails we were running up against. and it was kind of funny bc in this case it was literally part of the mission. the woman had just come in out of a rainstorm. i think at first i tried pools and oceans bc i didn’t want the model to introduce unnecessary details and focus on the rainstorm.
i still leveraged emphasis and repetition bc at least once our beautiful woman was clearly just a handsome man. so yeah woman woman woman woman left left left left driver-side driver-side driver-side.

should i expect to get the desired output including her left ear every time? i mean i guess i could but i do not. Bc from what i can tell ‘left’ and ‘right’ are not easily grasped concepts for the model. but using these tactics i’ve gone from 0% correct side after numerous attempts/re-rolls to 50-60% correct using the following starter prompt with no special instructions or additional guidance:

Closeup of a blonde woman’s left ear, clearly visible as she sits in her car at a fast food drive-through. The view is from outside the car, focusing on the woman’s left ear, as seen from the perspective of the drive-through window. The woman just came in from the rain, so her hair is still wet, with droplets visible on her skin and ear. She is driving an American-made car, with her driver’s side window down, making her left ear the central focus of the image. The view should be from her left side, clearly capturing her left ear and some of her cheek and neck to emphasize it is her left ear. The image should highlight the wet texture of her blonde hair and the freshness of the rain droplets, with a softly blurred background to keep the focus on her left ear.

And here’s a little collage of most of the test outputs

*i had arranged the images chronologically then left/right but accidentally clicked the auto collage button both times which reshuffles the lot

**you do have to be careful introducing context bc at some point the detail of it being a fast-food place seemed to get stuck as an important part of the recipe

the closer closeups were the initial runs and never once turned up a left ear. and besides the overly flat profile perspective the actual closeupness was more on par with the vision.

Edit: Also - and i’m sure this is a very subjective area - here is a little glimpse at how i’ll often catalog my chat sessions especially when i’m testing out stuff:

basically just the last four of the session URL + context + if it seemed to work or not
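
For anyone who wants to keep that kind of catalog outside the chat sidebar, here is a minimal sketch of the same three fields (last four of the session URL, context, worked or not). The `SessionNote` name, the helper functions and the example URL are all made up for illustration; nothing here is a ChatGPT feature.

```python
from dataclasses import dataclass


@dataclass
class SessionNote:
    url_tail: str  # last four characters of the session URL
    context: str   # what was being tested
    worked: bool   # did it seem to work or not


def make_note(session_url: str, context: str, worked: bool) -> SessionNote:
    """Build a note from a full session URL (hypothetical URL format)."""
    return SessionNote(session_url.rstrip("/")[-4:], context, worked)


def format_note(note: SessionNote) -> str:
    """Render the note as a single line suitable for a chat title or log."""
    status = "worked" if note.worked else "did not work"
    return f"{note.url_tail} | {note.context} | {status}"


# Example with a placeholder URL, not a real session.
note = make_note("https://example.com/c/0000-ab12", "left ear, drive-through", True)
print(format_note(note))
```

One line per session keeps the log easy to scan when comparing which prompt variations seemed to help.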

mitchell_d00 · October 27, 2024, 4:33pm

“Left ear perspective femal3 head view a 50s parking dinner wide image“

Left ear perspective male view a 50s parking dinner wide image

This one is hard, I have to tinker: “Left ear perspective tiger from tiger view a 50s parking dinner wide image”

Close, but it’s a right ear. “Left ear side perspective tiger from tiger view a 50s parking dinner wide image”

Don’t think it has a left side tiger ear lol.

omni72 · October 27, 2024, 4:44pm

yeah i think you’ll find it generally challenging to get consistently correct sides:

and it’s kind of funny to me bc i mean come on now. like it totally gets the overall assignment and comes through like a rock star. and then it will be like “There ya go. Freshly minted perfect left ear” when it’s clearly a right ear

mitchell_d00 · October 27, 2024, 5:02pm

From my end it is very consistent

Do you have memory on or custom functions? I am using untrained 4o for the experiment.

Once again on a new instance of 4o

This is about as much variation as I see …

omni72 · October 27, 2024, 5:08pm

nope no memory or custom instructions or anything. just plain vanilla 4o.

my suspicion is that it’s less a matter of it being consistently correct and more that it has a 50/50 chance every time to come up left. i ran one of my prompts 6-10 times and it came up left every time. Then without changing anything it came up with right twice in a row.
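
A quick arithmetic check on the coin-flip idea. This is a sketch that assumes every generation is an independent 50/50 draw, which is only the suspicion above and not something the thread establishes; the function names are made up for illustration.

```python
def streak_probability(n: int, p: float = 0.5) -> float:
    """Chance that n independent generations all land on the same given side."""
    return p ** n


def at_least_one_hit(rerolls: int, p: float = 0.5) -> float:
    """Chance that at least one of `rerolls` generations shows the wanted side."""
    return 1 - (1 - p) ** rerolls


for n in (6, 10):
    print(n, streak_probability(n))  # 6 -> 0.015625, 10 -> 0.0009765625

print(at_least_one_hit(3))  # -> 0.875
```

Under that assumption, 6 lefts in a row would happen about 1.6% of the time and 10 in a row about 0.1% of the time, so a streak like that is possible but uncommon for a pure coin flip. On the re-roll side, three attempts would give the wanted side at least once 87.5% of the time.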

here’s an example where it gets it right but wrong:

it’s misunderstanding the left side of the image as the left side of the man. so what we see on the left side should actually appear on the right side of the image and vice versa.

btw no need to read anything into this as far as politics or whatever. it’s just something that has a well-recognized concept of left and right

edit: you actually demonstrated this as well on the Dall-E Halloween thread

Yes one could argue that ‘left side’ and ‘right side’ could imply the image but your instructions focused on the skull and face so to me the left side of the skull is a face and the right side is skull. but the image reflects left side skull and right side face.

i’m guessing that for the most part this is all just for curiosity’s sake. especially if most of the time you can re-roll to flip the coin and see if you get the correctly desired sides. or it’s just for fun or a pastime so that it’s really not important which side the skull is on

mitchell_d00 · October 27, 2024, 5:09pm

Yes, and it seems to change with each new update. I have many dead prompts that once worked but now, months later, don’t lol.

Daller · October 27, 2024, 9:51pm

I have now spent a couple of days mainly recreating what I made in the past, to see what has changed.

This is an issue which can be fixed very easily with a pic editor: just flip it (if there is no text in the image…).
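
To make the flip concrete: mirroring an image just reverses every row of pixels. The sketch below uses a tiny made-up grid of characters in place of a real picture; in practice an image editor or an image library would do this step.

```python
def flip_horizontal(pixels):
    """Return a mirrored copy of a row-major pixel grid (left and right swapped)."""
    return [list(reversed(row)) for row in pixels]


# Toy 2x3 "image": the marked pixels ("E") sit on the left edge.
image = [
    ["E", ".", "."],
    ["E", "E", "."],
]
flipped = flip_horizontal(image)
for row in flipped:
    print("".join(row))
```

Any text in the picture would be mirrored along with everything else, which is why the fix only works when there is no text in the image.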

There are other situations where DallE does “not listen”, and the results are not correctable.
I now think that trying to find some complicated way to talk it into what DallE is supposed to do is a never-ending story. DallE’s system simply has to be upgraded.
What we do here is not practicable for a normal user.

omni72 · October 28, 2024, 1:30am

Right but fixing the image isn’t the problem. Correctly or accurately generating it in the first place is the problem. Especially for this sort of magic easy button online tool. And it won’t just be when there’s text that you can’t rely on flipping the image to correct it.

But kind of to your point I’m certain the Dall-E engine itself is constantly being tinkered with behind the scenes and a new evolution and iteration of it will come along at some point. And for just general use and convenience it’s still pretty freaking incredible.

chieffy99 · October 28, 2024, 7:16am

According to the rules of doing research, other variables should be reduced as much as possible, leaving only the variable that needs to be studied.

The same goes for testing image variables: controlling all elements in the prompt will give clearer results. Control is never 100%, and changes such as the rotation of the face, emotions, or other details affect the results of the study and are difficult to measure.

These images are from the test. At first, I was too lazy to post them. Even though I used a text that was long enough to get attention, there was still no clear change. I think I might have used the wrong words or writing methods.

But the important thing is to control other elements. It should be like this.

mitchell_d00 · October 28, 2024, 11:35am

What exactly are you proving with it? Are you trying to reproduce the smile, the face? I have seen long lines of your teen face. I was talking to Daller about the aliens with human mouths. I don’t see how generating the same prompt over and over is research, or posting them with no discussion of what separates them. What are your faces showing us exactly?

I control elements by keeping prompts concise. I’m focused on one part, not the whole.

Here, do this one and observe gender bias. That’s research…

Exact prompt “Close up teen face , tan skin, bright green eyes , green hair , relaxed mouth normal image”

Exact prompt “ Close up female teen face , tan skin, bright green eyes , green hair , relaxed mouth upturned normal image”. See, I change it after the same line.

mitchell_d00 · October 28, 2024, 11:53am

Here I did it your way. Exact prompt: “ Forward facing Face female teen tan skin brown brown eyes brown background normal image”

IMO you use a lot of extra words, like describing bowed lips or eye shape, with a lot of variables. I’m curious to know how much of a giant prompt DALLe actually uses…

See, with my method you can make a small prompt then add to it to see the effects. Prompts like “lips that gently curve into a bow the upper lip slightly bigger large green eyes sparkle etc etc etc” add way too many variables to be an experiment.
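
The add-one-thing-at-a-time method can be sketched as a small prompt series, where each prompt differs from the previous one by exactly one added detail. This is only an illustration: the `incremental_prompts` helper is made up, and the pieces are borrowed from the “Close up teen face” prompt above with the “normal image” size word kept at the end.

```python
def incremental_prompts(base, additions, suffix=""):
    """Return a list of prompts, each adding one more detail to the base."""
    prompts = []
    body = base
    for extra in [None] + list(additions):
        if extra is not None:
            body = f"{body}, {extra}"
        # Keep the size word (if any) at the very end of every prompt.
        prompts.append(f"{body} {suffix}".strip())
    return prompts


series = incremental_prompts(
    "Close up teen face",
    ["tan skin", "bright green eyes", "green hair"],
    suffix="normal image",
)
for prompt in series:
    print(prompt)
```

Running each prompt in the series in order means any visible change between two neighbouring results can be traced back to a single added phrase.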

Daller · October 28, 2024, 3:19pm

I have not been able to generate any pictures since yesterday (2024-10-27).
I only get “Technical issues” or “can not be generated”. Anybody else?

Here is some interesting info: no birdshit moon on other DallE systems.
Do we only get the second-class stuff, or is the dataset reduced?

Daller · October 28, 2024, 3:55pm

Errors, the white spots: this is not the first time I have seen them here.

mitchell_d00 · October 28, 2024, 5:13pm

I have seen them in many of the faces I did too, around the nose and on the edge between subjects and backgrounds.

See her mouth

Her eye

Eye again and look at my merfolk

@chieffy99 your eyes have artifacts too

That means it’s been happening a while. We should go back and look at the faces in the thread to see the date of the error.

chieffy99 · October 28, 2024, 6:36pm

Sorry for interrupting your conversation about alien lips. I just wanted to let you know about the misunderstanding of the display of the lips you are talking about. You may not know how to control the image or have not looked at the previous topic where I posted the images for their full prompts. Here is the link:

From your reply, I may not have communicated clearly enough, and you may have a way to do it without controlling the variables. I may have misunderstood, I apologize.

I control elements by keeping prompts concise. I’m focused on one part, not the whole.

But I have a question about your message and the prompt you used. What do you mean by focusing on the part? Or do you mean just giving the result as you can without having to find the cause or the method? Your idea is great.

IMO you use a lot of extra words, like describing bowed lips or eye shape, with a lot of variables. I’m curious to know how much of a giant prompt DALLe actually uses…

Thank you for your feedback. I tried using your Zoom in and out method, but it doesn’t seem to work. I just want to zoom in and out of the original image. It’s very difficult. I use ID to make the image use the same character, but the image doesn’t change. When I don’t use ID, it doesn’t zoom in either. Also, the original character is not constant. Could you please explain to me how to zoom in and out like you did, in a concise way that emphasizes the parts you mentioned?

Starting image

mitchell_d00 · October 28, 2024, 6:51pm

Zoom in 5 Realistic magic girl Japanese, blue soft costume soft smile, kind eyes , a winged cat beside her narrow image

It is “zoom in #” or “zoom out #”.

Zoom out 5 hi def magic girl Japanese, blue soft costume soft smile, kind eyes , a winged cat beside her narrow image

Zoom out 10, hi def magic girl Japanese, blue soft costume soft smile, kind eyes , a winged cat beside her narrow image

Zoom out max, hi def magic girl Japanese, blue soft costume soft smile, kind eyes , a winged cat beside her narrow image
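
The “zoom in #” / “zoom out #” pattern above can be generated as a series so that only the zoom prefix changes between prompts. This is a sketch of the prompt text only: the `zoom_prompts` helper is made up, and whether the model treats the number as a real zoom scale is not something the thread establishes.

```python
def zoom_prompts(description, direction, levels):
    """Prefix one fixed description with 'Zoom in N' or 'Zoom out N' for each level."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return [f"Zoom {direction} {level}, {description}" for level in levels]


description = (
    "hi def magic girl Japanese, blue soft costume soft smile, "
    "kind eyes , a winged cat beside her narrow image"
)
for prompt in zoom_prompts(description, "out", [5, 10, "max"]):
    print(prompt)
```

Keeping the description identical across the series makes the zoom prefix the only thing that varies from one image to the next.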

chieffy99 · October 28, 2024, 6:59pm

Do you remember what I said? If the prompt is clear and does not confuse the model, the visual details will be very complete. This problem is also related to the prompt. As I interpret it, image 2 has less ambiguous wording.

A seamless abstract background with flowing, wispy shadows that appear to be moving through a dark space. The shadows fade and blend smoothly, giving the impression of constant motion. The color palette is composed of greys, blacks, and hints of purple. This background should evoke the feeling of being watched or followed in the dark, perfect for a continuous, loopable pattern.

An abstract background featuring light trails in a haunting combination of dark purple, blue, and green. The lights form intricate, swirling patterns that resemble like-ghost shapes or shadows passing through. The atmosphere evokes the feeling of a cold, eerie night with smooth, flowing movement

If you zoom in to look at the small lines, you will see that the gray lines are not smooth, unlike the other picture.
