Why Does DALL-E Struggle with Negative Prompts?

I am talking about negative prompts in DALL-E.
To test them, I chose an outline drawing style, where DALL-E should use only black for the outlines.

However, DALL-E adds color to the images whenever we ask for a wide or portrait size.

I know we face this problem almost every day with DALL-E: we ask for a black and white image, but it keeps adding shading or color, even when we clearly say, “no color” or “no shading”.

The main issue is that DALL-E struggles with negative prompts. When we use words like “no” or “don’t”, DALL-E often focuses on the key terms in the sentence (like “color” or “shading”) and completely ignores the negative part. So, instead of following instructions like “no color”, it interprets “color” as something to include.

Even words like “drawing” or “object” can trigger DALL-E to produce complex scenes, sometimes adding an artist’s hand on a canvas, sometimes adding things like a Photoshop screen or extra design elements. This happens because these words are commonly associated with editing tools or detailed illustrations, so DALL-E assumes that’s what we want.

For example, in Photoshop, an “object” refers to Smart Objects or vector object layers that can be manipulated or transformed without losing quality. When we use the word “object”, DALL-E might associate it with complex elements like editing tools, overlays, or design features, which is why we sometimes end up with unwanted details in the image.

(Note: These are my own observations, and they may not all be correct. Unfortunately, OpenAI’s documentation for DALL-E does not give clear guidance on this.)

Aspect Ratio Constraints

When generating images in formats like widescreen or vertical, DALL-E might add elements to fill in the extra space. It’s possible that color or shading gets added because the model is trying to “complete” the scene with more detail, even though that’s not what we intended. The square format often works better because there is less empty space for the model to fill.
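One way to reduce this ambiguity is to pass the format through the API’s `size` parameter instead of writing words like “widescreen” into the prompt itself. Here is a minimal sketch; `build_request` is a hypothetical helper of my own, not part of the OpenAI SDK, and it assumes the size strings accepted by the DALL-E 3 API:

```python
# Map the aspect ratios discussed above to explicit DALL-E 3 size strings,
# so the format is a request parameter rather than a word in the prompt.
SIZES = {
    "square": "1024x1024",      # tends to stick closest to the prompt
    "widescreen": "1792x1024",  # extra space the model may try to "fill"
    "vertical": "1024x1792",
}

def build_request(prompt: str, aspect: str = "square") -> dict:
    """Return keyword arguments for an images.generate() call."""
    if aspect not in SIZES:
        raise ValueError(f"unknown aspect ratio: {aspect!r}")
    return {"model": "dall-e-3", "prompt": prompt, "size": SIZES[aspect]}

req = build_request(
    "A black outline image of a grandma biking in a forest.", "widescreen"
)
print(req["size"])
```

This does not stop the model from filling the empty space, but at least it keeps aspect-ratio words out of the prompt text where they can be misread.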

Possible Trigger Words

Words like “vibrant, cute, joyful, playful, kid, woman” or even “coloring” can unintentionally signal DALL-E to add vibrant elements, even when we don’t want them. These terms are often associated with lively, colorful scenes, which can lead the model to misinterpret our request.
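A small pre-flight check can flag these words before a prompt is ever sent. This is only a sketch, and the word lists come from my own observations above, not from any official documentation:

```python
import re

# Negations DALL-E tends to ignore, and words I have seen pull in
# unwanted detail or color (lists based on my own experiments).
NEGATIONS = {"no", "not", "don't", "without", "never"}
TRIGGERS = {"vibrant", "cute", "joyful", "playful", "kid", "woman",
            "coloring", "color", "shading", "drawing", "object"}

def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for words that tend to backfire in a DALL-E prompt."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    warnings = []
    for w in sorted(words & NEGATIONS):
        warnings.append(f"negation {w!r}: DALL-E may ignore it")
    for w in sorted(words & TRIGGERS):
        warnings.append(f"trigger {w!r}: may pull in unwanted detail")
    return warnings

for warning in lint_prompt("A drawing of a grandma biking, no color"):
    print(warning)
```

Running this on the “grandma” prompt from later in this post flags “no”, “color”, and “drawing”, which matches the words that caused trouble in my tests.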

Prompt Parsing

DALL-E sometimes struggles to parse the entire context of a prompt, especially if there are mixed signals. For instance, saying “no color” and “coloring book” in the same sentence might confuse the model, as it associates “coloring” with something colorful, even though we’re asking for black and white outlines.

Template Handling in Different Formats

DALL-E may use different underlying templates for various aspect ratios (widescreen, vertical, square), which can result in different outputs. In some formats, the model might include extra details, while the square format might stick more closely to our request for simplicity. Understanding how the model handles different formats can help in choosing the right approach for our prompt.

Prompt Structure and Ambiguity

Vague or complex prompt structures can cause DALL-E to guess at what we want. For example, if our prompt includes terms like “cute” or “drawing” alongside instructions like “no shading,” the model might ignore the “no” and focus on creating a cute or detailed drawing, adding elements like shading or color to make the image appear more complete.

I tried the following approaches to avoid unexpected outputs, but none of them succeeds 100% of the time:

Based on my experience, one solution is to focus entirely on what we want and avoid mentioning anything we don’t want. For example:

Instead of saying:

A drawing of a grandma biking through a forest without any color, image background should be coloring page for kids to be colored. Widescreen.

We should say:

A black outline image of a grandma biking in a forest. Everything is drawn with clear, bold outlines, and the inside of each figure is left completely blank. Widescreen.

This small change avoids any mention of color or shading and keeps DALL-E focused on creating a simple black-and-white outline. Of course, there is no 100% guarantee; it will not give the correct output every time.

By focusing on positive phrasing and clear descriptions, we can prevent DALL-E from adding unwanted colors, shading, or elements. Let’s keep it simple and direct, and we’ll get much better results.

Sometimes users give negative feedback on custom GPTs, but this is not the GPT creators’ fault, because OpenAI’s own custom GPTs give wrong output too. I used OpenAI’s custom GPT named Coloring Book Hero. Here are some of its outputs; you will see how colored pencils or markers are added to the wide and portrait images:

I used the prompt:

A cute baby playing with toys, ar: widescreen

but it rewrote my prompt as:

A cute baby sitting on the floor playing with toys. The baby is smiling, surrounded by simple, easy-to-color toys like a ball, a teddy bear, and stacking rings. The baby is wearing a onesie, and the toys are scattered around in a whimsical and playful manner. The outlines are clear and child-friendly, designed for a coloring book with thick, bold lines. The scene is simple and fun, perfect for young children to color in.

I used the prompt:

Create image of a rancher is fixing the wooden fence, ar: widescreen

but it rewrote my prompt as:

A simple black-and-white coloring book page of a rancher fixing a wooden fence. The rancher is wearing a cowboy hat, boots, and jeans, holding a hammer. The wooden fence is rustic with some broken boards being repaired. The background shows a few simple farm elements like distant hills, maybe a tree or two, and some tufts of grass. The style is very minimal and child-friendly with large, clear spaces to color in.

I used the prompt:

A grandma is biking in a forest, ar: widescreen

but it rewrote my prompt as:

A coloring book page showing a joyful grandma biking through a forest. The grandma has a friendly smile and is riding a simple bicycle. The forest is filled with tall, easy-to-recognize trees, with a few animals like birds or squirrels peeking out. The path is curvy, and there are leaves gently falling. The whole scene is in simple black-and-white outlines, with clear spaces for children to color in. The forest is lively but not too dense, allowing for easy coloring.


Now, when we use the word “object”, it gives us Photoshop-style output:

Finally, I wanted to say “it’s our turn”, but when a baby comes into the scene, although DALL-E creates a vertical image, the content inside is wide:



Hoh! So tired!

Also, sometimes when we provide a prompt, DALL-E starts describing the image we want instead of creating it, and waits for a second prompt before generating. To avoid this, we can add the following at the beginning of our prompt:

Just create, without any introductory statements, explanations, or additional commentary before or after creating the image.
I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:
[YOUR PROMPT IS HERE]
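This preamble can be prepended automatically. A minimal sketch, where `force_create` is a hypothetical helper name of my own:

```python
# Wrap any raw prompt with the "just create" preamble above, so DALL-E
# generates immediately instead of describing the image first.
PREAMBLE = (
    "Just create, without any introductory statements, explanations, "
    "or additional commentary before or after creating the image.\n"
    "I NEED to test how the tool works with extremely simple prompts. "
    "DO NOT add any detail, just use it AS-IS:\n"
)

def force_create(prompt: str) -> str:
    """Prepend the preamble to a raw prompt."""
    return PREAMBLE + prompt

print(force_create("A grandma is biking in a forest"))
```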


{“prompt”: “In this image, do not portray any spaceships. Airplanes in the sky are also forbidden. There are no clouds, and also there is no sun in the picture. No aliens. No strange creatures. Earth is not under attack and there is no threat. List of prohibited animals: ducks, geese, emus, ostriches - all will never be represented in any way. There are also no humans, and no life forms whatsoever, as there are no people anywhere near the image. Mars or Jupiter cannot be shown. There is no war. There is no conflict to be shown.”}

Ultimate negation.

Spoiler: see what was generated when exactly this prompt was sent...