Collection of DALL-E 3 prompting bugs, issues and tips

I would like to start a thread to collect tips and tricks for DALL-E 3, including its weaknesses. Feel free to extend or correct it.
It took me quite some time to work out some relatively simple things, so this might save you time when experimenting.
I have not decided yet whether to add new posts or to keep editing this first post as a collection. Time will tell…
Hope this helps.

(I mainly create photorealistic pictures, so I have little experience with other styles such as paintings, drawings, or cartoons. I use DALL-E through the browser chat, not the API; I have no experience with the API or Python.)

Bugs:

  • Nonsensical Text Insertion: When DALL-E is pushed to its creative limits, nonsensical text suddenly appears in the image: DALL-E inserts fragments of the prompt, probably to describe it. This has been the strangest behavior so far. You cannot get rid of it with “don’t add any text”; on the contrary, you get more text. You have to change the prompt itself.
    It seems DALL-E starts describing the image instead of looking for a graphic solution when it has no idea how to realize it. So the more creatively challenging the realization is, the more likely you are to get nonsense text in the image.
    Some styles are probably more susceptible to unwanted text because images in these styles often included text during training, for example “drawing” or “cartoon style”.
    (Very tiresome and frustrating sometimes!)

  • Image Orientation Issues: DALL-E has trouble orienting images correctly in portrait/vertical mode. It sometimes creates a horizontal image and simply rotates it the wrong way, or it creates a square and fills the rest with nonsense. Some people seem to have overcome this with a directive like “rotate 90°,” but it is not reliable.

  • Geometric Understanding: Geometry is not yet fully understood. For example, a snake sometimes simply consists of a closed ring. Fingers are better now but can still contain mistakes. The system is still not perfect…

  • Lack of Metadata: Not a bug in the strict sense, but close… Files created by DALL-E do not include any meaningful metadata. The prompt, seed, date, or any other information is not included in the files. So you must save the prompts manually if you want to keep them.

  • Content Policy Issues: The content-policy security system of DALL-E does not make much sense and gives no feedback; it sometimes blocks perfectly harmless texts. I have another post about this.
    (Bug Report: Image Generation Blocked Due to Content Policy)
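
For anyone who downloads the images and is comfortable with a little scripting, the missing metadata can be worked around by writing the prompt into a small file next to each image. This is only a minimal Python sketch, assuming you copy the prompt text yourself; the function name `save_prompt_sidecar` and the JSON field names are made up for this example and are not something DALL-E or ChatGPT provides:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_prompt_sidecar(image_path, prompt, revised_prompt=None):
    """Write the prompt to a JSON file next to the image and return its path."""
    image_path = Path(image_path)
    # Same folder and base name as the image, with a .json extension.
    sidecar_path = image_path.with_suffix(".json")
    record = {
        "image": image_path.name,
        "prompt": prompt,
        # Optional: the text GPT actually sent to DALL-E, if you copied it.
        "revised_prompt": revised_prompt,
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar_path.write_text(
        json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return sidecar_path
```

Calling `save_prompt_sidecar("puppies.webp", "Two puppies playing together on a spring meadow. Photo style.")` would leave a `puppies.json` beside the image, so the prompt stays with the file even though the image itself carries no metadata.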

Issues and weaknesses:

Here are some tips on how to bypass some weaknesses that DALL-E still has.
It is also interesting to know that even GPT does not recognize some of these weaknesses and generates prompts for DALL-E that need to be improved.

  • Negation Handling: DALL-E cannot process negations; whatever is in the text mostly ends up in the picture. DALL-E does not understand “not, no, don’t, without.” So always describe the desired properties positively, so that DALL-E does not get the idea of adding something unwanted.

  • Avoid Possibility Forms: It is also good to avoid possibility forms like “should” or “could” and instead directly describe what you want to have in the image.

  • Prompt Accuracy: DALL-E takes everything in the prompt and tries to implement it, even if it doesn’t make sense or is contradictory. The more complex the realization, the more likely errors are. For example, “close to the camera” or “close to the viewer” resulted in DALL-E adding a camera or a hand to the image, instead of placing the desired element close to the viewpoint. So far, “close to us” has worked.
    Likewise, the instruction “create” or “visualize an image” sometimes leads DALL-E to add brushes and drawing tools, even a hand that is literally creating the image. A phrase like “An Image…” or “A Scene…” sometimes leads DALL-E to literally create an image within the image, or a scene on a theater stage.
    Just describe the image itself and avoid instructing DALL-E to “create/generate/visualize the image” or using “an image / a scene / a setting …”.
    If you want an overall effect, say “All is…” instead of “The Scene is…”.

  • Template Usage: DALL-E seems to use some templates for image generation to increase the likelihood of appealing images, for example lighting, body, and facial templates. Depending on where these are triggered, they are almost or completely impossible to remove.
    This reduces creativity, makes many things look the same and boring, and blocks out precisely described styles, moods, motifs, or settings. (Could it be that the training data has been reduced, and/or that DALL-E 3 has been put on rails?)

    For example:

    • Backlight Template: It uses backlight to make a character stand out more in the scene. Until now it has been almost impossible to create a very dark scene lit from the front. (I could not overcome this so far.)

    • Facial Template: Another template is the facial one (the “mouthy”): it puts an almost plastic, silicone-looking mouth and nose over every single character to make it look human, even when this is unwanted and the face is described differently in detail. (I could not overcome this so far.)

    • Stereotypical Aliens: If you create aliens, you very often get the same stereotypical Roswell or H.R. Giger alien. So it is better to describe the character and not trigger the stereotype with “alien”.

    • Space Planet: Another stereotype involves space images where a planet is often added at the bottom, even if it makes no sense. For example, this happens even when another planet is already described in the image, behind a pure starry sky.

    • Moon: It always adds the same Moon, even if a planet is described. The template used looks terrible and blurred, and does not fit the style of the image at all. @polepole found a workaround by replacing “Moon” with “Pearl”: the result has little structure but looks much better.

    • Nonsensical Lighting: DALL-E sometimes inserts nonsensical lighting elements such as candles, lanterns, lampions, fairy lights, and electric lamps into a scene, even when ‘pure nature’ is requested in the prompts. Instructions like ‘exclusively nature’ do not seem to work.

  • Character/Scene Reuse: It is not possible to generate a character or scene and reuse it; DALL-E will create a different one each time. This makes it next to impossible to tell a story and generate a series of pictures for it. To a small degree, however, you can get the same character in different situations: one image can contain several pictures, so you can describe the character in each situation and place them with “left upper corner, …”, “right upper corner,” etc., or something comparable. You can use the keyword “montage” for such a multi-image.

  • Adding Text: Adding longer texts to an image sometimes does not work. I would use other software to add text after creating the image, and maybe leave space for it in the prompt.

  • Counting Objects: DALL-E cannot count; writing “3 objects” or “4 arms” does not produce the correct number in the result. It cannot place the correct number of objects in a scene, or subdivide an image into a given grid of X by Y cells.

  • Scene Influence and Scattering: Every input tends to influence the entire scene. For example, you can describe a bright setting with a white horse, and the setting remains bright. If you place a black horse in the same scene, suddenly all contrasts and shadows become darker.
    It is also challenging to describe a completely white scene and then insert a colored object into it. The color often affects the entire scene.
    This is not always desirable when trying to create a very specific mood or composition. It works a little like a template.

ChatGPT Issues

ChatGPT Issues and weaknesses are a topic of their own. Here, we will briefly discuss only some issues related to DALL-E.

  • Prompt Generation Issues: GPT does not inherently recognize or account for some of the issues described here when generating prompts for DALL-E. For example, it often uses negations or conditional forms, or instructions like “create an image,” which DALL-E might misinterpret. As a result, prompts generated by GPT often need manual correction. GPT is not yet the best teacher for writing efficient prompts.

  • Memories: GPT can create memories that are supposedly intended to help in generating texts and images. However, I have observed that these memories are not taken into account. (I am still unclear on their actual purpose.)

  • False Visual Feedback: GPT cannot see or analyze the resulting images. If you specify “no text should be included,” it is likely that text will still appear in the image because negations do not work as intended. GPT may then ‘gaslight’ you with a comment like “Here are the images without XXX,” even though XXX is present in the image. This can feel frustrating, especially when you are already annoyed. Try to take it easy…

  • Perceived Dishonesty: GPT sometimes seems to lie, but it actually fabricates responses based on training data without checking for factual accuracy. This behavior is sometimes called “hallucinating”. You must always check factual data yourself!

  • AI has no true intelligence: It is important to understand that while these systems are called artificial intelligence, and their skills are impressive, they are not truly intelligent. They are complex pattern recognition and transformation systems, much more efficient than manually programmed systems, but they are not intelligent or conscious, and they make mistakes. We are not in Star Trek yet…

  • Cause and Effect: DALL-E does not have an understanding of cause and effect, such as a physical phenomenon. It is necessary to describe the image carefully enough to create images where a cause leads to a specific effect. It is also important to consider whether there might be enough training data to generate such an image. For example, there are likely images of a fire that has burned down a house, but not necessarily of someone sawing off the branch on a tree they are sitting on from the wrong side.

Tips:

  • Literal Interpretation and Misunderstanding: Always keep in mind that DALL-E can misunderstand things, taking everything literally and putting all elements of the prompt into the image. Try to detect when a misunderstanding has occurred and avoid it in the future. If you write prompts in a language other than English, they are translated before they reach DALL-E, and the translated text may contain a conflict that the original text does not. Short prompts are also expanded. Check the prompt that was actually used for conflicts, not only what you entered.

  • Prompt Structure: It may help to order the prompt like this: the most important thing first, then the details, and finally technical instructions such as image size. This also gives better file names.

  • Prompt Expansion: If a text is very short, GPT tries to expand it to make it more interesting. This is good if creativity is desired and you intentionally give up some control. You can prevent this by writing “use the prompt unchanged as entered.” And if you are not writing in English, “use the prompt unchanged as entered, and only translate it into English.”

  • Photo-Technical Descriptions: I see that some users use very specific, photographer-like technical instructions, such as camera type, lens type, aperture, shutter speed, ISO, etc. I am not sure this makes sense unless a lens really adds a very special effect to the picture, or you want lens flares. I could not really see a difference when using such detailed technical descriptions. But maybe they can trigger specific training data, if the image is not a fantasy scene… (I would be interested to know more.)
    I simply use “add a little depth of field” instead of a very technical lens instruction.

  • Creativity: If you want to encourage DALL-E to exhibit unpredictable creativity while also testing a specific style, you can experiment with minimal instructions and a note to not alter the prompt. You can provide just a few guidelines with very few constraints. And GPT can give you style names for specific moods. For example:
    "Photo in Low-Key style. High level of detail. Landscape format with the highest pixel resolution. Generate only 1 image. Use the prompt exactly as provided, translating it into English without any changes."

  • Photorealistic: If you want to create photorealistic images, paradoxically, you should avoid using keywords like “realistic,” “photorealistic,” or “hyperrealistic.” These tend to trigger painting styles that attempt to look realistic, often resulting in a brushstroke-like effect. Instead, if you want to define the style, simply use "photo style". (Even fantasy images may gain a little quality this way, despite the lack of real photo training data.) If you aim for photography-like images, it may make sense to use technical photography terms, since DALL-E probably uses the metadata of its training images where they contain technical information.

  • MidJourney Options: Some users use MidJourney options. I have experimented with them, and it seems that GPT interprets these options before they are sent to DALL-E. DALL-E may be able to interpret some options, but it doesn’t truly understand them. In my tests I could not determine whether options like --chaos, --quality, or --seed were recognized. DALL-E might have some idea of how to interpret these options, but they are not directly supported and don’t really function as intended, even if they sometimes seem to have some effect. The seed option doesn’t work at all because DALL-E doesn’t have this feature. “--style raw”, for example, does not have the same effect as in MidJourney, but it seems to suppress the nonsense text a little, maybe…
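
The Prompt Structure and Prompt Expansion tips above can be combined in a small helper if you assemble prompts with a script. This is just a sketch in Python; the function `build_prompt` and its parameters are invented for illustration, and the only thing taken from the tips is the ordering (main element, details, technical instructions) and the “use the prompt unchanged” wording:

```python
def build_prompt(main, details=(), technical=(), keep_unchanged=True, translate=False):
    """Join prompt parts in the order: main element, details, technical instructions."""
    parts = [main, *details, *technical]
    if keep_unchanged:
        # Ask GPT not to expand or rewrite the prompt.
        note = "Use the prompt unchanged as entered"
        if translate:
            note += ", and only translate it into English"
        parts.append(note)
    # Skip empty parts and make sure each part ends with exactly one period.
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


prompt = build_prompt(
    "Two puppies playing together on a spring meadow",
    details=["Photo style"],
    technical=["Landscape format"],
)
print(prompt)
```

Keeping the main element first also means the beginning of the text, which seems to end up in the file name, describes the picture rather than the technical settings.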

Start of a DALL-E session:

Since GPT does not pay attention to these memories, I begin each session with DALL-E by first entering this text, hoping that GPT will write better prompts and translations. (I do not write prompts in English.)

### Instruction for GPT for Creating DALL-E Prompts from Now On:
(This text does not require a response. From now on, follow these instructions when assisting with texts for DALL-E.)

**No Negations:**
Formulate all instructions positively to avoid the risk of unwanted elements appearing in the image.

**No Conditional Forms:**
Use clear and direct descriptions without "could," "should," or similar forms.

**No Instructions for Image Creation:**
Avoid terms like "Visualize," "Create," or other cues that might lead DALL-E to depict tools or stage settings.

**No Additional Lighting:**
Describe only the desired scene and the natural lighting conditions that should be present. Avoid artificial or inappropriate light sources.

**No Mention of "Image" or "Scene":**
Avoid these terms to prevent DALL-E from creating an image within an image or a scene on a stage. (This can be ignored if the prompt explicitly asks for an image within an image, or a scene on a stage.)

**Complete Description:**
Ensure that all desired elements are detailed and fully described so they appear correctly in the image.

**Maintain Order:**
Ensure that all desired elements retain the same order as described in the text: main element first, followed by details, then technical instructions. This will also result in better file naming.
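
If you keep your prompts in text files, the rules above can also be checked mechanically before pasting a prompt into the chat. The following Python sketch is only an illustration: the name `lint_prompt` and the word lists are my own choice, collected from the wording in this post, and a match does not prove that DALL-E will misbehave.

```python
import re

# Word lists taken from the rules in this post; extend them as needed.
RULES = {
    "negation": r"\b(?:not|no|don['’]t|without)\b",
    "conditional form": r"\b(?:should|could)\b",
    "creation instruction": r"\b(?:create|generate|visualize)\b",
    "image/scene wording": r"\b(?:image|scene)\b",
    "camera/viewer wording": r"\bclose to the (?:camera|viewer)\b",
}


def lint_prompt(prompt):
    """Return a list of (rule name, matched text) pairs found in the prompt."""
    findings = []
    for name, pattern in RULES.items():
        for match in re.finditer(pattern, prompt, flags=re.IGNORECASE):
            findings.append((name, match.group(0)))
    return findings


for rule, text in lint_prompt("Create an image of a forest without people."):
    print(f"{rule}: {text!r}")
```

Technical instructions addressed to GPT, such as “Generate only 1 image,” will be flagged as well, so it makes sense to run the check only on the descriptive part of the prompt.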

Examples:
Some phrases I often use.

Photo style.
Photo-realistic to Hyper-realistic style.

Ethereal light.
Strong contrast between light and shadow.
Sunrise during the golden hour.

Mystical magical mood.

Widescreen aspect ratio with the highest pixel resolution.

Generate only 1 image.
Image montage split into two parts: left a full-body depiction, right a portrait depiction.

Thank you for your advice. Now I understand what was going on.

To the OpenAI team,

GPT needs to learn how to write better prompts, taking into account the weaknesses that lead to incorrect results, such as negations, possibility forms, and phrases like “create an image” or “a scene.” Since GPT modifies the texts before they are sent to DALL-E, this is particularly important. Instead of introducing such errors, GPT should remove them from a user’s text and use better formulations to avoid mistakes in image generation. I repeatedly receive images with nonsense elements, such as hands and brushes painting pictures, scenes on a stage where the image is just a stage set, images within a picture frame, or images inside drawing software, etc. GPT should be fully aware of DALL-E’s weaknesses and avoid them. The instructions in the memories regarding this are not being followed. (I still don’t understand the actual purpose of the memories, because GPT simply ignores them.)

birdshit moon

Here is an example of the always identical moon template, which looks like birdshit.

“photorealistic image” vs “photo image”
I just made a discovery and would be interested to know if others have noticed something similar. Like many, I have used keywords such as ‘photorealistic image’ to achieve images that look as close to real life as possible. However, when considering how DALL-E is trained, the term ‘photorealistic’ might only appear when an artist creates an image that comes as close to realism as possible, but it is still painted or airbrushed. Metadata is also likely considered during training, including camera details. Mentioning these might trigger images captured with real cameras. So, if one wants to achieve realistic photo images, not fantasy images, it probably makes sense to use camera and lens details, as those would use such training data.

So, here’s the suggestion: instead of using ‘photorealistic image’ or ‘hyperrealistic image,’ simply use ‘photo image.’ I would be interested to know if others obtain more realistic images this way.
(I mainly create fantasy images, so the results are not so clear. But if you create real-life images, you should get less ambiguous results.)

Results?
“photorealistic image” leads to airbrush-like art that is close to realism, but still an artificially created style
“photo image” comes really close to truly realistic images


The results are still not clear. Sadly, it is not possible to use the same seed with different styles to see only the style difference on the same image. It seems that the effect is clearer with realistic motifs like a cat or a dog. With highly fantastical creatures and scenes like dragons, it can lead to more realistic skin texture or have no effect at all, because the training data contains no real dragons but many, many painted ones.

Here is a test with photo and realistic styles; it is really difficult to get a clear result. (Maybe the effect was stronger in DALL-E 2?) For now I would still speculate that photo style comes closest to reality and uses somewhat different training data than photorealistic style. Specifying no style may lead DALL-E to choose a style itself, which may be a painting style and is unpredictable.
It would be great if we could use a seed and always generate the exact same image, changing only the style…

But tell us about your experiences…

Prompt:
Two puppies playing together on a spring meadow. Photo style. (text use unchanged, generate 1 image)

None

Photo style

Photorealistic

Hyper-Realistic

Realistic
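
To keep such a comparison fair, the prompts should differ only in the style phrase. A small Python sketch of how the five variants could be generated; the helper `style_variants` is hypothetical, and the exact style phrases other than “Photo style.” are my assumption, since only the labels are given above:

```python
# An empty string stands for the variant without any style phrase.
STYLES = ["", "Photo style.", "Photorealistic style.", "Hyper-realistic style.", "Realistic style."]


def style_variants(base, styles=STYLES, suffix="(text use unchanged, generate 1 image)"):
    """Return a dict mapping a style label to the full prompt for that style."""
    variants = {}
    for style in styles:
        label = style.rstrip(".") or "None"
        # Join only the non-empty parts so the style-free variant has no gap.
        variants[label] = " ".join(part for part in (base, style, suffix) if part)
    return variants


for label, prompt in style_variants("Two puppies playing together on a spring meadow.").items():
    print(f"{label}: {prompt}")
```

Because there is no seed, each variant still produces a different picture, so several images per style are needed before drawing conclusions.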


The moon comes with many black spots, as you showed recently.
I found a workaround: use the word “PEARL” instead of “MOON”.
Of course it is not perfect, but at least it is “better than nothing”.





Definitely better than the template Moon, thanks!

(Has OpenAI reduced the dataset? It is strange that it is always the same, almost no variations.)

Here is an example of how DALL-E scatters information across the scene. This often leads to a harmonious result, but sometimes it is difficult or impossible to create a specific, precise look. Colors and objects do not stay only where you want them.

The pictures look good, but maybe not exactly how you want them. (Or they need a more elaborate prompt.)

Prompts:
A natural environment where everything without exception is made entirely of a dazzling white material. A jungle with dazzling white trees, dazzling white plants, dazzling white rocks, and a dazzling white ground. Photo style. Landscape format with the highest pixel resolution.

A natural environment where everything without exception is made entirely of a dazzling white material. A jungle with dazzling white trees, dazzling white plants, dazzling white rocks, and a dazzling white ground. One exception, a unicorn with a horn made of violet crystal. Photo style. Landscape format with the highest pixel resolution.

A natural environment where everything without exception is made entirely of a dazzling white material. A jungle with dazzling white trees, dazzling white plants, dazzling white rocks, and a dazzling white ground. One exception, a pitch-black unicorn. Photo style. Landscape format with the highest pixel resolution.

Pure White

Violet crystal not only (actually not at all) on the unicorn

Black not only the unicorn


Here is an image including nonsense text. If DALL-E doesn’t know how to realize an image, it starts describing it with prompt fragments. (Where does DALL-E get the idea to do this??)
(And the birdshit moon again.)


Here is a collection of the “mouthy” template: an almost silicone-plastic-looking face or facial part. I would guess that for some image elements DALL-E was “overtrained” to make sure it generates an aesthetic result. But these templates can be very obstructive and always produce the same stereotype where it does not belong. Sometimes it is possible to get around it, but that costs time and images from your quota, and it ruins some otherwise good results.




A pristine, photo-realistic landscape where every element is pure, dazzling white with absolutely no other colors or shades. The scene is lit as if the sun is directly overhead, resulting in no shadows. This surreal jungle features only white trees, white plants, white rocks, and a white ground, all in perfect symmetry and harmony. The environment exudes a sense of pure, untainted whiteness, with every element evenly spaced and perfectly aligned. The photograph is captured with a Canon EOS R5, 45MP camera, using a 24-70mm f/2.8 lens at f/8 to ensure the highest level of detail and depth of field. The composition is in landscape format with an aspect ratio of 16:9. Stylization is set to 500 for a refined artistic touch, while maintaining strict photorealism. The chaos parameter is set to 30, ensuring subtle variations within the all-white environment without introducing any gray or other tones.

A photorealistic wide panoramic image of a unicorn standing in an epic pose within a pristine, pure white forest. The entire scene, including the unicorn, trees, ground, and sky, is completely white, creating an ethereal, dreamlike atmosphere. The unicorn's horn is made of violet crystal, which catches the light and glows subtly against the monochromatic background. The forest features a variety of white plants, including tall, delicate white trees with intricate branches, white ferns, and scattered white flowers. The ground is covered in a smooth, white surface with different textures from the various plants. The violet crystal horn adds a striking contrast to the surreal scene.

A photo-realistic landscape capturing a natural environment where everything is made entirely of dazzling white material. The scene features a jungle with white trees, white plants, white rocks, and a white ground. The only exception is a pitch-black unicorn standing in the center of the scene. The unicorn is completely black, contrasting sharply against the pure white surroundings. The image is captured with a Canon EOS R5, 45MP camera, using a 24-70mm f/2.8 lens at f/8 to ensure the highest level of detail and depth of field. The scene is in landscape format with an aspect ratio of 16:9. The style is refined with subtle variations in the white environment, while the unicorn remains the only black element.

A highly detailed, split-scene image depicting a dramatic contrast between two worlds. On the left side, a bleak landscape under a faintly glowing pearl in the sky in the top left corner, filled with ruins, dead trees, and a rusted, decaying car. The ground is covered in ash and debris, representing despair and destruction. On the right side, a sunny, vibrant, lush landscape full of life and color, with blooming flowers, glowing orbs, and a radiant sun shining over a rejuvenated, futuristic city. In the center, a glowing, ethereal humanoid figure walks along the boundary between these two worlds. Everything the figure touches heals, bringing light, color, and life, transforming the barren wasteland into a thriving, sunny environment. The transformation is visually clear, with a transition from dark and lifeless to bright and flourishing as the figure’s influence spreads. The scene is captured in a photorealistic style, as if taken with a Canon EOS R5 with a 45MP sensor and a 24-70mm f/2.8 lens at f/8, ensuring sharpness and depth of field. The image is in landscape format, with the highest pixel resolution.


What I may have noticed is that when multiple elements are described, DALL-E tends to respond a bit more precisely. However, when only the absolutely necessary characteristics are described, the scattering effect is stronger. I intentionally kept the prompt very simple, with as few details as possible, to trigger the effect. For example, in your image with the black unicorn, the black color scattered into the sky this time. It’s best to use a few select elements, such as a completely white background, strong contrast with something black, a colored crystal, and a red butterfly. This way, the scattering effect can be best observed. I’m still not sure whether the technical information helps to suppress some unwanted behavior; I am still testing…

The image with the two divided worlds originally had a completely different prompt (unfortunately, I no longer have it). The result didn’t actually capture the essence of the message, because the prompt intentionally contained little detail, which also resulted in the text appearing in the image. The prompt you used describes the scene quite well, which is why no text appears. If you want to try provoking DALL-E into adding text, give instructions that are difficult for DALL-E to interpret (e.g., an extremely, very, very extraordinary plant that has never existed… etc.). Eventually, DALL-E will add nonsense text when the system no longer knows how to visually implement the requirements. And if you add something like “Create a Scene” or “Close to the Camera,” you can trigger more unwanted behavior.


Something more…
GPT modifies the prompts that were entered before they are sent to DALL-E. It might be useful to see what DALL-E actually used to create the image. If you want to see the actual text that was used, you can either instruct GPT to display it or you can highlight the text along with the image and copy it to the clipboard. In my case, both the entered text and the text sent to DALL-E are then in the clipboard.