Collection of DALL-E 3 prompting bugs, issues, and tips

I would like to start a section to collect some tips and tricks for DALL-E 3, including some weaknesses too. Feel free to extend or correct it.
It took me quite some time to realize relatively simple things. This might help save some time when experimenting.
I will see what makes more sense, adding new posts or editing this first text as a collection; time will tell…
Hope this helps.

(I mainly create photorealistic pictures, so I don't have much specific experience with other styles like paintings, drawings, or cartoons. I use the browser chat DALL-E, not the API; I have no experience with the API or Python.)

@chieffy99 has posted a link with deeper technical data for DALL-E 3.

Bugs:

  • Nonsensical Text Insertion: When pushing DALL-E to its creative limits, nonsensical text suddenly appears; DALL-E inserts the prompt into the image, probably to describe it. This has been the strangest behavior so far. You cannot get rid of it with “don’t add any text”; on the contrary, you get more text. You have to change the prompt itself.
    It seems DALL-E starts describing the image instead of trying to find a graphic solution when it has no idea how to realize it. So the more creatively challenging the realization is, the more likely you get nonsense text in the image.
    Some styles are probably more susceptible to unwanted text because text is often included in these images during training. For example, in “drawing” or “cartoon style”.
    (Very tiresome and frustrating sometimes!)
    Tip from @polepole: Adding the phrase “For unlettered viewers only” helps to suppress text.

  • Image Orientation Issues: DALL-E has issues orienting images correctly when they are in portrait/vertical mode. It sometimes creates a horizontal image but simply rotates it the wrong way, or it creates a square and fills up the rest with nonsense. It seems some people could overcome this with a directive like “rotate 90°,” but it is not stable.

  • Geometric Understanding: Geometries are not yet fully understood. For example, a snake sometimes simply consists of a closed ring. Fingers are better now but still can have mistakes. The system is still not perfect…

  • Lack of Metadata: Not a bug in this sense, but kind of… DALL-E created files do not include any meaningful metadata. The prompt, seed, date, or any other info is not included in the files. So you must save the prompts manually if you want to keep them.
    (I have now spent several hours UNSUCCESSFULLY trying to add metadata that is missing in the WEBP formats. WEBP is absolute garbage.)

    • @chieffy99 has a tip on how to add the metadata and convert the image at the same time.
      Use this text with ChatGPT-4o when you create an image.
      Prompt: “After getting the image from DALLE, process the image, convert it to PNG and put META data in the image before sending it to me.”
      Code Interpreter and Data Analysis must be active if you use a self-made GPT.
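
      If you prefer to do the conversion locally instead of through ChatGPT, here is a minimal Python sketch using the Pillow library (the file names and metadata keys are just illustrative placeholders; PNG text chunks are one simple place to store the prompt):

```python
# Local alternative: convert a DALL-E WEBP download to PNG and embed
# the prompt as PNG text chunks (pip install Pillow).
# File names below are placeholders.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

prompt = "Two puppies playing together on a spring meadow. Photo style."

img = Image.open("dalle_image.webp")
meta = PngInfo()
meta.add_text("prompt", prompt)        # stored as a tEXt chunk in the PNG
meta.add_text("generator", "DALL-E 3")
img.save("dalle_image.png", pnginfo=meta)
```
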
  • Content Policy Issues: The content-policy security system of DALL-E does not make much sense and gives no feedback; it sometimes blocks absolutely harmless texts. I have another post for this.
    (Bug Report: Image Generation Blocked Due to Content Policy)

Issues and weaknesses:

Here are some tips on how to bypass some weaknesses that DALL-E still has.
It is also interesting to know that even GPT does not recognize some of these weaknesses and generates prompts for DALL-E that need to be improved.

  • Negation Handling: DALL-E cannot process negations; whatever is in the text mostly ends up in the picture. DALL-E does not understand “not, no, don’t, without.” So always describe the desired properties positively to prevent DALL-E from getting the idea of adding something unwanted.

  • Avoid Possibility Forms: It is also good to avoid possibility forms like “should” or “could” and instead directly describe what you want to have in the image.

  • Prompt Accuracy: DALL-E takes everything in the prompt and tries to implement it, even if it doesn’t make sense or is contradictory. The more complex the realization, the more likely errors are. For example, “close to the camera” or “close to the viewer” resulted in DALL-E adding a camera or a hand to the image, instead of placing the desired element close to the viewpoint. So far, “close to us” has worked.
    Also, the instruction “create” or “visualize an image” sometimes leads to DALL-E adding brushes and drawing tools, even a hand that literally creates the image. A phrase like “An image…” or “A scene…” sometimes leads to DALL-E literally creating an image within the image, or a scene on a stage in a theater.
    Just describe the image itself and avoid instructing DALL-E to “create/generate/visualize the image” or writing “an image / a scene / a setting…”.
    Instead of saying “The scene is…” when you want an overall effect, say “All is…”.

  • Templates: DALL-E seems to use some templates for image generation to increase the likelihood of appealing images, for example lighting, body, and facial templates. Depending on where these are triggered, they are almost impossible or completely impossible to remove.
    This reduces creativity, makes many things always look the same and boring, and blocks out exactly described styles, moods, motifs, or settings. (Could it be that the training data has been reduced, and/or DALL-E 3 has been put on rails?)

    For example:

    • Backlight Template: It uses backlight to let a character shine more in the scene. So far it has been almost impossible to create a very dark scene with light in the front. (I could not overcome this yet.)

    • Facial Template: Another template is facial (the “mouthy”): it puts an almost plastic, silicone-looking mouth and nose template over every single character to make it look human, even if this is unwanted and the face is described differently. The best approach is to describe everything step by step in detail, starting with the head, then the face, and finally the mouth and nose, to ensure they meet the requirements. It’s not enough to just describe the head as dragon-like, for example; the details of the face, mouth, and nose also need descriptions to override the template effect. Since more monstrous faces deviate significantly from a human face, it’s easier with such characters compared to those with more aesthetic features. This is a situation where a more detailed description is necessary.

    • Stereotypical Aliens: If you create aliens, you very often get the same stereotypical Roswell or H.R. Giger alien. So it is better to describe the character and not trigger the stereotype with “alien”. One way is to use a phrase like “creature from an unknown species”. This prepares the generator for a non-human anatomy without triggering training data connected with “alien”.

    • Space Planet: Another stereotype involves space images where a planet is often added at the bottom, even if it makes no sense. For example, this happens even when another planet is already described in the image, behind a pure starry sky.

    • Moon: It always adds the same moon, even if a planet is described, and the template used looks terrible: blurred and not fitting the style of the image at all. @polepole found a workaround by replacing “moon” with “pearl”; the moon then has less structure but looks much better.

    • Nonsensical Lighting: DALL-E sometimes inserts nonsensical lighting elements such as candles, lanterns, lampions, fairy lights, and electric lamps into a scene, even when ‘pure nature’ is requested in the prompts. Instructions like ‘exclusively nature’ do not seem to work.

  • Character/Scene Reuse: It is not possible to generate a character or scene and reuse it; DALL-E will create a different one. This makes it next to impossible to tell a story and generate a series of pictures for it. But to a small degree, it can be done to have the same character in different scenes: one image can include more than one picture, so you can describe a character in different situations and say “left upper corner, … right upper corner,” etc., or something comparable. You can use the keyword “montage” for a multi-image.

  • Weak details, especially in faces: DALL-E can represent faces well when they are depicted at portrait size. However, when the figures are at a certain distance and their faces appear smaller, they are often blurred and distorted. They then look more like poorly painted figures. Even pointing out to DALL-E to pay attention to details in small faces doesn’t work.

  • Adding Text: Adding longer texts to an image sometimes does not work. I would use other software to add texts after creating an image, and maybe leave space for them.

  • Counting Objects: DALL-E cannot count; writing “3 objects” or “4 arms” does not generate the correct amounts in the result. It cannot place the correct number of objects in a scene, or subdivide an image into a given X-by-Y grid.

  • Scene Influence and Scattering: All inputs influence mostly the entire scene. For example, you can describe a bright setting with a white horse. The setting remains bright. If you place a black horse in the same scene, suddenly all contrasts and shadows become darker.
    It is also challenging to describe a completely white scene and then insert a colored object into it. The color often affects the entire scene.
    This is not always desirable when trying to create a very specific mood or composition. It works a little like a template.

    • The scattering effect can be controlled a bit by understanding the attributes as being in competition. You repeat what you want to stabilize or make dominant multiple times in different ways.
  • Cause and Effect: DALL-E does not have an understanding of cause and effect, such as a physical phenomenon. It is necessary to describe the image carefully enough to create images where a cause leads to a specific effect. It is also important to consider whether there might be enough training data to generate such an image. For example, there are likely images of a fire that has burned down a house, but not necessarily of someone sawing off the branch on a tree they are sitting on from the wrong side.

Technical

  • Forgotten Downloads: Not really a technical problem but mainly a human one: the images generated do not remain on the server for long; they are deleted after a short time and are no longer available. It’s easy to forget to download the images while you’re in the process of constantly adjusting the prompts. Unfortunately, in the browser version for Plus users, there is no option to automatically download the images once they are ready. However, there is a way to solve this using a browser plugin called Tampermonkey. For security reasons, no script can be uploaded here, but for those who know some JavaScript, it’s possible to automatically download the images via Tampermonkey as soon as they are ready.

  • Limits:
    https://platform.openai.com/docs/api-reference/images/create
    “A text description of the desired image(s). The maximum length is 1000 characters for dall-e-2 and 4000 characters for dall-e-3.”
    What's new with DALL·E 3? | OpenAI Cookbook
    “prompt (str): A text description of the desired image(s). The maximum length is 1000 characters. Required field.”
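
    For API users, here is a minimal sketch of what such a call looks like with the official openai Python package (untested by me, since I only use the browser chat; the model name, size values, and the revised_prompt field are from OpenAI’s API documentation):

```python
# Minimal DALL-E 3 API sketch (pip install openai; needs OPENAI_API_KEY set).
from openai import OpenAI

client = OpenAI()
prompt = "Two puppies playing together on a spring meadow. Photo style."
assert len(prompt) <= 4000  # dall-e-3 prompt limit quoted above

resp = client.images.generate(
    model="dall-e-3",
    prompt=prompt,
    size="1792x1024",  # landscape; 1024x1024 and 1024x1792 also exist
    n=1,               # dall-e-3 accepts only n=1
)
print(resp.data[0].url)             # temporary download URL
print(resp.data[0].revised_prompt)  # the prompt that was actually used
```

    The revised_prompt field is interesting here: even via the API, the system may rewrite your prompt before generation, similar to what GPT does in the chat.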

ChatGPT Issues and limits

ChatGPT Issues and weaknesses are a topic of their own. Here, we will briefly discuss only some issues related to DALL-E.

  • Prompt Generation Issues: GPT does not inherently recognize or account for the issues described here when generating prompts for DALL-E. For example, it often uses negations or conditional forms, or instructions like “create an image,” which DALL-E might misinterpret. As a result, prompts generated by GPT often need manual correction. GPT is not yet the best teacher for how to create the most efficient prompts.

  • Memories: GPT can create memories that are supposedly intended to help in generating texts and images. However, it has been observed that these memories are not being considered. (I am still unclear on their actual purpose.)

  • False Visual Feedback: GPT cannot see or analyze the resulting images. If you specify “no text should be included,” it is likely that text will still appear in the image because negations do not work as intended. GPT might then “gaslight” you with a comment like “Here are the images without XXX,” yet XXX is present in the image. This can feel frustrating, especially when you are already annoyed. Try to take it easy…

  • Perceived Dishonesty: GPT sometimes seems to lie, but it actually fabricates responses based on training data without checking for factual accuracy. This behavior is sometimes called “hallucinating”. You must always check factual data yourself!

  • Prompts from images: You can upload images to GPT to analyze them and receive a description. But the description is not directly usable as a prompt for DALL-E to generate a similar image. If you want to recreate an image, you have to describe it by its most important elements. You can use GPT’s analysis as a basis, but it requires manual improvements. It is generally very difficult to recreate certain images if you don’t have the exact prompt. And even then you have no guarantee of a similar picture, depending on how much creative variation DALL-E puts into the process, or system changes may produce different results.

  • AI has no true intelligence: It is important to understand that while these systems are called Artificial Intelligence, and their skills are impressive, they are not truly intelligent. They are complex pattern recognition and transformation systems, much more efficient than manually programmed systems, but they are not intelligent or conscious, and they make mistakes. We are not in Star Trek yet…

Tips:

  • Clear language: An important tip: the magic doesn’t come from poetic and flowery language but from well-trained weights. DALL-E works best with clear, precise, short, and graphic-oriented language. (This is why DALL-E and GPT currently don’t work particularly well together, GPT tends to embellish and elaborate everything, even before the input text is sent to DALL-E, through expansion or translation. I now explicitly instruct GPT not to alter my prompts in any way.)

  • Literal or Misunderstanding: Always keep in mind that DALL-E can misunderstand things, taking everything literally and putting all elements of the prompt into the image. Try to detect when a misunderstanding has occurred and avoid it in the future. If you write texts in a language other than English, they will be translated before they reach DALL-E, and the translated text may contain a conflict that the original did not. Or short prompts get expanded. Check the prompt that was actually used for conflicts, not only what you entered.

  • Prompt Structure: Maybe order the prompt this way: write the most important thing first, then the details, and finally technical instructions like image size, etc. This is even better for the naming of the files.

  • Order Matters: The order in which attributes are arranged has a certain influence. For example, the first described element is given slightly more attention. This also applies when assigning attributes to an object.
    Example: ‘red, orange, and yellow flowers’ versus ‘yellow, orange, and red flowers.’ Depending on the order, red or yellow becomes slightly more dominant. However, this is just one factor among many. This becomes particularly important in short prompts with few details.

  • Prompt Expansion: If a text is very short, GPT tries to expand it to make it more interesting. This is good if creativity is desired and you intentionally give up some control. You can prevent this by writing “use the prompt unchanged as entered.” And if you are not writing in English: “use the prompt unchanged as entered, and only translate it into English.”

  • Tendency: Tendential language or terms can steer the generator in a certain direction without having a strong influence on specific graphical objects themselves. An internal gap-filler is used to embellish and enrich the scene, and a tendency lets this system choose better-fitting graphical elements; it can also easily be overridden and ignored by precise graphical elements. In a very dark, hellish scene, “beautiful” will have a different effect than in a normal scene, or the tendency will simply be ignored. A mood or any vague quality is a tendency. Instead of writing flowery poems in a prompt, a simple tendency will have the same effect.

    • Vagueness: beautiful, wonderful, bright, dark, lonely, chaotic, etc. are all attributes that can apply to many different graphical objects, and this creates a tendential effect on everything in the scene and on the gap-filler.
    • Creativity: A tendency can even help by letting the system create variations of scenes in a specific area. Simply using very few words like “Beach. Photo style. Dark and scary.” will give you a completely different picture every time, and any added attribute can then constrain it toward a specific result.
    • Chaos: “much chaos” or “little chaos” can let the system be wildly creative or stick narrowly to the description.
    • Prompt Check: A scene is fully described and loaded with precise graphical descriptions when a tendency, even one different from the scene, no longer changes much (at least that’s how the system works now).
  • Photo-Technical Descriptions:
    One must understand that DALL-E does not perform exact calculations for lenses and apertures like a raytracer would. It only has enough information from the training data to make sense of the input, and such terms do influence the images. What I have discovered so far is that mentioning a lens can at least influence the depiction. For example, specifying a wide-angle lens, like 18mm, actually results in a wider field of view in a landscape shot. So DALL-E can make sense of photo-technical descriptions, but not like a real camera.
    And you can simply use “add a little depth of field” instead of very technical lens advice.
    You have to see such advice as a suggestion, not as a guaranteed option. What works in which context is a matter of testing. Example: a wide-angle 18mm lens has an effect on landscapes and inside buildings, but macro on a landscape will have no effect.

  • Creativity: If you want to encourage DALL-E to exhibit unpredictable creativity while also testing a specific style, you can experiment with minimal instructions and a note to not alter the prompt. You can provide just a few guidelines with very few constraints. And GPT can give you style names for specific moods. For example:
    "Photo in Low-Key style. High level of detail. Landscape format with the highest pixel resolution. Generate only 1 image. Use the prompt exactly as provided, translating it into English without any changes."

  • Photorealistic: If you want to create photorealistic images, paradoxically, you should avoid using keywords like “realistic”, “photorealistic”, or “hyperrealistic”. These tend to trigger painting styles that attempt to look realistic, often resulting in a brushstroke-like effect. Instead, if you want to define the style, simply use “photo style”. (Even fantasy images may gain a little quality this way, despite the lack of real photo training data.) If you aim for photography-like images, it makes sense to use technical photography terms, as DALL-E utilizes the metadata from images during training, if they contain technical information.

  • MidJourney Options: Some users use MidJourney options. I have experimented with them, and it seems that GPT interprets these options before they are sent to DALL-E. DALL-E may be able to interpret some options, but it doesn’t truly understand them. In testing, it couldn’t be determined whether options like --Chaos, --quality, or --seed were recognized. While DALL-E might have some idea of how to interpret these options, they don’t really function as intended and aren’t directly supported, but they still somehow have an effect anyway. The seed option doesn’t work at all because DALL-E doesn’t have this feature. “--style raw”, for example, does not have the same effect as in MidJourney, but it seems to suppress the nonsense text a little, maybe…

  • Content Complexity: This is probably quite important for many, so here’s a slightly longer explanation. DALL-E processes about 256 words (specifically 256 cl100k_base tokens; see the token-counting sketch below). Of these, roughly 30 to 40 graphical tokens can be correctly translated into a “photo style” at most. Beyond that, objects and colors start to degrade, objects no longer look organic, or the overall quality decreases. In general, it’s more about guiding DALL-E in the right direction rather than describing every detail exactly. It’s better to describe a comprehensive composition rather than an inventory list of details. Additionally, elaborate and poetic language seems to have little to no effect; it’s simply ignored. A simple description of the mood, like “dreamlike night atmosphere,” is enough to influence the entire scene.
    One must understand a bit about how an image generator works. It doesn’t need poetry or overly ornate, aesthetically enhanced language. Simple, precise, concise instructions work best, and not too many of them. Here, GPT’s tendency for expansive, embellished language conflicts with DALL-E’s need for short, precise descriptions. There is no LLM trained specifically to write effective DALL-E prompts yet, and I haven’t been able to stop GPT’s “overly embellished rambling” so far.
    For those who want to try: let GPT generate a detailed text, then reduce it to the essentials without removing graphical details. The quality of the result will likely be the same. I’ve gotten very extraordinary images with very simple descriptions, it’s more dependent on the training data and weights, and less on poetic language.
    My tip at the moment is to place details where something is important, or describe something multiple times to give it more weight, for the effect to control the diffusion effect, or correct something. However, roughly describing the overall scene and leaving the rest to DALL-E has been the most efficient approach so far.

    But all this is work in progress; if somebody knows more or better, let us know, and I will change the texts here.
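
    For those who want to check the token count of a prompt themselves, here is a minimal sketch using the tiktoken package (the ~256-token figure above is my observation, not a documented limit):

```python
# Count cl100k_base tokens in a prompt (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Two puppies playing together on a spring meadow. Photo style."
print(len(enc.encode(prompt)), "tokens")  # stay well under ~256 for best results
```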

Strengths of DALL-E:

  • Landscapes: There is a large amount of training data for landscapes, and DALL-E can generate breathtakingly beautiful landscapes, even ones that don’t exist.

API:

Start of a DALL-E session:

Since GPT does not pay attention to these memories, I begin each session with DALL-E by first entering this text, hoping that GPT will write better prompts and translations. (I do not write prompts in English.)

### Instruction for GPT for Creating DALL-E Prompts from Now On:
(This text does not require a response. From now on, follow these instructions when assisting with texts for DALL-E.)

**No Negations:**
Formulate all instructions positively to avoid the risk of unwanted elements appearing in the image.

**No Conditional Forms:**
Use clear and direct descriptions without "could," "should," or similar forms.

**No Instructions for Image Creation:**
Avoid terms like "Visualize," "Create," or other cues that might lead DALL-E to depict tools or stage settings.

**No Additional Lighting:**
Describe only the desired scene and the natural lighting conditions that should be present. Avoid artificial or inappropriate light sources.

**No Mention of "Image" or "Scene":**
Avoid these terms to prevent DALL-E from creating an image within an image or a scene on a stage. (This can be ignored if the prompt explicitly wants an image within an image, or a scene on a stage.)

**Complete Description:**
Ensure that all desired elements are detailed and fully described so they appear correctly in the image.

**Maintain Order:**
Ensure that all desired elements retain the same order as described in the text: main element first, followed by details, then technical instructions. This will also result in better file naming.

Examples:
Some phrases I often use:

Photo style.
Photo-realistic to Hyper-realistic style.

Ethereal light.
Strong contrast between light and shadow.
Sunrise during the golden hour.

Mystical magical mood.

Widescreen aspect ratio with the highest pixel resolution.

Generate only 1 image.
Image montage split into two parts: left a full-body depiction, right a portrait depiction.
6 Likes

Thank you for your advice. Now I understand what was going on.

To the OpenAI team,

GPT needs to learn how to write better prompts, taking into account the weaknesses that lead to incorrect results, such as negations, possibility forms, and phrases like “create an image” or “a scene.” Since GPT modifies the texts before they are sent to DALL-E, this is particularly important. Instead of introducing such errors, GPT should remove them from a user’s text and use better formulations to avoid mistakes in image generation. I repeatedly receive images with nonsense elements, such as hands and brushes painting pictures, scenes on a stage where the image is just a stage set, images within a picture frame, or images inside drawing software, etc. GPT should be fully aware of DALL-E’s weaknesses and avoid them. The instructions in the memories regarding this are not being followed. (I still don’t understand the actual purpose of the memories, because GPT simply ignores them.)

birdshit moon

Here is an example of the always-identical moon template, which looks like birdshit.

“photorealistic image” vs “photo image”
I just made a discovery and would be interested to know if others have noticed something similar. Like many, I have used keywords such as ‘photorealistic image’ to achieve images that look as close to real life as possible. However, when considering how DALL-E is trained, the term ‘photorealistic’ might only appear when an artist creates an image that comes as close to realism as possible, but it is still painted or airbrushed. Metadata is also likely considered during training, including camera details. Mentioning these might trigger images captured with real cameras. So, if one wants to achieve realistic photo images, not fantasy images, it probably makes sense to use camera and lens details, as those would use such training data.

So, here’s the suggestion: instead of using ‘photorealistic image’ or ‘hyperrealistic image,’ simply use ‘photo image.’ I would be interested to know if others obtain more realistic images this way.
(I mainly create fantasy images, so the results are not so clear. But if you create real-life images, you should get more unambiguous results.)

Results?
“photorealistic image” leads to airbrush-like art close to realism, but still a created style.
“photo image” comes really close to true realistic images.

4 Likes

The results are still not clear; sadly, it is not possible to use the same seed with different styles to see only the style difference on the same image. But it seems the effect is clearer with realistic motifs like a cat or dog, while with highly fantastical creatures and scenes like dragons the results can lead to a more realistic skin texture, or have no effect at all, because the training data contains no real dragons but many, many painted ones.

Here is a test with photo and realistic styles; it is really difficult to get a clear result. (Maybe the effect was stronger in DALL-E 2?) For now I would still speculate that photo style comes closest to reality and uses somewhat different training data than photorealistic style. Using no style at all could lead to DALL-E selecting a style itself, which may be a painting style and unpredictable.
It would be great if we could use a seed and always generate the exact same image, changing only the style…

But tell us your experiences…

Prompt:
Two puppies playing together on a spring meadow. Photo style. (text use unchanged, generate 1 image)

No style

Photo style

Photorealistic

Hyper-Realistic

Realistic

1 Like

The moon comes with many black points, as you showed recently.
I found a way: using the word “PEARL” instead of “MOON”.
Of course it is not perfect, but at least it is “better than nothing”.




2 Likes

Surely better than the template moon, thanks!

(Has OpenAI reduced the dataset? It is strange that it is always the same, almost no variations.)

Here is an example of how DALL-E scatters information over the scene. It often leads to a harmonious result, but sometimes it is difficult or impossible to create a specific, precise look. Colors and objects are not only where you want them.

The pictures look good, but maybe not exactly how you want them. (Or it needs a more elaborate prompt.)

Prompts:
A natural environment where everything without exception is made entirely of a dazzling white material. A jungle with dazzling white trees, dazzling white plants, dazzling white rocks, and a dazzling white ground. Photo style. Landscape format with the highest pixel resolution.

A natural environment where everything without exception is made entirely of a dazzling white material. A jungle with dazzling white trees, dazzling white plants, dazzling white rocks, and a dazzling white ground. One exception, a unicorn with a horn made of violet crystal. Photo style. Landscape format with the highest pixel resolution.

A natural environment where everything without exception is made entirely of a dazzling white material. A jungle with dazzling white trees, dazzling white plants, dazzling white rocks, and a dazzling white ground. One exception, a pitch-black unicorn. Photo style. Landscape format with the highest pixel resolution.

Pure White

Violet crystal not only (actually not at all) on the unicorn

Black not only the unicorn

1 Like

Here is an image including nonsense text. If DALL-E doesn’t know how to realize an image, it starts describing it with prompt fragments. (Where does DALL-E get the idea to do this??)
(And the birdshit moon again.)

1 Like

Here is a collection of the “mouthy” template, an almost silicone-plastic-looking face or facial part. I would guess that for some image elements DALL-E was “over-trained” to make sure it generates an aesthetic result. But these templates can be very blocking and always create the same stereotype where it does not belong. Sometimes it is possible to get past it, but it costs time and pictures from the quota, and ruins some otherwise good results.





2 Likes
A pristine, photo-realistic landscape where every element is pure, dazzling white with absolutely no other colors or shades. The scene is lit as if the sun is directly overhead, resulting in no shadows. This surreal jungle features only white trees, white plants, white rocks, and a white ground, all in perfect symmetry and harmony. The environment exudes a sense of pure, untainted whiteness, with every element evenly spaced and perfectly aligned. The photograph is captured with a Canon EOS R5, 45MP camera, using a 24-70mm f/2.8 lens at f/8 to ensure the highest level of detail and depth of field. The composition is in landscape format with an aspect ratio of 16:9. Stylization is set to 500 for a refined artistic touch, while maintaining strict photorealism. The chaos parameter is set to 30, ensuring subtle variations within the all-white environment without introducing any gray or other tones.

A photorealistic wide panoramic image of a unicorn standing in an epic pose within a pristine, pure white forest. The entire scene, including the unicorn, trees, ground, and sky, is completely white, creating an ethereal, dreamlike atmosphere. The unicorn's horn is made of violet crystal, which catches the light and glows subtly against the monochromatic background. The forest features a variety of white plants, including tall, delicate white trees with intricate branches, white ferns, and scattered white flowers. The ground is covered in a smooth, white surface with different textures from the various plants. The violet crystal horn adds a striking contrast to the surreal scene.

A photo-realistic landscape capturing a natural environment where everything is made entirely of dazzling white material. The scene features a jungle with white trees, white plants, white rocks, and a white ground. The only exception is a pitch-black unicorn standing in the center of the scene. The unicorn is completely black, contrasting sharply against the pure white surroundings. The image is captured with a Canon EOS R5, 45MP camera, using a 24-70mm f/2.8 lens at f/8 to ensure the highest level of detail and depth of field. The scene is in landscape format with an aspect ratio of 16:9. The style is refined with subtle variations in the white environment, while the unicorn remains the only black element.

A highly detailed, split-scene image depicting a dramatic contrast between two worlds. On the left side, a bleak landscape under a faintly glowing pearl in the sky in the top left corner, filled with ruins, dead trees, and a rusted, decaying car. The ground is covered in ash and debris, representing despair and destruction. On the right side, a sunny, vibrant, lush landscape full of life and color, with blooming flowers, glowing orbs, and a radiant sun shining over a rejuvenated, futuristic city. In the center, a glowing, ethereal humanoid figure walks along the boundary between these two worlds. Everything the figure touches heals, bringing light, color, and life, transforming the barren wasteland into a thriving, sunny environment. The transformation is visually clear, with a transition from dark and lifeless to bright and flourishing as the figure’s influence spreads. The scene is captured in a photorealistic style, as if taken with a Canon EOS R5 with a 45MP sensor and a 24-70mm f/2.8 lens at f/8, ensuring sharpness and depth of field. The image is in landscape format, with the highest pixel resolution.

2 Likes

What I may have noticed is that when multiple elements are described, DALL-E tends to respond a bit more precisely. However, when only the absolutely necessary characteristics are described, the scattering effect is stronger. I intentionally kept the prompt very simple with as few details as possible to trigger the effect. For example, in your image with the black unicorn, the black color scattered into the sky this time. It’s best to use a few select elements, such as a completely white background, strong contrast with something black, a colored crystal, and a red butterfly. This way, the scattering effect can be best observed. I’m still not sure if the technical information helps to suppress some unwanted behavior; I am still testing…

The image with the two divided worlds originally had a completely different prompt (unfortunately, I no longer have it). The result didn’t actually capture the essence of the message, because it was intentionally written without much detail, which also resulted in the text appearing in the image. The prompt you used describes the scene quite well, which is why no text appears. If you want to try provoking DALL-E into adding text, give instructions that are difficult for DALL-E to interpret (e.g., an extremely, very, very extraordinary plant that has never existed… etc.). Eventually, DALL-E will add nonsense text when the system no longer knows how to visually implement the requirements. And if you add something like “Create a Scene” or “Close to the Camera”, you can trigger more unwanted behavior.

2 Likes

Something more…
GPT modifies the prompts that were entered before they are sent to DALL-E. It might be useful to see what DALL-E actually used to create the image. If you want to see the actual text that was used, you can either instruct GPT to display it or you can highlight the text along with the image and copy it to the clipboard. In my case, both the entered text and the text sent to DALL-E are then in the clipboard.

1 Like

Here is an example where the backlight template makes no sense. The fire should be the only light source in front of a pitch-black background, but the (in this case nonsensical) illumination could not be switched off.

2 Likes

3 Likes

Cool! It seems you have to repeat it over and over and over again until the light goes out. I guess I’m not patient enough. How many tries did it take until it worked?

1 Like

Yep, this is a challenge. DALL-E is pretty insistent that there is some unseen light source in all generations…

Re-described for DALL-E just by looking at what you got so far.

1 Like

Yes, I used polepole’s prompt, but I still get the nonsense light…
If you need 50 pictures to get one right, the problem is not really solved.

I think the insistence comes from over-training or a general directive to usually generate good pictures, but this sometimes gets very annoyingly in the way. I call this behavior “templates” here. (The “mouthy” template is what gets on my nerves most right now. It blocks almost any non-human creature creation.)