Collection of Dall-E 3 prompting bugs, issues and tips

Daller · October 9, 2024, 7:35pm

OK interesting, i have never see this for now.
I only give thumb down and a message why.
Maybe they can detect nightshade now?

Keep in mind, GPT not know about DallEs weaknesses. it maybe creates a prompt with “add no text” and you will get 100% text then. I always check the used prompt to see what was wrong.

GPT still put snow-white in the prompts, added in translation, and this is a nonsense policy block trigger. And negations too.

I struggle now to create a prompt writer to create prompts without errors, but it keep on doing what was explicitly explained not to do. The LLM needs a update too.

Daller · October 9, 2024, 9:13pm

…In your prompt you write “create”, this triggers the adding of creation tools and processes.

In the past it returned similar text, but only reflect your input, without real knowledge what is in the image, what has a gaslighting-effect. You can try tricking the generator and say, “why is there text”, when actually is not. And see if it follow your suggestion, instead the reality.

I think the LLMs are generally guided to reflect the user, and only contradict little. (can be dangerous)

Daller · October 9, 2024, 11:40pm

It seams that the graphic-token limit is around 40. That means token witch have a content whit a graphical meaning-full effect. Then objects or colors disappear. This is now only tested whit “photo stile” images with highly complex and esthetic content. Including: see cliffs mountains animals plants colors materials etc.

But this is not totally clear. I will add a text if others have infos for this. And can double proof it. It is literally “diffuse”

mitchell_d00 · October 10, 2024, 11:48am

This thread is great it goes well with @PaulBellow ’s gallery thread.

phyde1001 · October 11, 2024, 8:23am

This upsets me…

This looks face beatification filtered…

They should really weed that out ^^

This is definitely something I will try. I had the same problem.

ChatGPT: ‘camera settings for REAL image’

For capturing realistic images, camera settings vary depending on the subject, lighting conditions, and desired effects. Here’s a general guide to achieve high-quality, realistic photos:

1. Exposure Settings
Shutter Speed: 1/100 to 1/250 seconds (for stationary subjects), 1/500 or faster for motion.
Aperture (f-stop): f/5.6 to f/11 for a balanced depth of field; f/2.8 for a shallow depth of field (e.g., portraits).
ISO: 100-400 for outdoor/daylight; 400-800 for indoors/low light. Higher ISO may introduce noise but can be compensated with post-processing.
2. White Balance
Auto for most conditions.
Daylight (5500K) for sunny outdoor shots.
.... etc

Daller · October 11, 2024, 9:25am

Tanks. I wished we could use seeds, and then see what effect each change in text has exactly. I spend so much time to make sense of X amount of different pictures…

phyde1001 · October 11, 2024, 9:57am

We noticed that a bunch of things in images follow the same ‘lines’

SameLines

The flower in the bottom left, heads same place, trees left and right same place, waterfalls could merge together, right wing is kind of in the same place as other creature…

haha this is kinda like spot the difference

BPS_Software · October 11, 2024, 1:11pm

You can (mostly) accomplish this with gen-id rather than seed. The images won’t be identical, but you can achieve much less variation. Once you generate an image request the gen-id from the model. Use the gen-id in the next prompt to the model, along with the exact same prompt from the previous generation, and change only the parts of the prompt you want to iterate on. Example:

Prompt

Two small puppies playing gleefully in a vibrant field of sunflowers. The puppies are running and chasing each other, their fur soft and fluffy, with big joyful eyes. The sunflowers stand tall around them, golden petals glowing in the sunlight, swaying gently in the breeze. The sky is bright blue with soft white clouds, and the overall atmosphere is one of pure happiness and playful energy in nature.

Prompt

let’s make an adjustment. Create a new image with the below gen-id and prompt.

Gen-id: 4bp3lKYSJ5lbF3Ni
Prompt: Two small puppies playing gleefully in a vibrant field of roses. The puppies are running and chasing each other, their fur soft and fluffy, with big joyful eyes. The roses stand tall around them, red petals glowing in the sunlight, swaying gently in the breeze. The sky is bright blue with soft white clouds, and the overall atmosphere is one of pure happiness and playful energy in nature.

Prompt

Two small puppies playing gleefully in a vibrant field of tennis balls. The puppies are running and chasing each other, their fur soft and fluffy, with big joyful eyes. The tennis balls bounce around them, glowing in the sunlight. The sky is bright blue with soft white clouds, and the overall atmosphere is one of pure happiness and playful energy in nature.

Prompt

let’s make an adjustment. Create a new image with the below gen-id and prompt.

Gen-id: 8CLFuV4piGqw99mp
Prompt: Two small puppies playing gleefully in a vibrant field of marshmallows. The puppies are running and chasing each other, their fur soft and fluffy, with big joyful eyes. The marshmallows bounce around them, glowing in the sunlight. The sky is bright blue with soft white clouds, and the overall atmosphere is one of pure happiness and playful energy in nature.

Daller · October 11, 2024, 2:04pm

Yes, i detected this long time ago. this effect is specially strong if you create 2 pictures at once. Certain areas show clearly similar structures. It is like maybe stable diffusion reuses a state of construction a second time to speed up the process. The effect is less strong if you create only 1 image at once.

They include watermarks to detect AI images, maybe this is one of the side-effects, watermark pixels set in the diffusion process could guide the result.

Daller · October 11, 2024, 2:12pm

I tested this a time ago, but without effect. I used gen-id and seed and the same prompt. It should produce almost the same picture, but it does not, at least not in the past.

But yes, there is a stronger similarity, but difficult to tell what caused it.

In case here is somebody using MJ or a other system whit seed support, could tell us if it is possible, for example to change a color without too much changes in the image.

Daller · October 11, 2024, 3:12pm

Here is a example of structure similarity. I don’t know if FFMPEG or a other tool can detect structure similarity’s, but a simple “difference” is not efficient enough to show it clearly.

But you can see that very white spots stay stable over images. And this could be the watermarking or the diffusion process uses the same noise. I actually think it is the watermarks, because different sees are used for the images. But i don’t know all the details of the diffusion process, maybe more noise layers is possible too, and re-usage of a state in process.

a12

Daller · October 11, 2024, 3:31pm

I have found a way to implement AutoSave. I lost some images because I didn’t download them in time. They don’t stay available on the server for long.

However, I hesitate to make the script available here, it might bring security issues, and I am not an expert. I just discovered the tool right now.

But for those who want to create a solution, there is a plugin that can achieve this. It’s called Tampermonkey. It can run scripts on a webpage, for example, scanning the DOM, finding new images, and saving them automatically.

chieffy99 · October 11, 2024, 5:38pm

You’re on the right track and explains it very well. I might be able to add more details for you.

**Most of these comments are about controlling the prompt sent to DALL-E according to every character input, significantly reducing the factors of encountered issues.

Bugs:

Nonsensical Text Insertion: When pushing DALL-E to creative limits, nonsensical texts suddenly appear, where DALL-E inserts the prompt into the image, probably to describe it. This has been the strangest behavior so far. You cannot get rid of it with “don’t add any text,” on the contrary, you get more text. You have to change the prompt itself. It seems DALL-E starts describing the image instead of trying to find a graphic solution if it has no idea how to realize it. So the more creative challenging the realization is, the more likely you get nonsense text in the image. Some styles are probably more susceptible to unwanted text because text is often included in these images during training. For example, in “drawing” or “cartoon style”. (Very tiresome and frustrating sometimes!)

In this case, I see it as creating an interactive element that informs us of the problem, we just need to interpret it correctly. I consider it a parameter, let’s call it ‘Chaos.’ The effects range from blurry lines or distorted straight lines.

5769c00c-5b06-4167-8a91-15a016dc8cec1024×1792 264 KB

When compared with images without Chaos, even complex images but still in sharp and clearly detailed.
b44e3252-aedf-483f-aaeb-e7d413427be7 - Copy1024×1024 389 KB

It can also create two images in low quality like to ask you to choose. More details can be explained later.

Image Orientation Issues: DALL-E has issues orienting images correctly if they are in portrait/vertical mode. It sometimes creates a horizontal image but turns simply the image wrong, or it creates a square and fills up the rest with nonsense. It seems some people could overcome it by using a directive like “rotate 90°,” it is not stable.

In this case, I disregard the image size command issue, which may stem from a misunderstanding between the user and ChatGPT. This confusion is similar to the above, but it doesn’t necessarily arise solely from mistakes or misunderstandings. Conflicts between the image to be generated and the system also play a part, such as creating an image of a woman in a sexy outfit in vertical orientation or requesting a full-body view. DALL-E would only generate the front portion in widescreen format, rotated 90 degrees.
flip1024×1792 50.6 KB

Geometric Understanding: Geometries are not yet fully understood. For example, a snake sometimes simply consists of a closed ring. Fingers are better now but still can have mistakes. The system is still not perfect…
-In this part, I found that there aren’t many issues, and the model does understand direction and spatial dimensions, but it hasn’t been developed to its full potential (information from February, reported to the help center and give problem to Sora. Because there is same tool in language processing process.

Lack of Metadata: Not a bug in this sense, but kind of… DALL-E created files do not include any meaningful metadata. The prompt, seed, date, or any other info is not included in the files. So you must save the prompts manually if you want to keep them. (I have now spent several hours UNSUCCESSFULLY trying to add metadata that is missing in the WEBP formats. WEBP is absolute garbage.)

I have no issues with this point, and I already have my own way of managing the data. However, I have some recommendations that will make your life easier, which I will include with related points.

Content Policy Issues: The content-policy security system of DALL-E makes not much sense, and gives no feedback, it blocks sometimes absolutely okay texts. I have another post for this. (Bug Report: Image Generation Blocked Due to Content Policy 10)

It is important to distinguish the cause; the decision-maker is DALL-E. If ChatGPT refuses to create something, I always give this reason because it often creates images but uses other factors to replace them, similar to the first two methods.

Issues and weaknesses:

Here are some tips on how to bypass some weaknesses that DALL-E still has.
It is also interesting to know that even GPT does not recognize some of these weaknesses and generates prompts for DALL-E that need to be improved.
Negation Handling: DALL-E cannot process negations, what you have in the text mostly ends up in the picture. DALL-E does not understand “not, no, don’t, without” So always describe positive desired properties to prevent DALL-E from getting the idea of adding something unwanted.

Initially, I thought so, but recently I have noticed some responses. It is possible that the interpretation has changed.

Avoid Possibility Forms: It is also good to avoid possibility forms like “should” or “could” and instead directly describe what you want to have in the image.

Prompt Accuracy: DALL-E takes everything in the prompt and tries to implement it, even if it doesn’t make sense or is contradictory. The more complex the realization, the more likely errors are. For example, “close to the camera” or “close to the viewer” resulted in DALL-E adding a camera or a hand to the image, instead of placing the desired element close to the viewpoint. So far, “close to us” has worked.
Also the instruction “create” or “visualize an image” sometimes leads to DALL-E adding brushes and drawing tools, even with a hand that literally creates the image. A phrase like “An Image…” or “A Scene…” sometimes leads to DALL-E literally creating a image in the image or a scene on a stage in a theater.
Just describe the image itself and avoid instructing DALL-E to “create/generate/visualize the image” or “a image / a scene / a setting …”.
Instead to say “The Scene is…” if you want a overall effect, say “All is…”.- These three topics I consider as one, but they must be separated from the reason that “takes everything in the prompt.” You observed this very well, but DALL-E does not process all the words.

These issues arise from the model’s interpretation and some external factors. Words like should, could, or similar terms add chaos; the model can choose to do everything or nothing. It also happens due to other words that can be interpreted in different meanings, sentence segmentation, and phrasing. This is crucial, and most people don’t realize it because they think in human language with a single interpretation, unconsciously. Even though LLM thinks in vectors to find the next weighted word to create a suitable image, it doesn’t work the same way with DALL-E. I found this because I am not proficient in English; if I am confused when translating, DALL-E would also be confused due to misinterpretation. Personally, I think this is a characteristic of the model: if ChatGPT tends to be agreeable but deceptive, DALL-E is the opposite—resistant but straightforward.

There are also a few external factors I mentioned earlier that affect other parts without their functional intentions, and It affects the so-called templates you mentioned below.
I’m going to take a break. It’s possible that I might borrow some of the content in this story as an example of a solution that can be solved in making my content. And I will explain everything no matter how much it contradicts what many people believe, because my knowledge is not based on basic knowledge or general research.

chieffy99 · October 11, 2024, 5:49pm

How are you. I come back because some problem. Some AI give information about OpenAI model by refer data from this community.
Even though the information may be old and led to misunderstanding when use in present time. No matter how I argue, it still refers to the content. I think that could be a big deal.

_j · October 11, 2024, 6:58pm

I tried to break all the image guidance advice, sending what you state in multiple blocks, and got orange texted and denied.

Ugly outfits are okay to send.

Real obvious how the center square is independent of the sides, and the “full-body” was enough for ChatGPT but not for DALL-E.

Daller · October 11, 2024, 7:05pm

Yes very much, this is the reason for many issues happen after the translation. Including blocking (snow white). I now add in general “Don’t change the text, only translate it.”, it helped me a lot.
I tried right now to make a “My GPT” to let GPT create better prompts, without success. GPT is very obstinate to obey a detailed advice list, including many example. GPT keeps on to “swagger” around. I have give up for now the work on it. I just tell GPT to not mess around with my text.

If you have improvements witch work, this post is made for exactly this. But it needs advice’s people can understand without deep AI knowledge.

In the moment i could not create a advice list for a self made GPT witch fixes the most of the problems, otherwise i would have published it here.

The nonsens text is not about “blurry lines or distorted straight lines” DallE literally puts gibberish text prompt fragments in the image. I speculate, the text it in the tokens DallE uses, if it not finds a graphic interpretation, the system uses the text, and place it there where the graphic element should be. just now it created many pictures in this way, so this problem is not fixed at all. I speculate, the developers should delete the text in the token structure or remove it all together, if it not finds a representation, and so it would not show up as text anymore.

Daller · October 11, 2024, 7:13pm

Yes, this is what is absolutely nerve destroying. Actually it is not DallE, but a “security” system in front of DallE, witch simply uses a stupid block-list, and not report anything back to GPT or the user. GPT it self creates blocked prompts. I think this nonsense in now in place since DallE-3.

Over all, i think everybody here active knows, that GPT and DallE are 2 separated system. And in the moment they not work well together.

Daller · October 11, 2024, 11:56pm

Here what happens sometimes it you advice to “create” a image. Brushes and other tools show up literally creating the image.
It not happens always, only if it picks this type of data, based on the random seed. Some styles are more susceptible to trigger it, because it is more in the training data, like drawings and child pictures.

And… the “Mouthy” of course.

I believe this was the prompt after translation.

Create an elegant and intelligent humanoid creature that belongs to the species of trees. It has a lively, awake face, and in its eyes, one can see its wisdom. It has wild, shaggy leaves on its head, and moss grows on its shoulders. The creature looks as if it emerged from a tree and became a humanoid. The scene is in a widescreen aspect ratio with the highest pixel resolution. Only one image.

This from @polepole has worked 100% without mouthy

An elegant and intelligent humanoid creature belonging to a species of trees, with a face entirely made from wood, full of wisdom and ancient knowledge reflected in its eyes. Instead of wild, shaggy leaves, there are only small traces of plant life around its head, with moss still growing across its shoulders. The creature has bark-like skin and wooden limbs, appearing to have emerged directly from a tree. It walks upright in a humanoid posture. The background is a lush, enchanted forest, softly illuminated by mystical light, with the creature in full detail in a widescreen aspect ratio.

Here now a reduced text, like my newest discovery, DallE not needs poems, only precise data.

Instead of wild, shaggy leaves This is a negation and had not worked in my test, i got very hairy creatures with big beard. I changed it to only positive descriptions. and the reduced text returned the same results.

Reduced Prompt, same result

An elegant and intelligent humanoid creature belonging to a species of trees, with a face entirely made from wood, full of wisdom and ancient knowledge in its eyes. Traces of plant life around its head, with a bare face. Moss grows over its shoulders. The creature has bark-like skin and wooden limbs, walking upright in a humanoid posture. The background is a lush, enchanted forest, softly illuminated by mystical light.

polepole · October 12, 2024, 12:02am

If you don’t mind can you share what prompt did you use? @Daller

Daller · October 12, 2024, 12:08am

Sadly i have lost the prompts for the early ones, i could not imagine that the data is not stored in the meta data in the picture… i can search it but will take a while…

Topic		Replies	Views	Activity
A Study on Using JSON for DallE Inputs Community dalle3	27	350	October 25, 2024
Bug Report: Image Generation Blocked Due to Content Policy Prompting dall-e-3	33	888	October 13, 2024

Collection of Dall-E 3 prompting bugs, issues and tips

Related topics