Collection of DALL-E 3 prompting tips, issues, and bugs

What is the smallest change in the prompt that will produce the biggest change in the image?

I would expect it to depend heavily on the image. Something like white on black would be a high-entropy change.

It may be that the smallest changes with the biggest effect come from the least-used tokens, though I guess those are relatively well balanced.

At least if my understanding is right.

Maybe the biggest change ends up coming from a visually dense character, because of the text echo: ‘#’ rather than ‘.’.

All of this is guesswork, but it might be worth testing.
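If anyone wants to try it, here is a minimal probe loop, assuming the openai Python client (the variant characters are arbitrary picks of mine, and DALL-E 3 exposes no seed, so sampling noise is mixed in; repeat each variant a few times):

# Probe how small prompt edits change the output image.
# DALL-E 3 has no seed parameter, so differences also include
# sampling noise; generate several images per variant to compare.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

base = "A red fox sitting in a snowy field"
variants = [base, base + ".", base + " #", base + " ~"]  # arbitrary probes

for prompt in variants:
    result = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1,  # dall-e-3 accepts only n=1 per request
    )
    print(repr(prompt), "->", result.data[0].url)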

1 Like

You got it pretty close.
If we can use such a small number of characters to change the image, does it still matter whether it is a seed or noise?
Try creating an image of “English letters representing animals whose names start with the letter A”. When you switch to B, C, D, you will see that the meaning of the image changes significantly.

Every character, punctuation mark, and space can act as either a seed or as noise, depending on the environment.

1 Like

DALL-E sometimes can’t simply put an object in the correct place, but it can do this.

It has a concept of infinite regression and mirrors.


I always have to look up the English version of the prompts.


A baroque mirror with infinite varied reflections. Photo style, high fidelity.

3 Likes

mirrored chrome sphere on beach reflecting the sunset and another nearby mirror

I remember doing a lot of these: mirrors across from each other, etc.


4 Likes

I did one yesterday; for smaller things, I made a ton.

3 Likes

@mitchell_d00 @PaulBellow
That DALL-E gets the concept right without ray tracing is amazing.


3 Likes

Yeah, super impressive!

1 Like

I guess you could try less-used characters like ~ or something, though this might put wavy lines in the picture :D


It might be a question of doing a lot of testing, but you may see patterns as you go.

1 Like

While this isn’t a solution, I have been playing with macros and get the same ‘echo’/‘reverb’ effect with macros inside macros, i.e.:

Note this is ‘ChainOfThought’; see how the phrase is picked up:

Image(Wide, ChainOfThought(From Time To Time))

A visually captivating widescreen image representing the concept of ‘Chain of Thought over Time’. The image features a flowing, abstract chain composed of glowing orbs connected by delicate filaments, each orb representing a different point in time. The background transitions smoothly from warm golden hues on one side, symbolizing the past, to cooler blue tones on the other, representing the future. The chain weaves through the scene like a spiral galaxy, with faint constellations and ethereal clouds hinting at the vastness of ideas and progression. The setting conveys a sense of timelessness and intellectual journey.

Note this is ‘CoT’; see how the phrase is NOT picked up:

Image(Wide, CoT(From Time To Time))

A wide cinematic landscape illustrating the concept of ‘From Time to Time.’ The scene depicts a serene countryside that subtly transitions into an otherworldly landscape, blending temporal elements such as a futuristic city skyline fading into ancient ruins in the distance. The sky is ethereal, transitioning from a golden sunset to a starry twilight. A gentle path weaves through the scene, symbolizing the passage of time. The overall mood is contemplative and awe-inspiring.

Image(Wide, CoT(There and Back Again))

This may be due to length, or ‘Chain of Thought’ being more recognisable, or maybe because ‘CoT’ is expanded to ‘Chain of Thought’ in a pre-processing stage?

‘Chain of Thought’ is also weighted more heavily in some prompts than in others, suggesting that some parameters carry more ‘weight’ than others and can outweigh ‘Chain of Thought’.

I think I have now removed this artifact from my macros, or at least reduced the problem.

Again, these are all guesses based on basic testing, but across multiple use cases, not just this one.

In fact, even more testing finds that ‘From Time To Time’ and ‘Chain of Thought’ seem to be more closely related than ‘There and Back Again’ and ‘Chain of Thought’.

Maybe the reverb is strongest where the weights are closest?

Image(Wide, ChainOfThought(Derailment))

Maybe ‘reverb’ is the wrong word :smiley:. Definitely, ‘CoT’ reduces the issue a lot.


OK, I have done a bunch more testing on GPTs, and this is what I understand so far. This doesn’t fix the DALL-E prompt, but it does give some context:

When you send a prompt to DALL-E, I assume it currently has the same scope as one sent to 4o.

It seems to me that anything you write in the main prompt will be recognised.

The macros I have on Phasm GPT are a cheap implementation, but they work and allow some basic recursion.

The system message takes precedence, at least, so by creating a macro inside the system message of the GPT you shouldn’t see the reverb issue, i.e. ‘Chain of Thought’ appearing in the image-creation response:

ChainofThought(“Step1”, “Step2”, “Step3”) {
Perform sequential reasoning:
1. Define a clear objective or problem.
2. Break it into smaller, manageable subtasks.
3. Solve each subtask sequentially or recursively.
4. Combine results for the final output, avoid referencing CoT directly
5. Return As List with no comments or description
}

Note: No mention of ‘Chain of Thought’ in the descriptions in Phasm. These prompts won’t work well on ‘base’ ChatGPT ^^.

Note: The point here is that the generated responses don’t contain ‘Chain of Thought’ in a one-shot prompt.
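For intuition, here is a minimal sketch of this kind of expansion done client-side (the macro format and body are illustrative assumptions of mine, not Phasm’s actual implementation): once the body replaces the call, the macro’s name never reaches DALL-E, so it cannot echo into the image.

# Illustrative macro expansion: the body replaces the Name(arg) call,
# so the macro's name itself never appears in the downstream prompt.
# The macro name, body, and call format are assumptions for this sketch.
import re

MACROS = {
    "ChainofThought": (
        "Break '{arg}' into smaller subtasks, solve them in order, and "
        "describe the combined visual result without naming the method."
    ),
}

def expand(prompt: str) -> str:
    def sub(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        body = MACROS.get(name)
        return body.format(arg=arg) if body is not None else match.group(0)
    # Innermost calls first: [^()] keeps nested parentheses out of the arg.
    return re.sub(r"(\w+)\(([^()]*)\)", sub, prompt)

# The Image(...) wrapper is left for the GPT; only the inner macro expands.
print(expand("Image(Wide, ChainofThought(Dreaming Of Thinking))"))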

Image(Wide, ChainofThought(Dreaming Of Thinking))

A surreal depiction of a human silhouette surrounded by swirling, abstract patterns symbolizing thoughts and dreams. The silhouette is immersed in a cosmic background with stars and ethereal light, representing the interplay of thinking and dreaming. The image captures a mix of tranquility and mystery, featuring a wide view that stretches to encompass a dreamlike landscape. The patterns suggest neural pathways and dreamlike symbols flowing into the infinite cosmos.

A surreal wide landscape depicting the concept of ‘Dreaming of Thinking,’ where a vast, ethereal dreamscape features floating abstract brain-like structures blending into clouds. The horizon merges with shimmering waves of thought, represented by glowing neural patterns extending into infinity. Distant, mysterious figures sit under towering bookshelves, lost in contemplation. The scene evokes creativity, imagination, and introspection.


Image(Wide, ChainofThought(Thinking Of Dreaming))

A surreal, dreamlike landscape illustrating the concept of thinking about dreaming. The image features a wide perspective with layers of abstract thought, represented by floating, glowing orbs connected by wispy trails of light. These orbs hover above a serene, ethereal environment with soft clouds, shimmering stars, and faint outlines of fantastical structures blending into a dreamlike horizon. The colors are a soothing mix of pastels and muted hues to evoke calmness and introspection. The setting feels limitless and imaginative, embodying the journey of introspective thought.

A surreal and expansive scene that captures the essence of thinking and dreaming. The landscape is a mix of abstract and natural elements with a dreamlike quality, featuring floating islands, ethereal clouds, and a figure contemplating at the edge of a cliff. The figure’s thoughts materialize as luminous orbs and swirling patterns that blend into the sky, evoking an otherworldly, introspective atmosphere. The horizon stretches infinitely, blending vibrant and soft pastel colors. The scene feels like a bridge between reality and imagination.


2 Likes

It seems that DALL-E (or the recaptioner) was unable to distinguish the bird-of-paradise bird from the bird-of-paradise plant during training.

2 Likes

Prompt: AAAddsAndAtmosphereBeautyCascadesCreatingEnvelopsForestGracefullyImmerseInInvitingJaggedLushMistNature.OfOfOverRocksSereneSoundSurroundingThatTheTheTheTheTheThemselvesToToTranquilViewersWaterWaterfallWhich

Sorry, I try to go deep into the prompt.
Everyone has their own beliefs and methods.

I believe in what I do. If writing in non-normal language can create images that match the original meaning, then why would writing in human language be a problem?

This is a prompt developed from studying the behavior of DALL-E 3 without paying attention to the text as language.

This is a prompt that confirms that most people do not know the model well enough.

Every time I edited the prompt for an image, I didn’t encounter the same problems that others had, except for the templates; and even for the direction problem that OpenAI can’t fix, I found the key factor.

DALL-E 3 will soon be able to generate images that are oriented correctly, as they should be.

3 Likes

It’s difficult to say exactly what is happening, as systems like DALL·E and others remain something of a mystery, even, to a degree, to the developers, I think.

I’ve experimented with random arrangements and extremely short prompts as well. When a scene does not rely on representing a specific story or context, DALL·E simply takes all the words and translates them graphically, regardless of their order. It doesn’t need text with correct meaning.
The realization that DALL·E does not require elaborate, complex descriptions, and that short, concise texts lead to the same result, partly stems from this insight.
It is easy, for example, to generate a beautiful landscape using randomly arranged words, but a scene from a story, with details, requires at least an assignment of attributes.
It is very possible that DALL·E simply ignores all filler words and only uses those that can be rendered graphically. Additionally, the tokenizer might separate words “WrittenLikeThis”, and can maybe even separate words like “writtenlikethis” to a degree, if a tool for that exists.
(Text correctors: I make a LOT OF ERRORS!! The corrector is very strong and useful; I hope to soon get an offline automatic text-corrector tool.)
(“writtenlikethis” happens often on the phone, so maybe a corrector for this exists?)
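Whether the tokenizer really splits fused words, as speculated above, is easy to check offline; a small sketch with tiktoken (which encoding the DALL-E pipeline actually uses is an assumption; cl100k_base is the GPT-4-era one):

# Inspect how a BPE tokenizer splits fused words; the encoding choice
# is an assumption about what the image pipeline uses.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["WrittenLikeThis", "writtenlikethis", "written like this"]:
    pieces = [enc.decode([t]) for t in enc.encode(text)]
    print(f"{text!r} -> {pieces}")
# CamelCase tends to split near the capitals, so the component words
# can survive the fusion; all-lowercase fusion splits less cleanly.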

When objects are in context, the system needs the order and separation of the elements, but it doesn’t interpret them with complete accuracy. The results are somewhat correct but also significantly wrong: better than random word arrangements, but not precise enough. DALL·E 3 isn’t entirely accurate in this regard.
Even the scattering effect hides a lot of contextual imprecision.

You can also get very beautiful images from completely nonsensical text, as long as the mentioned words can be rendered graphically. (However, I often don’t like the results, because I prefer atmospheric images that tell a story.)

(What I have also seen is that GPT often doesn’t write a prompt with a good result; I am still better at prompt writing than GPT for story scenes.)

2 Likes

Question: What do you think caused the misalignment? How will they fix this?

For example, I had to correct this candle with the instruction “rotate 90°,” as it was consistently wrong.

1 Like

What OpenAI should do is train a GPT specially to:

  1. Create efficient DALL-E prompts. There has to be a “DALL-E prompt text style” in GPT.
    GPT is really bad at prompting, once you know all the understandings described here. If you only want a cool image you don’t notice it, but you do if you try to be exact and precise. GPT even causes blocking! The block system is ridiculously dysfunctional, and there is no tool for GPT to check if, and what, triggers blocking. I tried to create a MyGPT for this, and it does not work! 8,000 characters, 5 or 6 pieces of advice to “NOT USE A SCENE OR SETTING”, plus an instruction to check the prompt before sending it for whether a “scene” is in the text, and GPT still puts a “scene” description into almost every prompt. Not only DALL-E has template effects; GPT has them too, and it is very, very stubborn and does not follow the advice. It was not possible to put the knowledge from this thread into a public MyGPT; it is too unreliable.

  2. Create efficient advice for GPT itself.
    I don’t know how much time I wasted putting advice together, and changing it again and again and again, until it at least worked to a degree. GPT is not able to create its own advice efficiently. I think I never used a piece of advice exactly as GPT generated it; I always had to correct it or write it entirely myself. GPT should be able to formulate advice for itself correctly.

1 Like

You mean these prompts?

A serene waterfall ASereneWaterfall, which cascades gracefully over jagged rocks WhichCascadesGracefullyOverJaggedRocks, creating a mist that envelops the surrounding lush forest CreatingAMistThatEnvelopsTheSurroundingLushForest, and the sound of the water adds to the tranquil atmosphere AndTheSoundOfTheWaterAddsToTheTranquilAtmosphere, inviting viewers to immerse themselves in the beauty of nature InvitingViewersToImmerseThemselvesInTheBeautyOfNature.

Final prompt

A serene waterfall, which cascades gracefully over jagged rocks, creating a mist that envelops the surrounding lush forest, and the sound of the water adds to the tranquil atmosphere, inviting viewers to immerse themselves in the beauty of nature.
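As an aside, the alphabetised CamelCase prompt quoted earlier appears to be exactly this sentence with every word capitalised, sorted, and concatenated; a minimal sketch of that transformation (my reconstruction, not necessarily how it was originally produced):

# Reconstruct the alphabetised CamelCase prompt from the plain sentence:
# drop commas, capitalise each word, sort, and concatenate.
prompt = (
    "A serene waterfall, which cascades gracefully over jagged rocks, "
    "creating a mist that envelops the surrounding lush forest, and the "
    "sound of the water adds to the tranquil atmosphere, inviting viewers "
    "to immerse themselves in the beauty of nature."
)

words = [w.strip(",") for w in prompt.split()]
scrambled = "".join(sorted(w[0].upper() + w[1:] for w in words))
print(scrambled)
# -> AAAddsAndAtmosphereBeauty...WaterWaterfallWhich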

It’s just about understanding the model’s interpretation. I don’t create these prompts with any linguistic meaning at all. This way gets me closer to the template problem faster.

Compare it to other people who create images that still have text or unwanted things appearing in them: my nonsense text creates similar images, even though the prompt text is very different. What you should do is ask about the differences in the methods, not judge that what you see is meaningless. What you see is not all that I did.

1 Like

Not sure if I misunderstand; I didn’t say your prompt is meaningless. The result (“A serene waterfall, 
”) is how I actually write most of my prompts. This sentence has meaning, and I like this prompt style more than a very abstract, extremely short one.
I’m saying that DALL-E can create a beautiful picture even from text that is not written in a normal way, or has scrambled words, as long as it contains words that lead to a good result.
Landscapes especially are very tolerant of this method!

Important for the texts I write, always keep in mind: I always speak about the prompt DALL-E gets and processes, NOT the prompt you enter into GPT. GPT expands and alters prompts before they are sent to DALL-E. I ALWAYS use “(Don’t change the prompt)” in my work.

I think text or low quality in an image comes from:
Text in image: the model doesn’t know how to realize the image. The text is too complex, long, vague, contradictory, or non-visual (like a smell or a sound). I think the model inserts the text where it would place graphic information; it hasn’t determined any, so it uses the text fragment instead of graphic data.
Low quality: too much information and detail. The model falls back to a poor, primitive state, or it mixes so many pieces of graphic information that they blur. It is like mixing too many colors and getting muddy brown or gray.


But here is what I mean:

I reduced your prompt to the minimum. It is no longer a fluent sentence but simply a list of things I want (a “meaningless” sentence; you wouldn’t find this in a book), but still in the right order.
The results are not cherry-picked; each is the one picture I got.

Text input: (MyGPT deletes everything in (brackets).)
Waterfall, jagged rocks, mist, lush forest, tranquil atmosphere, pure beauty nature. (Landscape format with the highest pixel resolution.) (Generate 1 image.) (Do not change the prompt, only translate it into English.)

Sent to DALL-E:

{
  "prompt": "Waterfall, jagged rocks, mist, lush forest, tranquil atmosphere, pure beauty nature.",
  "size": "1792x1024",
  "n": 1
}

Result:

Now I randomly scramble every word, a “meaningless” text:
(I actually like this result even a bit more. I made only 1 picture each. But the connection “jagged rocks” may be lost now. Then again, if you only want a beautiful picture, you will not notice that the “rocks” are not “jagged” anymore.)

Text input: (MyGPT deletes everything in (brackets).)
nature jagged tranquil beauty Waterfall pure lush mist rocks atmosphere forest. (Landscape format with the highest pixel resolution.) (Generate 1 image.) (Do not change the prompt, only translate it into English.)

Sent to DALL-E:

{
  "prompt": "nature jagged tranquil beauty Waterfall pure lush mist rocks atmosphere forest.",
  "size": "1792x1024",
  "n": 1
}
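The pipeline above is easy to reproduce end to end; a sketch under the same assumptions (the bracket-stripping rule is my guess at the MyGPT behaviour, not its actual implementation, and the shuffle is just Python’s random):

# Reproduce the demo: strip the (bracketed) control directives the way
# the MyGPT does, optionally scramble the word order, and send the bare
# prompt to DALL-E 3. The stripping rule is a guess, not Phasm's code.
import random
import re

from openai import OpenAI

client = OpenAI()

def strip_brackets(text: str) -> str:
    # Remove (...) groups, innermost first, so nested brackets also go.
    while re.search(r"\([^()]*\)", text):
        text = re.sub(r"\([^()]*\)", "", text)
    return " ".join(text.split())

def scramble(text: str) -> str:
    # Drop commas and the final period, shuffle, and re-join.
    words = text.replace(",", "").rstrip(".").split()
    random.shuffle(words)
    return " ".join(words) + "."

raw = ("Waterfall, jagged rocks, mist, lush forest, tranquil atmosphere, "
       "pure beauty nature. (Landscape format.) (Generate 1 image.)")
prompt = scramble(strip_brackets(raw))
result = client.images.generate(
    model="dall-e-3", prompt=prompt, size="1792x1024", n=1
)
print(prompt, "->", result.data[0].url)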

1 Like

I don’t understand why you do this (bold):
AAAddsAndAtmosphereBeautyCascadesCreatingEnvelopsForestGracefullyImmerseInInvitingJaggedLushMistNature.OfOfOverRocksSereneSoundSurroundingThatTheTheTheTheTheThemselvesToToTranquilViewersWaterWaterfallWhich

And why no spaces but all CamelCase?

And I would guess this is your input, NOT what DALL-E gets? So GPT has altered and corrected this text before it is sent to DALL-E?
In that case, the discussion of how GPT changes inputs before sending them to DALL-E is another topic. I only analyze here what DALL-E does with the prompt it is sent, not how GPT changes it, because that would be a GPT topic, not a DALL-E topic.

I used to say that you have the same method as me, just interpreted differently. But I must be wrong. Or maybe it isn’t so. I don’t accept it.

What is your basis for deciding whether a message is real or not? Just because you can’t send a message, choose the size of the image, or control the prompt that comes out doesn’t mean that other people can’t.

And if you say that that’s the message I sent, not the message DALL-E received



Then what would you say? Did DALL-E receive it and misinterpret it? Or did GPT create it, write it, and send it wrongly?

Do you know what I did while you were thinking like that?

Scene influence and scattering: all inputs mostly influence the entire scene. For example, you can describe a bright setting with a white horse; the setting remains bright. If you place a black horse in the same scene, suddenly all contrasts and shadows become darker. It is also challenging to describe a completely white scene and then insert a colored object into it: the color often affects the entire scene. This is not always desirable when trying to create a very specific mood or composition. It works a little like a template.

There are problems that you can’t solve but that I explain and fix for you almost immediately: the idea that the captioner adds abstract words (I also tested your prompt to confirm the result), or fire in water. These are all things I found for the first time, and I can fix them easily.
This is the difference: I took your problem to solve, to learn, to develop. You made me see the importance of abstract words; you made me notice the template. I saw that your skills had something I didn’t have, and I developed it. You are the one who, even with your own idea, didn’t pursue it until it worked.

DALL-E 3 and the captioner have a direct yet stubborn personality, full of doubt, confusion, and the ability to question the user. These words may sound unbelievable, but I’ve noticed these behaviors:

  1. Character creation in images: the text from the prompt can be generated as part of the image, with its length matching the prompt provided. This often occurs when writing vague prompts with disconnected sentences. While most of them may have meaning, they often contradict each other or are too open to multiple interpretations.
  2. Creating two low-quality images at once: this happens frequently when refreshing the image or using prompts with minimal changes. The more complete the previous prompt was, the more likely this is to occur.
  3. Incomplete images, defects, and noise: this can happen in many cases when the prompt lacks fluid meaning or uses incorrect words. Abnormalities like inconsistent colors, crooked lines, jagged edges when zooming in, improper patterns, light diffusion, unwanted lighting effects, or even ripples on water that should be still like a mirror can occur (I’ve tested and documented this on the forums). Most of these issues can be resolved through the prompt, though some remain unclear due to certain controlling factors that create conflict, making it difficult for me to fully categorize them. The direction problem that you are currently trying to solve is related to this issue.

The issues with direction and position in images generated by DALL-E 3, which you previously attributed to the captioner malfunctioning, have led me to further study and discover more intricate details. Initially, these problems could be addressed by providing sufficiently detailed and clear descriptions in the prompts to accurately determine positions as desired. However, in cases involving templates or elements that significantly influence positioning in the image, such as a high-angle view of a central square, if the prompt includes more than one element that can occupy that position, like a fountain and a clock tower, the model will place both centrally. The best it can do is arrange them symmetrically or place them near the center of the square; it cannot position one element in the center and the other to the side. For general users, the important thing to focus on now is knowing which template is being used for the image they want to create, which is difficult to access, as it is OpenAI’s information and requires searching on one’s own.

This knowledge is about understanding the template you told me about. You lack the implementation, but I arrived at a clearer breakdown of the problem and a new approach.

GPT has a different interpretation than DALL-E. The more clearly we find the boundaries of the differences, the more usable they become.
“,” also has a different meaning: for DALL-E it has a weight-reduction effect, which is a variable that can easily create unwanted text in an image. Removing it, and adding connectors or not, affects the flow of the text. You can also use “+” instead of a space in sentences to perform similar but different functions.

Try to observe this; getting the implementation right in the context of DALL-E will help a lot.

1 Like