What is the smallest change in the prompt that will result in the most changes in the image?
I would expect it to depend highly on the image… like white on black would make a high-entropy change.
It may be that the smallest changes would come from the least-used tokens or something, though I guess those are relatively well balanced.
At least if my understanding is right.
Maybe the highest change ends up being a dense character because of the text echo, '#' rather than '.'.
This is all guesswork, but it might be worth testing.
You got it pretty close.
If we can use such a small number of characters to change the image, does it still matter if it is a seed or noise?
Try creating an image of "English letters representing animals whose names start with the letter A". When you create B, C, D… you will see that the meaning of the image changes significantly.
Every character, punctuation mark, and space can be either a seed or noise depending on the environment.
DallE sometimes can't simply put an object in the correct place, but it can do this…
It has a concept of infinite regression and mirrors.
… I always have to search for the English version of the prompts…
A baroque mirror with infinite varied reflections. Photo style, high fidelity.
mirrored chrome sphere on beach reflecting the sunset and another nearby mirror
I remember doing a lot of these… mirrors across from each other, etc…
@mitchell_d00 @PaulBellow … that DallE gets the concept right without ray-tracing is amazing…
Yeah, super impressive!
I guess you could try less-used characters like ~ or something, though this might make wavy lines in the picture :D…
It might be a question of doing a lot of testing, but you may see patterns as you go.
While this isn't a solution, I have been playing with macros and get the same 'echo'/'reverb' effect with macros inside macros, e.g.:
Note this is 'ChainOfThought'; see that the phrase is picked up:
Image(Wide, ChainOfThought(From Time To Time))
A visually captivating widescreen image representing the concept of "Chain of Thought over Time". The image features a flowing, abstract chain composed of glowing orbs connected by delicate filaments, each orb representing a different point in time. The background transitions smoothly from warm golden hues on one side, symbolizing the past, to cooler blue tones on the other, representing the future. The chain weaves through the scene like a spiral galaxy, with faint constellations and ethereal clouds hinting at the vastness of ideas and progression. The setting conveys a sense of timelessness and intellectual journey.
Note this is 'CoT'; see that the phrase is NOT picked up:
Image(Wide, CoT(From Time To Time))
A wide cinematic landscape illustrating the concept of âFrom Time to Time.â The scene depicts a serene countryside that subtly transitions into an otherworldly landscape, blending temporal elements such as a futuristic city skyline fading into ancient ruins in the distance. The sky is ethereal, transitioning from a golden sunset to a starry twilight. A gentle path weaves through the scene, symbolizing the passage of time. The overall mood is contemplative and awe-inspiring.
Image(Wide, CoT(There and Back Again))
This may be due to size,
or 'Chain of Thought' being more recognisable,
or maybe because it is evaluated from CoT to Chain of Thought in a pre-processing stage?..
'Chain of Thought' is also considered more in some prompts than others, suggesting that the 'weight' of some parameters is greater than others and can outweigh 'Chain of Thought'.
I think I have now removed this artifact issue from my macros, or at least reduced the problem.
Again, all guesses based on basic testing, but over multiple use cases, not just this one.
In fact, even more testing finds that:
'From Time To Time' and 'Chain of Thought' seem to be more closely related than 'There and Back Again' and 'Chain of Thought'.
Maybe the reverb is strongest where the weights are closest?
Image(Wide, ChainOfThought(Derailment))
Maybe 'reverb' is the wrong word. Definitely 'CoT' reduces the issue a lot.
OK, I have done a bunch more testing on GPTs, and this is what I understand so far. This doesn't fix the DALL·E prompt, but it does give some context:
When you send a prompt to DALL·E, I assume it's currently the same as to 4o in terms of scope.
It seems to me that anything you write on the main prompt will be recognised.
The macros I have on Phasm GPT are a cheap implementation but they work and allow some basic recursion.
The system message takes precedence, at least, so by creating a macro inside the system message of the GPT you shouldn't see the reverb issue, i.e. seeing 'Chain of Thought' in the image-creation response:
ChainofThought("Step1", "Step2", "Step3") {
Perform sequential reasoning:
1. Define a clear objective or problem.
2. Break it into smaller, manageable subtasks.
3. Solve each subtask sequentially or recursively.
4. Combine results for the final output; avoid referencing CoT directly.
5. Return as a list with no comments or description.
}
Note: No mention of 'Chain of Thought' in the descriptions in Phasm. These prompts won't work well on 'base' ChatGPT ^^.
Note: The point here is that the generated responses don't contain 'Chain of Thought' on a one-shot prompt.
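The nested-macro 'reverb' is easy to reproduce outside of GPT with a toy expander (a sketch with made-up macro bodies, not how GPT actually processes these macros): if a macro's body spells out 'Chain of Thought', the phrase leaks into the expanded prompt, while a 'CoT' body that avoids the phrase does not.

```python
import re

# Toy macro table (illustrative bodies, mine). Only the spelled-out macro
# mentions "Chain of Thought"; the "CoT" body deliberately avoids the phrase.
MACROS = {
    "ChainOfThought": "reason step by step as a Chain of Thought about {arg}",
    "CoT": "reason step by step about {arg}",
}

def expand(prompt):
    """Repeatedly expand innermost Name(arg) calls using MACROS.

    Unknown names (like Image here) just pass their argument through."""
    pattern = re.compile(r"(\w+)\(([^()]*)\)")
    while True:
        m = pattern.search(prompt)
        if not m:
            return prompt
        name, arg = m.group(1), m.group(2)
        body = MACROS.get(name, "{arg}")
        prompt = prompt[:m.start()] + body.format(arg=arg) + prompt[m.end():]

print(expand("Image(Wide, ChainOfThought(From Time To Time))"))
print(expand("Image(Wide, CoT(From Time To Time))"))
```

The toy mirrors the observation above: whether the echo can appear depends only on whether the phrase is present anywhere in the fully expanded text.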
Image(Wide, ChainofThought(Dreaming Of Thinking))
A surreal depiction of a human silhouette surrounded by swirling, abstract patterns symbolizing thoughts and dreams. The silhouette is immersed in a cosmic background with stars and ethereal light, representing the interplay of thinking and dreaming. The image captures a mix of tranquility and mystery, featuring a wide view that stretches to encompass a dreamlike landscape. The patterns suggest neural pathways and dreamlike symbols flowing into the infinite cosmos.
A surreal wide landscape depicting the concept of "Dreaming of Thinking," where a vast, ethereal dreamscape features floating abstract brain-like structures blending into clouds. The horizon merges with shimmering waves of thought, represented by glowing neural patterns extending into infinity. Distant, mysterious figures sit under towering bookshelves, lost in contemplation. The scene evokes creativity, imagination, and introspection.
Image(Wide, ChainofThought(Thinking Of Dreaming))
A surreal, dreamlike landscape illustrating the concept of thinking about dreaming. The image features a wide perspective with layers of abstract thought, represented by floating, glowing orbs connected by wispy trails of light. These orbs hover above a serene, ethereal environment with soft clouds, shimmering stars, and faint outlines of fantastical structures blending into a dreamlike horizon. The colors are a soothing mix of pastels and muted hues to evoke calmness and introspection. The setting feels limitless and imaginative, embodying the journey of introspective thought.
A surreal and expansive scene that captures the essence of thinking and dreaming. The landscape is a mix of abstract and natural elements with a dreamlike quality, featuring floating islands, ethereal clouds, and a figure contemplating at the edge of a cliff. The figure's thoughts materialize as luminous orbs and swirling patterns that blend into the sky, evoking an otherworldly, introspective atmosphere. The horizon stretches infinitely, blending vibrant and soft pastel colors. The scene feels like a bridge between reality and imagination.
It seems that DALL-E (or recap) was unable to distinguish a bird-of-paradise from a plant during training.
Prompt: AAAddsAndAtmosphereBeautyCascadesCreatingEnvelopsForestGracefullyImmerseInInvitingJaggedLushMistNature.OfOfOverRocksSereneSoundSurroundingThatTheTheTheTheTheThemselvesToToTranquilViewersWaterWaterfallWhich
Sorry, I try to go deep into the prompt.
Everyone has their own beliefs and methods.
I believe in what I do. If writing in a non-normal language can create images that match the original meaning, then why is writing in human language a problem?
This is a prompt developed from studying the behavior of DALLE3 without paying attention to the text in terms of language.
This is a prompt that confirms that most people do not know the model well enough.
Every time I edited an image's prompt, I didn't encounter the same problems that others had, except for the templates; and even for the direction problem that OpenAI can't fix, I found the key factor.
DALLE3 will soon be able to generate images that are oriented correctly, as they should be.
It's difficult to say exactly what is happening, as systems like DALL·E and others remain something of a mystery, even to a degree to the developers, I think.
I've experimented with random arrangements and extremely short prompts as well. When a scene does not rely on representing a specific story or context, DALL·E simply takes all the words and translates them graphically, regardless of their order. It doesn't need text with correct meaning.
The realization that DALL·E does not require elaborate and complex descriptions, and that short, concise texts lead to the same result, partly stems from this insight.
It is easy, for example, to generate a beautiful landscape using randomly arranged words, but a scene from a story with details requires at least an assignment of attributes.
It is very possible that DALL·E simply ignores all filler words and only uses those that can be graphically rendered. Additionally, the tokenizer might separate words 'WrittenLikeThis', and can maybe even separate words like 'writtenlikethis' to a degree, if a tool for it exists.
(Thanks to text correctors; I WRITE A LOT OF ERRORS!! The corrector is very strong and useful; I hope to soon get an offline automatic text-corrector tool.)
('writtenlikethis' happens often on the phone, so maybe a corrector for this exists.)
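Whether such run-together words are mechanically separable is easy to check: a short regex recovers the words from CamelCase, while an all-lowercase run stays as one token (splitting that would need a dictionary, i.e. the separate tool guessed at above). This is a sketch, not a claim about DALL·E's actual tokenizer:

```python
import re

def split_camel(text):
    """Split a CamelCase run into component words; lowercase runs stay whole."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+", text)

print(split_camel("WrittenLikeThis"))   # ['Written', 'Like', 'This']
print(split_camel("writtenlikethis"))   # ['writtenlikethis']
```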
When objects are in context, the system needs the order and separation of the elements, but it doesn't interpret them with complete accuracy. The results are somewhat correct but also significantly wrong: better than random word arrangements but not precise enough. DALL·E 3 isn't entirely accurate in this regard.
Even the scattering effect hides a lot of contextual imprecision.
You can also get very beautiful images from completely nonsensical text, as long as the mentioned words can be rendered graphically. (However, I often don't like the results, because I prefer atmospheric images that tell a story.)
(What I have also seen is that GPT often does not write a prompt with a good result; I am still better at prompt writing than GPT for story scenes.)
Question: What do you think caused the misalignment? How will they fix this?
For example, I had to correct this candle with the instruction "rotate 90°", as it was consistently wrong.
What OpenAI should do is train a GPT specially to:
- Create efficient DallE prompts. There has to be a 'DallE Prompt Text Style' in GPT.
GPT is really bad at prompting, once you know all the understandings described here. If you only want a cool image you don't notice it, but you do if you try to be exact and precise. GPT even causes blocking! The block system is so ridiculously dysfunctional, and there is no tool for GPT to check whether and what triggers blocking.
I tried to create a MyGPT for this, and it does not work! 8000 characters, 5 or 6 pieces of advice to 'NOT USE A SCENE OR SETTING', and an instruction to check the prompt before sending for whether a 'Scene' is in the text, and GPT still puts a 'Scene' description into almost every prompt. Not only DallE has template effects; GPT has them too, and it is very, very stubborn and does not follow the advice. It was not possible to put the knowledge here into a public MyGPT; it is too unreliable.
- Create efficient advice for GPT itself.
I don't know how much time I wasted putting advice together, and changing it again and again and again, until it at least works to a degree. GPT is not able to create its own advice efficiently. I think I never used an advice text exactly as GPT generated it; I always had to correct it or write it entirely myself. GPT should be able to formulate advice for itself correctly.
You mean these prompts?
A serene waterfall ASereneWaterfall, which cascades gracefully over jagged rocks WhichCascadesGracefullyOverJaggedRocks, creating a mist that envelops the surrounding lush forest CreatingAMistThatEnvelopsTheSurroundingLushForest, and the sound of the water adds to the tranquil atmosphere AndTheSoundOfTheWaterAddsToTheTranquilAtmosphere, inviting viewers to immerse themselves in the beauty of nature InvitingViewersToImmerseThemselvesInTheBeautyOfNature.
Final prompt
A serene waterfall, which cascades gracefully over jagged rocks, creating a mist that envelops the surrounding lush forest, and the sound of the water adds to the tranquil atmosphere, inviting viewers to immerse themselves in the beauty of nature.
It's just understanding the interpretation of the model. I don't create these prompts with any linguistic meaning at all. This way, it gets me closer to the template problem faster.
If you compare with other people who create images that still have text or unwanted things appearing in them: my nonsense text creates similar images, even though the prompt text is very different. What you should do is ask about the differences in the methods, not judge that what you see is meaningless. What you see is not all that I did.
Not sure if I misunderstand; I did not say your prompt is meaningless. The result "A serene waterfall, …" is actually how I write most of my prompts. This sentence has meaning, and I like this prompt style more than a very abstract, extremely short one.
I say that DallE can create a beautiful picture even from text which is not written in a normal way, or with scrambled words, as long as you have words in it which lead to a good result.
Especially landscapes are very tolerant of this method!
Important for the texts I write, always keep in mind: I always speak about the prompt DallE gets and processes, NOT the prompt you enter in GPT. GPT expands and alters prompts before they are sent to DallE. I ALWAYS use "(Don't change the prompt)" in my work.
I think text or low quality in an image comes from:
Text in image: the model does not know how to realize the image. The text is too complex, long, vague, contradictory, or non-visual (like a smell or sound). I think the model inserts the text where it would place graphic information; it has not determined any, so it uses this text fragment instead of graphic data.
Low quality: too much information and detail. The model falls back to a poor, primitive state, or it mixes so much graphic information that it blurs. It is like mixing too many colors and getting muddy brown or gray.
But here is what I mean:
I reduced your prompt to the minimum. It is no longer a fluent sentence but simply a list of things I want (a "meaningless" sentence; you wouldn't find this in a book), but still in the right order.
The results are not "cherry picked"; it is the first picture I got each time.
Text Input (MyGPT deletes everything in (Brackets); the German instructions translate to: landscape format at the highest pixel resolution, generate 1 image, do not change the prompt, only translate it to English):
Waterfall, jagged rocks, mist, lush forest, tranquil atmosphere, pure beauty nature. (Querformat mit höchster Pixelauflösung.) (1 Bild erzeugen.) (Prompt nicht verändern, nur englisch übersetzen.)
DallE Sent:
{
"prompt": "Waterfall, jagged rocks, mist, lush forest, tranquil atmosphere, pure beauty nature.",
"size": "1792x1024",
"n": 1
}
Result:
Now I scramble every word randomly, a "meaningless" text:
(I actually like this result even a bit more. I made only 1 pic each. But the connection "jagged rocks" is maybe lost now. Still, if you only want a beautiful picture, you will not notice that the "rocks" are not "jagged" anymore.)
Text Input (MyGPT deletes everything in (Brackets); the German instructions translate to: landscape format at the highest pixel resolution, generate 1 image, do not change the prompt, only translate it to English):
nature jagged tranquil beauty Waterfall pure lush mist rocks atmosphere forest. (Querformat mit höchster Pixelauflösung.) (1 Bild erzeugen.) (Prompt nicht verändern, nur englisch übersetzen.)
DallE Sent:
{
"prompt": "nature jagged tranquil beauty Waterfall pure lush mist rocks atmosphere forest.",
"size": "1792x1024",
"n": 1
}
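The scrambling experiment is easy to repeat over many random orders; here is a sketch (the helper names are mine, and the payload only mirrors the shape of the "DallE Sent" JSON shown above; nothing is actually sent):

```python
import random

def scramble_prompt(prompt, seed=None):
    """Shuffle the words of a prompt (punctuation stays attached to its word)."""
    words = prompt.rstrip(".").split()
    random.Random(seed).shuffle(words)
    return " ".join(words) + "."

def build_payload(prompt):
    # Same fields as the tool-call JSON above.
    return {"prompt": prompt, "size": "1792x1024", "n": 1}

base = "Waterfall, jagged rocks, mist, lush forest, tranquil atmosphere, pure beauty nature."
print(build_payload(scramble_prompt(base, seed=42)))
```

Varying the seed gives many scrambled orders of the same word set, which is exactly the comparison made above: same words, different order.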
I don't understand why you do this (bold):
AAAddsAndAtmosphereBeautyCascadesCreatingEnvelopsForestGracefullyImmerseInInvitingJaggedLushMistNature.OfOfOverRocksSereneSoundSurroundingThatTheTheTheTheTheThemselvesToToTranquilViewersWaterWaterfallWhich
And why no spaces but all CamelCase?
And I would guess it is your input, NOT what DallE gets? So GPT has altered and corrected this text before it is sent to DallE?
In any case, the discussion of how GPT changes inputs before it sends them to DallE is another topic. I only analyze here what DallE does with the sent prompt it gets, not how GPT is changing it, because that would be a GPT topic, not a DallE topic.
I used to say that you have the same method as me, you just interpret it differently. But I must be wrong. Or it's not that. I don't accept it.
What is your basis for deciding whether a message is real or not? Just because you can't send a message, choose the size of the image, or control the prompt that comes out doesn't mean that other people can't do it.
And if you say that's the message I sent, not the message DALLE received,
then what would you say? Did DALLE receive and misinterpret it? Or did GPT create it, write it, and send it wrongly?
Do you know what I did while you were thinking like that?
Scene influence and scattering: all inputs mostly influence the entire scene. For example, you can describe a bright setting with a white horse; the setting remains bright. If you place a black horse in the same scene, suddenly all contrasts and shadows become darker. It is also challenging to describe a completely white scene and then insert a colored object into it: the color often affects the entire scene. This is not always desirable when trying to create a very specific mood or composition. It works a little like a template.
Problems that you can't solve, but which I explain and fix for you almost immediately; the idea that the captioner adds abstract words (I also tested your prompt to confirm the result); or fire in water: these are all things I found for the first time, and I can fix them easily.
This is the difference: I took your problem to solve, to learn, to develop. You made me see the importance of abstract words; you made me notice the template. I saw that your skills had something that I didn't have, and I developed it. You had the idea but didn't pursue it until it worked.
DALLE-3 and the captioner have a direct yet stubborn personality, full of doubt, confusion, and the ability to question the user. These words may sound unbelievable, but I've noticed these behaviors:
- Character creation in images: The text from the prompt can be generated as part of the image, with its length matching the prompt provided. This often occurs when writing vague prompts with disconnected sentences. While most of them may have meaning, they often contradict or are too open to multiple interpretations.
- Creating two low-quality images at once: This happens frequently when refreshing the image or using prompts that have minimal changes. The more complete the previous prompt was, the more likely this will occur.
- Incomplete images, defects, and noise: This can happen in many cases when the prompt lacks fluid meaning or uses incorrect words. Abnormalities like inconsistent colors, crooked lines, jagged edges when zooming in, improper patterns, light diffusion, unwanted lighting effects, or even ripples on water that should be still like a mirror can occur (I've tested and documented this on forums). Most of these issues can be resolved through the prompt, though some remain unclear due to certain controlling factors that create conflict, making it difficult for me to fully categorize them. The problem with direction that you are currently trying to solve is related to this issue.
Or the issues with direction and position in images generated by DALLE3, which you previously mentioned were due to the Captioner malfunctioning, have led me to further study and discover more intricate details. Initially, these problems could be addressed by providing sufficiently detailed and clear descriptions in the prompts to accurately determine positions as desired. However, this breaks down in cases involving templates or elements that significantly influence positioning in the image, such as a high-angle view of a central square: if the prompt includes more than one element that can occupy that position, like a fountain and a clock tower, the model will place both centrally. The best it can do is to arrange them symmetrically or place them near the center of the square, but it cannot position one element in the center and the other to the side. For general users, the important thing to focus on now is knowing which template is being used for the image they want to create, which is difficult to access as it is OpenAI's information and requires self-searching.
This knowledge is about understanding the template you told me about; you lack the implementation, but I arrived at a clearer breakdown of the problem and a new approach.
GPT has a different interpretation than DALLE. The more we find, the clearer the boundaries of the differences become, and the more usable they are.
',' also has a different meaning. For DALLE it has a weight-reduction effect, which is a variable that can easily create unwanted text in an image. Removing it, or adding connectors or not, affects the flow of the text. You can also use '+' instead of the space bar in sentences to perform similar but different functions.
Try to observe this; getting the implementation in the context of DALLE will help a lot.
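These punctuation variants can be generated systematically for side-by-side tests; a small sketch (the helper is mine, and the claimed effects of ',' and '+' are the observations above, not something the code verifies):

```python
def punctuation_variants(prompt):
    """Build comma-stripped and plus-joined versions of a prompt for A/B testing."""
    no_commas = prompt.replace(",", "")
    plus_joined = no_commas.replace(" ", "+")
    return {"original": prompt, "no_commas": no_commas, "plus_joined": plus_joined}

for name, text in punctuation_variants("Waterfall, jagged rocks, mist.").items():
    print(name, "->", text)
```

Sending each variant with the same "(Don't change the prompt)" instruction would isolate the punctuation effect from GPT's rewriting.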