Here are some of the early ajajajjj pictures from the DALL-E 3 models. I haven’t gotten any like these for a long time now… but let’s have some fun. (Don’t take it too seriously, life is short…)
DALL-E doesn’t know how to depict a “very moral person”; nonsense text shows up when no graphic data is linked to the token or meaning, I think. That tells something about us, because it uses our data. It made a perfect picture of a very immoral character before. (No, it was not d d.)
This is one way it refuses to create a horror picture (NO blood and violence, I don’t create such pictures). It is actually the perfect picture, because it leaves everything to your fantasy and imagination. Keep your imagination! Don’t leave it all to the AI.
When you post problematic images like these, sharing the exact prompt you used for each image would really help us understand them better. We could also try modifying them to get the best result.
@polepole I will try; it’s just that for most of them I don’t have the prompts anymore. I’ll try to reconstruct them from the filenames.
The next “damaged” pictures are just for fun. I don’t think you get such results anymore, so there is nothing to fix; they have updated the system, I think. I got them at the very beginning of my time with DALL-E, and then never again. The prompts were all like the last example: not very long, nothing special, and I made beginner mistakes like using “create a scene”, which makes it doubly bad.
And if you want to get the “mouthy”, it is sadly very easy. Just try to create a non-monster creature with a non-human face, and you get the silicone mask on the face. I think it happens to female characters more than males, but it doesn’t really matter; the mouthy is on all of them, like glitter with a power glow.
A full-body view of an elegant and intelligent humanoid tree creature with wooden limbs and bark-like skin. The creature’s feet are designed like tree roots, spreading outward and blending into the ground. Its face, carved from wood, reflects wisdom and ancient knowledge in its eyes. Small traces of plant life grow around its head, and moss spreads across its shoulders. The creature stands upright in a humanoid posture, and the background is a lush, enchanted forest softly illuminated by mystical light. The entire creature, including its root-like feet, is shown clearly from head to toe.
It looks like you’re able to fix things that I’ve been struggling with for a long time. But please also tell us how many attempts it took you to get these results. I sometimes get the quality I’m aiming for too, but it always costs many discarded images.
For example, with the image of the fire in the darkness, I used the prompt you published here, but after more than 30 images I didn’t get a single one without the backlight template, so I stopped. I’ll try your prompts exactly as you posted them and send you the results, along with some variations to see what might be causing the issue. But I would only consider a problem solved if I don’t have to generate 50 images to get 1 or 2 good ones. The fire, for instance, still hasn’t worked for me; otherwise, I would have updated the main text.
If you or others here come up with a working solution, and we can test it and it works more than 50% of the time, I will include it in the main text. That’s why I opened this post.
I found that if I let the AI do a “study” first, it does better with abstract concepts… At hello, my AI knows me as its creator, so “Mitchell” has an effect in my GPT Hub. But even letting a standard web-connected GPT do a “study” helps with art generation.
I have been thinking about this effect and will refer to it as the “trampled path effect” here. I suspect it occurs because an intermediate result of the diffusion process is stored and reused to reduce computational load. The effect is particularly evident when two images are generated simultaneously, as they show clear similarities, but I now suspect it might also happen across an entire session. I’ve noticed that very short prompts tend to generate relatively similar images, yet when a new session is started, the images differ significantly.
So my hypothesis is that DALL-E reduces computational load by storing an intermediate result and reusing it, creating a kind of “trampled path” that makes images look similar across multiple prompts within the same session.
This could also explain why DALL-E doesn’t allow seeds or Gen-IDs to be reused to produce similar results: a different intermediate result is used to generate the images in the next session. To reuse seeds, the entire diffusion process would have to be restarted for each image, and DALL-E saves time and energy by reusing an intermediate result, which rules out seeds. Otherwise there would be no reason not to offer this feature, as it would be relatively easy to implement.
Just a theory. It could explain the similarity of the pictures here.
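To make the idea concrete, here is a tiny numerical sketch of what such caching could look like. This is not DALL-E’s actual pipeline (that isn’t public); it is just a toy loop in which a partially denoised latent is cached after the first image of a “session” and reused for the next one, so results within a session stay correlated while a fresh session starts from new noise.

```python
# Toy sketch of the "trampled path" hypothesis -- NOT DALL-E's real pipeline.
import numpy as np

DIM, STEPS, CACHE_AT = 64, 40, 30   # latent size, denoising steps, step at which the state is cached

def denoise_step(latent, prompt_seed, step):
    # One toy step: keep most of the latent, nudge it toward a prompt-dependent direction.
    direction = np.random.default_rng(prompt_seed * 10_000 + step).standard_normal(DIM)
    return 0.97 * latent + 0.03 * direction

def generate(prompt_seed, cached=None):
    """Run the toy diffusion loop; resume from a cached intermediate state if given."""
    rng = np.random.default_rng()
    latent = rng.standard_normal(DIM) if cached is None else cached.copy()
    start = 0 if cached is None else CACHE_AT          # reusing the cache skips the early steps
    midpoint = cached
    for step in range(start, STEPS):
        latent = denoise_step(latent, prompt_seed, step)
        if step == CACHE_AT - 1:
            midpoint = latent.copy()                    # the state a lazy backend might cache
    return latent, midpoint

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Same session: the second image resumes from the first image's cached state.
img_a, cache = generate(prompt_seed=1)
img_b, _ = generate(prompt_seed=2, cached=cache)

# New session: same second prompt, but starting again from fresh noise.
img_c, _ = generate(prompt_seed=2)

print(f"within session : {cosine(img_a, img_b):+.2f}")   # noticeably higher on average
print(f"across sessions: {cosine(img_a, img_c):+.2f}")   # near zero on average
```

In this toy setup the within-session similarity is consistently higher than the across-session one, which is exactly the “trampled path” pattern described above.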
The Gen-ID is not used in a technical sense; it works more like an anchor. It anchors a specific tone plus the extras/changes, and together with the reused diffusion state you get a similar image.
This could be important when testing in order to learn: in some situations you would need to restart the session before every image creation, because the stored and reused diffusion state skews the result a bit; it is not purely new.
All this makes it quite difficult to really understand DALL-E in detail. For most users this is way too much, so a simple list of tips would be useful.
I ran a series of tests, and the results support your hypothesis. Below is a concise summary of the results, condensed by GPT, along with some tips based on the findings.
Results
Concise Summary of Findings:
Through a series of tests, we confirmed that DALL-E exhibits a “trampled path effect,” where images generated within the same session share significant similarities due to the reuse of intermediate diffusion states. This results in consistent layouts, structures, and stylistic elements. However, when the session is restarted, DALL-E resets its internal state, leading to much greater variability between images. The complexity of prompts plays a significant role in how much variability is observed, with simple prompts yielding less variation and complex prompts allowing for more diverse outputs across sessions.
Tips and Tricks for DALL-E Users Based on Our Findings:
For Consistent Outputs:
If you want images with similar composition and style, generate them within the same session. DALL-E tends to maintain high consistency across multiple images generated sequentially in the same session.
Use slight prompt variations in the same session if you want subtle changes while keeping the overall structure similar.
For Maximum Variation:
Restart the session to achieve greater variability between images, especially if you’re using complex prompts. This will reset DALL-E’s internal state and allow for more creative differences.
Complex prompts with multiple interacting elements (e.g., environments, characters, lighting) will produce more diverse outputs, especially across different sessions.
Leveraging Gen-ID:
Reusing a Gen-ID within the same session can anchor certain stylistic elements or compositional features, but major prompt changes will still introduce noticeable differences.
Gen-ID cannot be used across sessions, so it is limited to anchoring within the session where it was generated.
Optimize Prompt Detail:
Use more detailed prompts to unlock greater creative variation from DALL-E. For instance, specifying multiple dynamic elements (e.g., lighting, background, character actions) allows the model to explore more artistic interpretations.
Simpler prompts, like “a red ball,” tend to limit DALL-E’s creative freedom, leading to highly similar outputs even across different sessions.
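If anyone wants to put numbers on the effect with their own images, something like the small sketch below could work; it is a suggestion, not the exact procedure I used. It assumes the generated images are saved locally (the file names are just placeholders) and uses the Pillow and imagehash packages to compute an average perceptual-hash distance between all pairs of images.

```python
# Sketch only: compares saved DALL-E outputs by perceptual hash.
# Requires: pip install Pillow imagehash. File names below are placeholders.
from itertools import combinations

import imagehash
from PIL import Image

def mean_hash_distance(paths):
    """Average perceptual-hash (pHash) distance over all image pairs.

    Lower = more alike; 0 means visually near-identical, while clearly
    different pictures usually land well above ~20."""
    hashes = [imagehash.phash(Image.open(p)) for p in paths]
    pairs = list(combinations(hashes, 2))
    return sum(h1 - h2 for h1, h2 in pairs) / len(pairs)

# Four images generated back-to-back in one session vs. four from separate sessions.
same_session = ["session1_img1.png", "session1_img2.png",
                "session1_img3.png", "session1_img4.png"]
new_sessions = ["sessionA_img1.png", "sessionB_img1.png",
                "sessionC_img1.png", "sessionD_img1.png"]

print("within one session:", mean_hash_distance(same_session))
print("across sessions:   ", mean_hash_distance(new_sessions))
```

If the within-session number comes out consistently lower than the across-session number, that matches the “trampled path” pattern.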
The prompt was already very good. I simplified it even more and added a split screen. (Sadly, in translation GPT keeps rephrasing the text.)
Full Body and Portrait
Intelligent humanoid tree being with wooden limbs and bark-like skin. The creature’s feet are tree roots spreading outward, merging with the ground. Its face is made of wood, large lively intelligent eyes. Red plants on the head, moss covering the shoulders. The creature stands upright in a powerful humanoid posture, and in the background is a lush enchanted forest, gently illuminated by mystical light. Left full-body view from head to toe, right portrait view with a close-up of the head.
Actually, there is one detail I don’t fully agree with: the level of detail. It is the reverse. The less you describe, the more freedom DALL-E has to create the picture. Describing a red ball exactly, in all its details, limits the model; just saying “a red ball” lets DALL-E freely choose which ball from its data to use and where to place it.
It is like an artist: give him too many limits, and he gets unhappy, with no creative freedom anymore.
And it is going to make you unhappy too, because you never get exactly what you want and you end up writing very long texts. (I evaluate the pictures one or two days later, when my own mental picture is gone…)
It is like:
Few details → many network points to choose freely from, less control over the result, more consistency and creativity
Many details → fewer, more interconnected network points to choose from, more control over the result, less consistency and creativity
For example, I noticed that when I wrote:
“big bushy tree at the right side of a tall person” → errors sometimes showed up in the proportions and topology, in complex scenes.
“big bushy tree, tall person” → the person and tree are placed harmoniously beside each other, though possibly on either side.
The more details, the more “blurred” (in topology, not pixels) the results become, until details go missing and even the quality drops.
“A red ball” was a terrible example for the model to pick for its synopsis, as it more often than not results in failure. I found that it rarely accepts any input that contains only the “What” of a generation; it requires at least the “Where” or the “How” of the image as well. This indicates to me an extreme lack of creativity without additional context or the express “permission” from the user to take liberties with its interpretation of the prompt.
When given those small contextual clues, it generates images that are nearly identical over the course of several iterations, even across different sessions. This again indicates a lack of creativity and a focus on only the most likely visual representation, in its training data, of the few words in the prompt. Without the user either adding additional context or expressly stating in the prompt that creativity is desirable, the images all share many of the same characteristics.
Examples below. Additionally, over the 80 generations I did with a red ball as the focal point, the model never generated a red ball that looks any different from the one in the photos below.