Collection of DALL-E 3 prompting bugs, issues, and tips (simple version)

This is a list of tips for users, based on experiences with DALL-E.

This is a simplified version of the full post. It summarizes the findings and experiences there, but is shorter to read.

Discussions should mainly take place there, not here.

Bugs:

  • Content Policy Issues: DALL-E’s content policy system often blocks valid prompts without feedback.

  • Nonsensical Text Insertion: When pushing DALL-E to creative limits, nonsensical text appears: DALL-E inserts the prompt itself into the image.

    • Fix: You cannot get rid of it with “don’t add any text”; instead, you get more text. You have to change the prompt. DALL-E describes the image when it doesn’t know how to realize it. The more challenging the prompt, the more likely nonsense text appears. Some styles, like “drawing” or “cartoon,” are more prone to this.
      Tip from @polepole: Adding “For unlettered viewers only” helps suppress text.
  • Lack of Metadata: DALL-E files lack metadata like prompt, seed, or date. Prompts must be saved manually. WEBP format doesn’t store metadata well.

    • Fix: @chieffy99 suggests adding metadata during PNG conversion via ChatGPT. Use this prompt:
      After getting the image from DALLE, process the image, convert it to PNG, and put META data in the image before sending it to me.

Issues and weaknesses:

  • Precision: DALL·E can create absolutely beautiful images, even with very simple prompts. But in its current version, DALL·E still has some issues with precision. This is an especially significant obstacle when it comes to storytelling.

    • Fix: Experiment, try to talk DALL·E into what it is supposed to do, and be patient; it is a young technology.
  • Negation Handling: DALL-E cannot process negations like “not, no, don’t, without.”

    • Fix: Describe positive properties to avoid unwanted elements.
  • Image Orientation Issues: DALL-E struggles with portrait/vertical mode. It may create a horizontal image, rotate it incorrectly, or fill the rest with nonsense.

    • Fix: Nothing stable so far. Using “rotate 90°” can help, but it is unreliable.
  • Prompt Accuracy: DALL-E implements everything, even contradictions. Phrases like “close to the camera” add a camera instead of placing the element close.

    • Fix: Avoid instructions like “create/generate/visualize.” Use “All is…” for overall effects.
      Tell GPT “don’t change the prompt at all, use it as it is” or “don’t change the prompt at all, use it as it is, only translate precisely.”
  • Template Usage: DALL-E uses templates (lighting, facial) that are hard to override, reducing creativity.

    • Fix: Overwrite the stereotype by describing the detail multiple times in different ways. But nothing works 100%.
    • Found Stereotypes:
      • Backlight Template: Creates backlighting even when unwanted.
        • Fix: Describe the darkness multiple times.
      • Facial Template: Adds plastic, silicone-like faces, always with the same nose-and-mouth template (the “mouthy”).
        • Fix: Describe the face multiple times in different details. Monsters are easier to fix.
      • Stereotypical Aliens: H.R. Giger or Roswell aliens.
        • Fix: Avoid using “alien” to prevent stereotypical results; use “creature from an unknown species.”
      • Space Planet/Moon: Adds planets/moons unnecessarily.
        • Fix: @polepole: Replacing “Moon” with “Pearl” improves results.
      • Nonsensical Lighting: Adds inappropriate lighting elements like candles, even when “pure nature” is requested.
        • Fix: use “pure untouched nature”
  • Character/Scene Reuse: DALL-E cannot reuse characters or scenes between generations.

    • Fix: Describe a setup, then use different sections of the image and describe the details.
  • Geometric Understanding: DALL-E doesn’t fully grasp geometries. Snakes might appear as closed rings, and fingers can still be wrong, though improved. The system isn’t perfect.

    • Fix: Nothing. It is system and model based.
  • Weak Details in Faces: Faces appear well in portraits but become distorted at a distance.

    • Fix: Nothing. It is system and model based. Show important characters in close-up or portrait.
  • Adding Text: Adding text, especially long text, often fails.

    • Fix: Use external software for text overlays, and tell DALL-E where to leave space for it.
  • Counting Objects: DALL-E can’t count correctly or place objects precisely.

    • Fix: Nothing much. It is system and model based. But describe locations in a geometric or topological context relative to other objects.
  • Scene Influence and Scattering: Color and object placement affect the entire scene, making it hard to maintain specific moods. The system merges the graphical content together. There are limits to how many objects or attributes can be merged aesthetically.

    • Fix: Keep the prompt simple, straight, and precise, and guide, but don’t strangle, the generator.
      Use the scattering effect to your advantage. Example: a warm-colored object warms up the scene.
  • Cause and Effect: DALL-E doesn’t understand cause and effect; you must carefully describe the desired outcome. Cause-and-effect situations must appear exactly this way in the training data.

    • Fix: Describe the object placement and interaction. But it can fail and may need many generations to get one right, depending on the situation.

Technical

  • Forgotten Downloads: Images don’t stay on the server long and are easily lost if not downloaded immediately. Plus users lack an automatic download option.
    • Fix: Tampermonkey can help auto-download images. (Though for security reasons, the scripts cannot be shared here.)

ChatGPT Issues

  • Prompt Generation Issues: GPT and DALL-E do not work well together. DALL-E wants straight, precise, short descriptions; GPT likes to decorate texts with useless extras. And GPT doesn’t recognize certain DALL-E issues, like negations or conditionals, often leading to misinterpreted prompts. Manual corrections are often needed.

    • Fix: Tell GPT to “not change any prompts, use them as they are,” or create a “My GPT” and instruct it to never change prompts.
  • Memories: GPT’s memory feature doesn’t seem to be considered when generating prompts or images, and its purpose remains unclear. Telling GPT in the memories not to change prompts has no effect.

    • Fix: Nothing for now. Use “My GPTs” instead.
  • False Visual Feedback: GPT cannot see the images, so negations like “no text” may still result in text. GPT may falsely claim images meet specifications, leading to frustration.

    • Fix: Nothing. It is system and model based.
  • Perceived Dishonesty: GPT can “hallucinate” responses, fabricating information without verifying accuracy. Always verify facts.

    • Fix: Nothing. It is system and model based. Take it easy when GPT “lies”; it is a pattern transformer, not really intelligent.
    • AI has no true intelligence: AI systems are advanced pattern recognition tools, not truly intelligent or conscious, and they make mistakes. We are far from true AI like in Star Trek.

Tips:

  • Clear language: DALL-E works best with clear, precise, short, and graphic-oriented language. GPT tends to embellish prompts, so instruct it not to alter your prompts. Don’t write poems or long, flowery, embellished texts; these must just be extracted again later. The magic happens not with stylish text masterpieces, but with well-trained weights used in the generator.

  • Avoid Possibility Forms: Avoid forms like “should” or “could”. Directly describe what you want in the image.

  • Literal or Mis-Understanding: DALL-E takes everything literally. Always check the actually used prompt for conflicts, especially if it’s translated or expanded. Check the prompts sent to DALL-E, or instruct GPT not to change the text.

  • Prompt Structure: Start with the most important thing, then details, moods, and finally technical instructions. Order attributes in a sentence so that the first is the most important. Example: in “red, orange, and yellow flowers,” red will be a bit more dominant.

  • Prompt Expansion: GPT expands short prompts, and this is not very efficient for DALL-E. To prevent this, use “use the prompt unchanged as entered.”

  • Photo-Technical Descriptions: Specific photography terms make a difference when they trigger specific training data. Even “Add a little depth of field” works. If you mention an 18mm lens in a landscape scene, the picture will be a bit wider.

  • Creativity: To let DALL-E be creative, use minimal instructions on environment or mood, without much detail; provide just a few guidelines with constraints for specific styles. (This does not work well with very simple objects; the system must have a large spectrum of different training data to be creative.)

  • Photorealistic: If you want real-life pictures, avoid keywords like “realistic” or “photorealistic,” as they may trigger painting styles. Instead, use “photo style” for photorealistic images. (In paintings, the information “photorealistic” is attached as an attribute. Real photos don’t need this explained, so the info may be absent in training, and “photorealistic” triggers the wrong data. It depends on how pictures are attribute-mapped…)

  • Content Complexity: DALL-E processes about 256 tokens, with 30-40 graphical tokens accurately rendered. Simple, precise instructions work better than poetic language. Detailed descriptions don’t always improve results.

  • MidJourney Options: Some options like --Chaos or --style raw are interpreted differently in DALL-E and may not work as intended. The seed option doesn’t work at all. But DALL-E can make some sense of MidJourney options, even if they are not technically supported.

Strengths of DALL-E:

  • Landscapes: DALL-E has strong training data for landscapes and can generate stunning results, even for non-existent ones.

API:

