Imageception: DALL-E generates an image inside an image and perceives only the inner one as 'the image'

How to reproduce:

  1. Generate any image using DALL-E 3.

  2. Ask GPT to ‘modify colors to fit well with a dark brown sofa’.

  3. The next image will contain the initial image inside it, along with the brown sofa.

The resulting prompt is:
“Modify the serene landscape featuring a lake and a mountain in the early morning, enveloped in gentle fog, to complement a dark brown sofa. Adjust the color palette to include warmer, earthier tones like rich browns, deep greens, and subtle gold or amber highlights, enhancing the mystical morning atmosphere. The mountain in the background and the foggy lake in the foreground should blend seamlessly with these new colors, creating a harmonious and tranquil scene. The painting should evoke a feeling of warmth and elegance, suitable for a sophisticated interior with a dark brown sofa.”

When asked to correct this, GPT seems to perceive only the image within the image, and treats it as rendered correctly because that inner image does not contain the sofa (I hope this makes sense :smile:).

This also persists when using a custom GPT whose instructions ask it to focus on the image and exclude walls and furniture [e.g. “When generating images, Wall Art’e focuses exclusively on the poster, filling the entire frame. This approach ensures that all attention is directed to the artwork itself, without any distractions from background or surrounding elements like walls or frames.”]

Asking it to focus on the image does not help either; only asking directly to remove references to the sofa seems to work.

In addition, below is a bug report my GPT generated when I asked it about the issue:

Sure, I’ll create a detailed incident report suitable for an experienced developer, outlining the issue step by step.

Incident Report: Image Generation Issue in DALL-E

  1. Summary of the Issue:
  • Description: When generating images using DALL-E, the inclusion of specific descriptive phrases intended to define color schemes results in images with unintended, literal interpretations of those phrases.
  • Impact: The issue limits the ability to generate images that match user intent, particularly in scenarios involving abstract or conceptual descriptions.
  2. Technical Details:
  • Platform: [Platform Name / Access Point of DALL-E]
  • Tool Used: DALL-E Image Generation
  • Date of Occurrence: [Date when the issue was first noticed]
  3. Steps to Reproduce:
     1. Access the DALL-E image generation feature.
     2. Provide a prompt that includes descriptive phrases intended to convey color schemes or thematic elements, without intending these phrases to be represented literally. For example, “Create a standalone 1024x1024 poster that exclusively features artistic elements, blending the color palette of a lavender field at sunset with hues suitable for a white room with a blue carpet.”
     3. Observe the generated image.
  4. Expected Behavior:
  • DALL-E should generate an image focusing solely on the artistic elements described, interpreting the phrase “white room with a blue carpet” as an abstract description of a color scheme rather than a directive to include a room in the image.
  5. Actual Behavior:
  • DALL-E generates an image that includes a literal representation of a room, despite the prompt’s focus on creating a standalone piece of art.
  6. Logs/Screenshots:
  • Include any relevant logs, screenshots of the prompts given, and the resulting images.
  7. Potential Causes:
  • DALL-E might be interpreting descriptive phrases too literally, especially when these phrases are typically associated with physical objects or settings.
  • A lack of contextual understanding when distinguishing between literal and abstract/conceptual elements within a prompt.
  8. Suggested Next Steps for Investigation:
  • Analyze how DALL-E processes phrases that commonly describe physical things but are intended in an abstract manner.
  • Consider algorithmic adjustments to improve contextual understanding and the differentiation between literal and abstract descriptions in prompts.
  9. Additional Notes:
  • This issue highlights a broader challenge in AI image generation related to context interpretation and the balance between literal and conceptual prompt processing.

This incident report provides a comprehensive overview of the issue, including the steps to reproduce it, the expected versus actual behavior, and suggestions for investigation. It can be used to communicate the problem effectively to a development or support team.

[Note: the report above comes from a different image prompt, but the behaviour is exactly the same.]