Dear OpenAI Team,
I would like to submit feedback regarding spatial ambiguity in DALL·E image generation, particularly in architectural visualization, precision scene editing, and landscape design workflows. As a frequent user of the image generation tool, I have been repeatedly frustrated by an inability to achieve the results I intend.
When generating complex environments, users are often required to write extremely long prompts to describe spatial relationships (orientation, elevation, adjacency, direction of stairs, recess depth, edge articulation, etc.). Despite detailed wording, spatial intent frequently collapses because the model must infer arrangement purely from text. This leads to repeated iterations and structural drift.
I propose a new optional input mode for DALL·E called:
Top-Down Mapping
This mode would allow users to upload a top-down diagram representing the spatial layout, with height information attached to each element, before artistic rendering begins.
Proposed Workflow:
1. User selects “Top-Down Mapping Mode.”
2. User uploads a top-down drawing (simple plan view) with a compass (N/E/S/W) alignment marker in the top-left or top-right corner.
3. User tags up to 12 spatial elements within the diagram (e.g., stairs, stage, wall, fencing, terrain zone). Each tag has an associated entry field for height, so every item (a fence, a staircase, a wall) carries explicit height data, giving DALL·E maximum technical knowledge of the proposed layout before any prompt text has been entered.
4. For each tagged item, DALL·E presents a short clarification dialog:
• Orientation (e.g., east–west, north–south, 45°)
• Elevation relative to viewer
• Hard constraint vs. weighted interpretation
• Functional type (structural, decorative, terrain, etc.)
5. User selects constraint mode:
• Hard Constraint (absolute placement lock)
• Weighted Constraint (AI can adapt for realism)
6. Once confirmed, DALL·E locks the spatial layout derived from the diagram.
7. The final prompt then focuses only on artistic qualities:
• Material
• Age
• Mood
• Lighting
• Architectural style
• Texture condition
• Drawing style
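To make the proposal concrete, the workflow above could be captured in a small structured payload that accompanies the uploaded diagram. The sketch below is purely illustrative; every name in it (the classes, fields, and the 12-element limit check) is a hypothetical schema of my own, not anything that exists in DALL·E today:

```python
from dataclasses import dataclass, field
from enum import Enum

class ConstraintMode(Enum):
    HARD = "hard"          # absolute placement lock
    WEIGHTED = "weighted"  # model may adapt placement for realism

@dataclass
class TaggedElement:
    label: str             # e.g. "stairs", "wall", "fence"
    height_m: float        # height entered in the tag's entry field
    orientation: str       # e.g. "east-west", "north-south", "45deg"
    functional_type: str   # "structural", "decorative", "terrain", ...
    constraint: ConstraintMode = ConstraintMode.WEIGHTED

@dataclass
class TopDownMap:
    compass_corner: str                              # "top-left" or "top-right"
    elements: list = field(default_factory=list)

    MAX_ELEMENTS = 12  # proposed cap on tagged elements

    def tag(self, element: TaggedElement) -> None:
        # Enforce the proposed limit of 12 tagged spatial elements.
        if len(self.elements) >= self.MAX_ELEMENTS:
            raise ValueError("at most 12 tagged elements per diagram")
        self.elements.append(element)

# Example: a diagram with one hard-constrained fence.
layout = TopDownMap(compass_corner="top-left")
layout.tag(TaggedElement("fence", 1.2, "north-south", "structural",
                         ConstraintMode.HARD))
```

With spatial facts carried in a structure like this, the text prompt would only need to supply the artistic qualities listed in step 7.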
This system would separate spatial logic from aesthetic language, allowing:
• Reduced prompt length
• Greater structural precision
• Fewer generation iterations
• More reliable architectural outcomes
Primary Use Cases:
• Architectural visualization
• Landscape and terrain design
• Precision structural editing
Secondary potential (expandable across many domains):
• Game environment blocking
• Urban planning
• Interior layout
• Film previsualization
• Fantasy worldbuilding
The key benefit is giving DALL·E a spatial reasoning layer beyond textual description. Words are inherently ambiguous when describing complex geometry. A top-down diagrammatic input would allow users to communicate placement at a glance.
This would open new creative workflows and significantly reduce friction for users working with exact structural intent. As an artist, I have precise mental models of what I wish to achieve; working with this generative tool has taught me that they cannot be realized at this stage. A new mode that gives the model internal knowledge of the geometry and layout would let prompts shrink to the general information that defines the style of the image.
Please help me accomplish my goals: I have books to write, and I want to fill them with incredible 19th-century copperplate imagery straight out of my imagination.
Thank you for considering this proposal.
Sincerely,
A dedicated user.