Colleagues,
An idea came to me a while ago, and I’d really appreciate your feedback, critique, or maybe even collaboration if it proves viable.
The core concept is to develop a language/adapter (encoder-decoder) that allows LLMs to work with architectural and engineering drawings — and, more broadly, with any structured visual data. Below is a more detailed overview:
Core Idea
VSL (Visual Scene Language) is a language for describing visual scenes in a structured JSON format that can be interpreted by large language models (LLMs).
It enables AI to read, edit, and generate scenes — not as pixel images, but as meaningful objects with properties, coordinates, and relationships.
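To make this concrete, here is a minimal sketch of what such a scene description could look like. The schema (field names like `objects`, `relations`, `position`) is my own illustration of the idea, not the actual VSL format:

```python
import json

# Hypothetical VSL-style scene; field names are illustrative, not the real schema.
scene = {
    "scene": "office_floor_plan",
    "units": "m",
    "objects": [
        {"id": "wall_1", "type": "wall", "start": [0, 0], "end": [8, 0], "height": 3.0},
        {"id": "door_1", "type": "door", "width": 0.9, "position": [3.5, 0]},
    ],
    "relations": [
        {"subject": "door_1", "predicate": "embedded_in", "object": "wall_1"},
    ],
}

# An LLM would receive and emit this as plain JSON text:
vsl_text = json.dumps(scene, indent=2)
parsed = json.loads(vsl_text)
print(parsed["objects"][1]["type"])  # door
```

Because the scene is plain JSON, it round-trips losslessly through text, which is exactly the channel an LLM operates in.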
Technical Concept
- Scene → represented as JSON: objects, dimensions, relations, styles, context.
- LLM → works with this structure instead of raw pixels.
- Renderer → visualizes the result (2D / 3D / AR).
- The process is cyclic: text → structure → image → updated structure.
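The cycle above can be sketched end to end. The `llm_edit` and `render` functions below are stand-ins (a real system would call an LLM API and a 2D/3D renderer); the point is that every step passes the same JSON structure:

```python
import copy

def llm_edit(scene: dict, instruction: str) -> dict:
    """Stand-in for an LLM call: applies a hard-coded edit for demonstration."""
    updated = copy.deepcopy(scene)
    if instruction == "move door_1 to x=4.0":
        for obj in updated["objects"]:
            if obj["id"] == "door_1":
                obj["position"][0] = 4.0
    return updated

def render(scene: dict) -> str:
    """Stand-in for a renderer: returns a textual summary instead of pixels."""
    return ", ".join(
        f'{o["type"]}@{o.get("position", o.get("start"))}' for o in scene["objects"]
    )

scene = {
    "objects": [
        {"id": "wall_1", "type": "wall", "start": [0, 0], "end": [8, 0]},
        {"id": "door_1", "type": "door", "position": [3.5, 0]},
    ]
}

# text -> structure -> image -> updated structure
updated = llm_edit(scene, "move door_1 to x=4.0")
print(render(updated))  # wall@[0, 0], door@[4.0, 0]
```

Since the edit produces a new structure rather than mutating pixels, the output can be fed straight back into the next iteration of the loop.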
Possible Applications
- BIM and AEC — describing and editing architectural and engineering models.
- AR / VR — interacting with spatial data through natural language.
- UI / UX — creating and modifying interfaces without visual editors.
- Robotics — interpreting environments as structured scenes with objects and relations.
In Connection with BIM and AR
- BIM provides structured data,
- VSL makes it understandable for AI,
- AR displays and enables direct interaction with the model.
Together they form a cycle of living design:
human ↔ AI ↔ model ↔ real space.
You can find more details and a simple working example (which you can try right away) on GitHub:
https://github.com/MaxZhadobin/visual-scene-language