VSL (Visual Scene Language): a language for describing visual scenes in a structured JSON format

Colleagues,

An idea came to me a while ago, and I’d really appreciate your feedback, critique, or maybe even collaboration if it proves viable.

The core concept is to develop a language/adapter (encoder-decoder) that allows LLMs to work with architectural and engineering drawings — and, more broadly, with any structured visual data. Below is a more detailed overview:


:brain: Core Idea

VSL (Visual Scene Language) is a language for describing visual scenes in a structured JSON format that large language models (LLMs) can interpret.
It lets AI read, edit, and generate scenes not as pixel images, but as meaningful objects with properties, coordinates, and relationships.


:puzzle_piece: Technical Concept

  • Scene → represented as JSON: objects, dimensions, relations, styles, context.

  • LLM → works with this structure instead of raw pixels.

  • Renderer → visualizes the result (2D / 3D / AR).

  • The process is cyclic: text → structure → image → updated structure.
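To make the "text → structure → updated structure" step concrete, here is a minimal sketch in Python. The schema below (field names like `objects`, `relations`, `hosted_by`) is purely illustrative, not the actual VSL schema from the repo; it just shows the idea of an LLM-driven edit acting on structure rather than pixels:

```python
import json

# A hypothetical VSL-style scene: objects with ids, properties, and relations.
# (Field names here are illustrative; the real schema lives in the repository.)
scene = {
    "scene": "office_room",
    "units": "mm",
    "objects": [
        {"id": "wall_1", "type": "wall", "length": 4000, "height": 2700},
        {"id": "door_1", "type": "door", "width": 900, "height": 2100},
    ],
    "relations": [
        {"type": "hosted_by", "child": "door_1", "parent": "wall_1"},
    ],
}

def apply_edit(scene, object_id, **changes):
    """Simulate the LLM step: patch an object's properties by id."""
    for obj in scene["objects"]:
        if obj["id"] == object_id:
            obj.update(changes)
    return scene

# A natural-language request like "make the door wider" becomes
# a structural edit, which the renderer then re-visualizes.
scene = apply_edit(scene, "door_1", width=1000)
print(json.dumps(scene["objects"][1], indent=2))
```

The key point of the cycle is that the LLM never touches an image: it reads and rewrites this JSON, and the renderer is responsible for turning each revision back into 2D/3D/AR output.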


:building_construction: Possible Applications

  • BIM and AEC — describing and editing architectural and engineering models.

  • AR / VR — interacting with spatial data through natural language.

  • UI / UX — creating and modifying interfaces without visual editors.

  • Robotics — interpreting environments as structured scenes with objects and relations.


:counterclockwise_arrows_button: In Connection with BIM and AR

  • BIM provides structured data,

  • VSL makes it understandable to AI,

  • AR displays and enables direct interaction with the model.

Together they form a cycle of living design:
human ↔ AI ↔ model ↔ real space.


You can find more details and a simple working example (which you can test right away) on GitHub:
:backhand_index_pointing_right: https://github.com/MaxZhadobin/visual-scene-language