Text to 3D model = billions / year

I’ve seen DALL-E’s paper and it seems that this could be possible.

One 3D model costs $30 to $3,000 to make. The gaming and animation industries spend billions a year on 3D models, mainly characters. Imagine an AI that converts text → 3D model, for example “futuristic alien robot with high tech armor” (describing as many details as necessary), or that converts DALL-E’s concept art image into a 3D model.


Yes, I can’t wait! I didn’t think much of VR until I tried the Oculus Quest 2; it’s amazing.

How to make 3D models from a single image
A novel method for learning neural 3D object representations in which each item in the training set is seen from only one viewpoint. The technique learns a standardized, object-centric representation that is pose-invariant and factorizes into a geometry component and an appearance component, using machine-generated labels such as 3D object recognition and panoptic segmentation.
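
To make "object-centric, pose-invariant, factorized into geometry and appearance" a bit more concrete, here is a toy sketch in Python. It is not the method described above: the real approach learns neural networks, while here two hand-written functions (a sphere distance and a made-up color rule) stand in for the geometry and appearance components. All function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def geometry(points_obj, radius=1.0):
    """Toy geometry component: signed distance to a sphere,
    evaluated in the object's own (canonical) frame."""
    return np.linalg.norm(points_obj, axis=-1) - radius

def appearance(points_obj):
    """Toy appearance component: an RGB value in [0, 1] that depends
    only on the object-frame position (a stand-in for a learned network)."""
    return 0.5 + 0.5 * np.tanh(points_obj)

def object_to_world(points_obj, rotation, translation):
    """Place object-frame points in the world using the object's pose."""
    return points_obj @ rotation.T + translation

def world_to_object(points_world, rotation, translation):
    """Undo the pose: map world-space points back into the object frame."""
    return (points_world - translation) @ rotation

def query(points_world, rotation, translation):
    """Query geometry and appearance separately, after moving the
    points into the object-centric frame."""
    p = world_to_object(points_world, rotation, translation)
    return geometry(p), appearance(p)

# Same object, two different poses: the answers should match.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([3.0, -1.0, 2.0])

sdf_a, rgb_a = query(pts, np.eye(3), np.zeros(3))
sdf_b, rgb_b = query(object_to_world(pts, R, t), R, t)
print(np.allclose(sdf_a, sdf_b), np.allclose(rgb_a, rgb_b))
```

Because every query is first mapped into the object's own frame, moving or rotating the object in the world does not change what the geometry and appearance functions return, which is the sense of "pose-invariant" in this sketch.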


First of all, I wonder how effective DALL-E will be at generating production-quality concept art that a skilled 3D modeler could then take and build in 3D. That alone would be huge. Concept design is a whole profession in its own right and is also expensive to commission.

Perhaps DALL-E would just make current concept artists more productive rather than obsolete. I imagine the same would be true for 3D modelers. I’m sure that in the near term, trained human artists will still be able to bring something to the table that AIs can’t (for example, choosing among the many AI-generated variations).


I see the power here being more for an individual hobbyist who can’t pay corporate prices for design, etc.

But it would be amazing if I could simply write out a description of what I want and have it 3D printed, rather than painstakingly building it out of clay, drawing it, or using a 3D computer program (which I hate, because I don’t like spending tons of time on a computer in the first place).

I’m wondering if you could do this with animation as well: give a prompt for a short 30-second animation, maybe with a sample image, and bam, it poops out the animation for you?

I predicted here that deep learning and similar technologies will eventually be able to generate books, TV shows, movies, and video games: GitHub - daveshap/NaturalLanguageCognitiveArchitecture: Open source copy of my book Natural Language Cognitive Architecture


It will for sure be possible to generate both 3D models and animations from text; the question is just when. I would guess in 5 to 15 years.