Natural language to image

Hi, I am looking to convert natural language into images. Could you please share some references for achieving this?


Hello @divya.kotapati,

It looks like OpenAI has a model called DALL-E that does text-to-image synthesis. Even though the full model is not publicly available, there is a smaller open-source model inspired by it, called DALL-E mini, that is available to the public.

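Also, while reading about DALL-E, I came across another model called VQ-VAE-2. As far as I can tell, it is an image generation model rather than a text-to-image model on its own, but DALL-E builds on a similar discrete autoencoder idea, so it may still be useful background!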

I haven’t personally used these models yet, but the repositories linked here appear to include instructions for setting them up yourself! Let me know if you have any questions!

DALL-E mini: GitHub Repo

VQ-VAE-2: GitHub Repo


Hi, maybe you could try VQGAN and CLIP using this Google Colab.

The Colab is in Spanish because I’m using a video from DotCSV for reference. I really recommend watching it.
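In case it helps to see the idea before opening the notebook: as I understand it, VQGAN+CLIP starts from a random latent, decodes it to an image, scores that image against the text prompt with CLIP, and nudges the latent to raise the score. Below is a toy sketch of that loop only. It does not use the real VQGAN or CLIP; `toy_generator`, `toy_image_encoder`, `optimize_latent`, the matrix `W`, and the "text embedding" are all made-up stand-ins so the example runs with plain Python.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_generator(z, W):
    """Hypothetical stand-in for the VQGAN decoder: a linear map latent -> 'image'."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]


def toy_image_encoder(image):
    """Hypothetical stand-in for the CLIP image encoder: here just the identity."""
    return list(image)


def score(z, W, text_embedding):
    """How well the generated 'image' matches the prompt embedding."""
    return cosine(toy_image_encoder(toy_generator(z, W)), text_embedding)


def numerical_grad(f, z, eps=1e-5):
    """Central finite-difference gradient (real code would use autograd)."""
    grad = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        grad.append((f(z_plus) - f(z_minus)) / (2 * eps))
    return grad


def optimize_latent(W, text_embedding, steps=200, lr=0.2, seed=0):
    """Gradient ascent on the latent so the image embedding moves toward the text embedding."""
    rng = random.Random(seed)
    z = [rng.uniform(-1, 1) for _ in range(len(W[0]))]
    history = [score(z, W, text_embedding)]
    for _ in range(steps):
        grad = numerical_grad(lambda v: score(v, W, text_embedding), z)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
        history.append(score(z, W, text_embedding))
    return z, history


if __name__ == "__main__":
    W = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]  # made-up "decoder" weights
    prompt_embedding = [0.0, 0.0, 1.0]  # made-up "text embedding"
    z, history = optimize_latent(W, prompt_embedding)
    print("similarity before:", history[0])
    print("similarity after: ", history[-1])
```

In the real thing the latent is a grid of VQGAN codes, the score comes from CLIP's image and text encoders, and the gradient comes from backpropagation instead of finite differences, but the loop has the same shape.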

Oh that’s cool, I did not know about DALL-E mini. Here is “a darkly erotic scene in a turkish lounge”


Here is a super happy bright and shiny fish in a mountain stream


And finally a terrifying murder clown

Father… why did you create me… :rofl:



Is there any other, more accurate way to generate images from natural language besides DALL-E mini?

You can also start from scratch and learn how to generate images with your own model! Or you can check out more resources!

