DALL-E is sooo bad at recognizing letters and numbers - any advice?

Hi there. I’m trying to create infographics with DALL-E and it’s just not there yet.
This involves giving it some statistic, measurement, or other data and having it translate that into a visual that includes the reference data. The statistics and text come out garbled and incorrect, with lots of omissions and spelling mistakes. The images are, for the most part, really good, but the text and numbers are hilarious hot garbage. It takes a lot of work to edit the image, blank out the garble, and recreate the info overlay. Any advice on writing better prompts, or is there hope for a future version? Thanks!


DALL-E 3 is really only accurate for a word or two, or maybe a well-known phrase. You might get more accurate but less visually impressive results by asking ChatGPT to use Code Interpreter to make visuals of your data. That will use matplotlib and other Python visualisation methods: a more formal data display, but one that plots your actual numbers instead of approximating them.
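For anyone wondering what that looks like in practice, here is a rough sketch of the kind of matplotlib script Code Interpreter might produce. The data, the `make_chart` name, and the output filename are all made up for illustration; this is one possible approach, not what ChatGPT will necessarily generate.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt

# Hypothetical example data; replace with your own statistics.
stats = {"Solar": 42.5, "Wind": 31.0, "Hydro": 18.2, "Other": 8.3}


def make_chart(data, title, path):
    """Draw a bar chart with a value label on each bar and save it as a PNG.

    Returns the list of value labels that were drawn.
    """
    labels = list(data.keys())
    values = list(data.values())
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(labels, values, color="#4c72b0")
    value_labels = []
    for bar, value in zip(bars, values):
        label = f"{value:.1f}%"
        # Place each label centred just above its bar.
        ax.text(bar.get_x() + bar.get_width() / 2, value, label,
                ha="center", va="bottom")
        value_labels.append(label)
    ax.set_title(title)
    ax.set_ylabel("Share (%)")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return value_labels


make_chart(stats, "Energy mix (made-up numbers)", "chart.png")
```

Because the labels are formatted straight from the data, the numbers on the chart match whatever you passed in, which is the part DALL-E struggles with.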


Ah, thanks! I’ll give it a try. It will be nice when all that is built into DALL-E for us. Praying the DALL-E Developer Gods are listening. Thanks so much.

Me toooooo! I just got an infographic that I was wildly excited about, and then I looked closer and it is pure garbage lol.


Hrm. Sounds like a good pain-point to solve for people…

Not sure if the tech is there yet, even if you pieced it together with actual data…


Currently, when using DALL-E, there is no way around learning basic Photoshop skills to add the text yourself.
But at least the heavy lifting of creating the design is done by the model.
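If Photoshop is not an option, the same "blank out the garble and redraw the text" step can be scripted. Below is a minimal sketch using the Pillow library, assuming you already know the pixel box that contains the garbled text. The `replace_text_region` helper, the box coordinates, and the placeholder background are hypothetical stand-ins for your own image and layout.

```python
from PIL import Image, ImageDraw, ImageFont


def replace_text_region(image, box, text, fill="white",
                        text_color="black", padding=8):
    """Return a copy of `image` with `box` painted over and `text` drawn in it.

    `box` is (left, top, right, bottom) in pixels.
    """
    result = image.convert("RGB")  # convert() returns a copy
    draw = ImageDraw.Draw(result)
    # Cover the garbled region with a solid colour.
    draw.rectangle(box, fill=fill)
    # Draw the correct text near the top-left corner of the box.
    draw.text((box[0] + padding, box[1] + padding), text,
              fill=text_color, font=ImageFont.load_default())
    return result


# Placeholder for a generated image; in practice use Image.open(...) on your file.
background = Image.new("RGB", (512, 512), "steelblue")
final = replace_text_region(background, (40, 40, 400, 100), "42.5% solar")
final.save("infographic.png")
```

The built-in default font is small and plain; for real use you would probably load a TrueType font of your choice with `ImageFont.truetype`.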


It’s quite difficult to explain why this isn’t how diffusion models work without giving people (and myself!) a headache, but I think understanding how these tools work is extremely important.

Here’s my mildly-informed attempt at explaining it:
DALL·E is not a multi-modal large language model. It does not reason or plan.

Its sole job is to try to denoise a bunch of pixels to create an image based on the text prompt it was given.

It is not a graphic designer, or a skilled artist drawing something from scratch.

Adding to what @Foxalabs said, an infographic (or any other form of media with lots of facts and figures) is not something you can usually get a quality version of by just “denoising”. As GPT-4 and other SOTA multi-modal models get better at providing multi-modal outputs, this should start to fill in that gap.

Hopefully that explanation wasn’t too far from the truth!


Yeah, it’s whichever AI figures this bit out! Hah… Not an easy problem…

Oh, I know! I’ve been playing since early GAN (and a lot of Disco Diffusion, but I digress…) I meant more the next-gen tools that combine what we have now with extra stuff that is a “digital graphic designer” that can take your natural language and perform a series of steps to do what you want…

So, no, not GAN alone, and probably not for a while, but can you imagine the flood of infographics the world would see! Infographics about infographics creating infographics automatically… wheeeee! :wink:

I know you know, I was just putting it out there for anyone else who might stumble across this post!

I completely agree, it will require an extremely powerful model that can make use of various tools (DALL-E for the background, Photoshop or something similar for the facts/text, etc.).

A lot of people come to this forum upset that a tool isn’t working for what they need and blame themselves or the tool, and that’s why I think it’s so important to understand how they work!


I wish I could double-like your post, then. :wink:

Seriously, though, it’s appreciated. The more people like you that we can attract here to our community garden for devs, the more valuable and beautiful this place becomes. Thanks for doing your part!


One workaround in those situations is to state the exact text in the prompt. For example: "… and please add the following text contained in the brackets [Sample TEXT] …".
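If you build prompts in a script, that pattern is easy to template. Here is a small sketch in plain Python; the `build_prompt` and `is_short_enough` names are made up, and the two-word limit is only a rough heuristic taken from the "a word or two" comment earlier in this thread, not a documented limit.

```python
def is_short_enough(text, max_words=2):
    """Rough heuristic: very short text has the best chance of rendering correctly."""
    return len(text.split()) <= max_words


def build_prompt(scene, text):
    """Append the bracketed-text instruction to a scene description."""
    return (f"{scene}, and please add the following text "
            f"contained in the brackets [{text}]")


prompt = build_prompt("A poster of a lighthouse at dusk", "OPEN")
print(prompt)
print(is_short_enough("OPEN"))
print(is_short_enough("42.5% of energy from solar"))
```

This only builds the prompt string. Anything longer than a couple of words is probably still better added afterwards with an image editor or a script.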