Dall-E is sooo bad at recognizing letters and numbers - any advice?

lrowlandson · November 16, 2023, 5:27pm

Hi there. I’m trying to create infographics with DALL-E and it’s just not there yet.
This involves giving it a statistic, measurement, or other data and having it translate that into a visual that includes the reference data. The statistics and text come out garbled and incorrect, with lots of omissions and spelling mistakes. The images are for the most part really good, but the text and numbers are hilarious hot garbage. It’s a lot of work to edit the image, blank out the garble, and recreate the info overlay. Any advice on writing better prompts, or is the only hope a future version? Thanks!

Foxalabs · November 16, 2023, 5:33pm

Hi and welcome to the Developer Forum!

DALL-E 3 is really only good for a word or two with any accuracy, maybe a well-known phrase. You might get better-quality but less visually impressive results by asking ChatGPT to use Code Interpreter to make visuals of your data. That will use matplotlib and other Python visualisation methods, which gives a more formal display, but the text and numbers are drawn directly from your data rather than generated, so they come out accurate.
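For anyone who wants to try the matplotlib route outside ChatGPT, here is a minimal sketch of the idea. The categories, percentages, title, axis label, and output filename are made-up placeholders, and `render_stat_chart` is a hypothetical helper name, not something Code Interpreter is known to produce. The point is only that every label is written from the data by code, so it cannot be misspelled by an image model.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt


def render_stat_chart(labels, values, title, out_path):
    """Draw a labelled bar chart and return the value labels that were drawn."""
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(labels, values, color="#4C72B0")

    drawn = []
    for bar, value in zip(bars, values):
        text = f"{value}%"
        # Place each number just above its bar, centred horizontally.
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
                text, ha="center", va="bottom")
        drawn.append(text)

    ax.set_title(title)
    ax.set_ylabel("Share of respondents (%)")  # placeholder axis label
    ax.set_ylim(0, max(values) * 1.15)  # headroom so labels are not clipped
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return drawn


if __name__ == "__main__":
    # Placeholder data, purely for illustration.
    render_stat_chart(["Email", "Social", "Search"], [42, 35, 23],
                      "Placeholder survey results", "infographic_chart.png")
```

The result is plainer than a DALL-E image, but it can be used as is or layered over generated artwork.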

lrowlandson · November 16, 2023, 6:04pm

Ah thanks! I’ll give it a try. It will be nice when all that is built into DALL-E for us. Praying the DALL-E Developer Gods are listening. Thanks so much!

biancaS · February 9, 2024, 11:03am

Me toooooo! I just got an infographic that I was wildly excited about, and then I looked closer and it is pure garbage lol.

PaulBellow · February 9, 2024, 11:24am

Hrm. Sounds like a good pain-point to solve for people…

Not sure if the tech is there yet, even if you pieced it together with actual data…

vb · February 9, 2024, 11:43am

When using DALL-E, there is currently no way around learning basic Photoshop skills to add the text oneself.
But at least the heavy lifting of creating the design is done by the model.
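If Photoshop is not an option, the same "add the text yourself" step can be scripted. This is a rough sketch using the Pillow library, which is a swapped-in alternative and not what the post describes. The flat-colour background stands in for a DALL-E image you would normally load with `Image.open`, and `overlay_text`, the sample lines, and the filename are all hypothetical.

```python
from PIL import Image, ImageDraw, ImageFont


def overlay_text(background, lines, origin=(20, 20), line_height=16):
    """Return a copy of `background` with `lines` drawn on a dark panel."""
    image = background.convert("RGB")  # convert() returns a new image
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()  # small built-in font; swap in a TTF for real work

    x, y = origin
    panel_bottom = y + line_height * len(lines) + 10
    # Dark panel behind the text so it stays readable on busy artwork.
    draw.rectangle([x - 10, y - 10, image.width - x + 10, panel_bottom],
                   fill=(0, 0, 0))

    for i, line in enumerate(lines):
        draw.text((x, y + i * line_height), line,
                  fill=(255, 255, 255), font=font)
    return image


if __name__ == "__main__":
    # Stand-in for a generated image; placeholder figures for illustration.
    background = Image.new("RGB", (512, 512), (30, 60, 90))
    result = overlay_text(background, ["Email: 42%", "Social: 35%", "Search: 23%"])
    result.save("infographic_with_text.png")
```

Prompting DALL-E for a design with empty space and no text makes this kind of overlay easier to place.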

trenton.dambrowitz · February 9, 2024, 11:43am

It’s quite difficult to explain why this isn’t how diffusion models work without giving people (and myself!) a headache, but I think understanding how these tools work is extremely important.

Here’s my mildly-informed attempt at explaining it:
DALL·E is not a multi-modal large language model. It does not reason or plan.

Its sole job is to try to denoise a bunch of pixels to create an image based on the text prompt it was given.

It is not a graphic designer, or a skilled artist drawing something from scratch.

Adding to what @Foxalabs said, an infographic (or any other form of media with lots of facts and figures) is not something you can usually get a quality version of by just “denoising”. As GPT-4 and other SOTA multi-modal models get better at providing multi-modal outputs, this should start to fill in that gap.

Hopefully that explanation wasn’t too far from the truth!

PaulBellow · February 9, 2024, 11:46am

Yeah, it’s whichever AI figures this bit out! Hah… Not an easy problem…

Oh, I know! I’ve been playing since early GAN (and a lot of Disco Diffusion, but I digress…) I meant more the next-gen tools that combine what we have now with extra stuff that is a “digital graphic designer” that can take your natural language and perform a series of steps to do what you want…

So, no, not GAN alone, and probably not for a while, but can you imagine the flood of infographics the world would see! Infographics about infographics creating infographics automatically… wheeeee!

trenton.dambrowitz · February 9, 2024, 11:53am

I know you know, I was just putting it out there for anyone else who might stumble across this post!

I completely agree, it will require an extremely powerful model that can make use of various tools (DALL-E for the background, Photoshop or something similar for the facts/text, etc.).

A lot of people come to this forum upset that a tool isn’t working for what they need and blame themselves or the tool, and that’s why I think it’s so important to understand how they work!

PaulBellow · February 9, 2024, 11:55am

I wish I could double-like your post, then.

Seriously, though, it’s appreciated. The more people like you that we can attract here to our community garden for devs, the more valuable and beautiful this place becomes. Thanks for doing your part!

iris3dgames · February 10, 2024, 1:03am

dimitripletschette · May 17, 2024, 12:19pm

One workaround in those situations is to include the instruction in the prompt. For example: “… and please add the following text contained in the brackets [Sample TEXT] …”. Feel free to connect on LinkedIn or another social network if needed to continue the discussion and exchange ideas.
