Dall-E is sooo bad at recognizing letters and numbers - any advice?

It’s quite difficult to explain why diffusion models don’t work that way without giving people (and myself!) a headache, but I think understanding how these tools work is extremely important.

Here’s my mildly-informed attempt at explaining it:
DALL·E is not a multi-modal large language model. It does not reason or plan.

Its sole job is to try to denoise a bunch of pixels to create an image based on the text prompt it was given.

It is not a graphic designer, or a skilled artist drawing something from scratch.
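
To make that concrete, here’s a toy sketch of what a text-conditioned diffusion sampling loop roughly looks like. This is *not* DALL·E’s actual code; the function names and numbers are made up purely to illustrate the idea that the model only nudges pixels toward “less noisy” over many steps.

```python
import numpy as np

# Hypothetical stand-ins for the real networks. DALL·E's actual components
# are not public, so these are illustrative placeholders only.
def text_encoder(prompt: str) -> np.ndarray:
    """Turn the prompt into a fixed-size conditioning vector (toy version)."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(16)

def predict_noise(image: np.ndarray, step: int, cond: np.ndarray) -> np.ndarray:
    """A real denoiser is a huge neural net; here we just return zeros."""
    return np.zeros_like(image)

# Start from pure noise and repeatedly "denoise" toward an image.
image = np.random.standard_normal((64, 64, 3))
cond = text_encoder("an infographic with readable labels and numbers")

for step in reversed(range(50)):
    noise_estimate = predict_noise(image, step, cond)
    # Nudge the pixels a little closer to something image-like.
    image = image - 0.1 * noise_estimate

# Nothing in this loop "plans" letters, words, or layout. Legible text only
# appears if the denoiser happens to reproduce letter-shaped pixel patterns
# it saw during training.
```

The point of the sketch: there’s no step where the model spells anything out or checks that a number is correct, which is why text and figures in the output so often come out mangled.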

Adding to what @Foxalabs said, an infographic (or any other form of media with lots of facts and figures) is not something you can usually get a quality version of by just “denoising”. As GPT-4 and other SOTA multi-modal models get better at producing multi-modal outputs, they should start to fill that gap.

Hopefully that explanation wasn’t too far from the truth!
