Why is it that GPT-4 can draw beautiful pictures with an incredible sense of context, and even include signs and words in its drawings, and can also write so nicely when answering prompts, but can’t, for the life of it, spell the words correctly within its drawings?
Read the research paper; it explains this.
As @EricGT said, the research paper explains it best, but I’ve also tried to explain it in a different topic.
But if you place the text in quotes or something, it shouldn’t have to “think” per se. Just put those elements, as is, into the art. Consider the elements in quotes or brackets an art element, a collection of pixels.
Once again, that’s not really how diffusion models work. It’s a great idea though!
That might be possible as some form of post-processing in the future, but at the moment it’s not feasible with the current models.
To date, the model has not changed. From a usability perspective, you could try adding the text after the original image is created. Would that work? Alternatively, exclude any text from the image so that the base image can be edited in image-editing software. The difficulty right now is that it creates something unusable, with garbled letters.
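For anyone who wants to try the "add the text afterwards" route in code rather than in an image editor, here is a rough sketch. It assumes Python with the Pillow library installed. The `add_caption` helper is a name I made up for illustration, and the solid-colour image is only a stand-in for a generated picture that was prompted to contain no text.

```python
from PIL import Image, ImageDraw, ImageFont


def add_caption(image, text, position=(20, 20), fill=(255, 255, 255)):
    """Hypothetical helper: return a copy of `image` with `text` drawn on it."""
    result = image.convert("RGB")  # convert() returns a new image, so the input is untouched
    draw = ImageDraw.Draw(result)
    font = ImageFont.load_default()  # Pillow's built-in bitmap font
    draw.text(position, text, fill=fill, font=font)
    return result


# Stand-in for a generated image that contains no text (assumption for this sketch).
base = Image.new("RGB", (320, 120), (30, 60, 120))
captioned = add_caption(base, "OPEN 24 HOURS")
```

With a real picture you would load it with `Image.open(...)` instead of creating a blank one, and call `captioned.save(...)` to write the result. The default font is small and plain, so for a sign that looks like part of the artwork you would probably want `ImageFont.truetype(...)` with a font file of your own, plus some manual positioning.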