Improving Text Generation in Images

alexbarge · March 8, 2025, 8:34pm

Greetings everyone, I would like to share a report on how I managed to get the exact text I wanted into an image using a step-by-step approach.

Improving Text Generation in Images: A Case Study on Prompt Engineering

Introduction

In recent years, AI-based image generation models such as DALL·E have made significant progress in creating realistic and artistic images. However, generating readable text within images remains one of the most common challenges. In this case study, I worked on generating a red flag bearing the word “PANICHI” and on overcoming the difficulties of rendering that text.

⸻

Project Objective

The main goal was to generate a realistic and dynamic image of a red flag with the word “PANICHI” clearly readable, without distortions or the typical errors AI models make when rendering text.

⸻

Strategy and Approach
1. Incremental Step-by-Step Approach
   • Instead of attempting to generate the full word “PANICHI” immediately, we started with a flag bearing just the first letter, “P”, and then progressively added the remaining letters.
   • This approach allowed us to analyze the challenges at each stage and correct them before proceeding with the entire word.
2. Prompt Optimization
   • The first attempts revealed issues with readability and incorrect spacing between letters.
   • We refined the prompt to include precise specifications, such as:
      • Clear and legible font
      • Uniform spacing between letters
      • Elimination of unwanted elements (such as extra lines between letters)
3. Feedback and Iteration
   • After each generation, we analyzed the results and identified anomalies, such as distorted or misaligned letters.
   • We then adjusted the prompt to correct these errors and minimize visual artifacts.
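The incremental procedure and the prompt specifications above can be sketched as a small prompt generator. This is a minimal illustration, not the exact prompts used in the experiment: the base wording and the helper names `build_prompt` and `incremental_prompts` are hypothetical, and only the three specification clauses come from the steps listed above.

```python
# A minimal sketch of the incremental, letter-by-letter prompting strategy.
# The prompt wording is a hypothetical reconstruction, not the original prompts.

# Specifications from the "Prompt Optimization" step, phrased as prompt clauses.
TEXT_CONSTRAINTS = (
    "a clear and legible font",
    "uniform spacing between letters",
    "no extra lines or other elements between letters",
)


def build_prompt(text: str, constraints: tuple[str, ...] = TEXT_CONSTRAINTS) -> str:
    """Return a prompt for a red flag showing exactly `text`."""
    spec = "; ".join(constraints)
    return (
        f'A realistic red flag waving in the wind, with the text "{text}" on it. '
        f"Text requirements: {spec}."
    )


def incremental_prompts(word: str, constraints: tuple[str, ...] = TEXT_CONSTRAINTS) -> list[str]:
    """Return one prompt per prefix of `word`: "P", "PA", "PAN", and so on."""
    return [build_prompt(word[:n], constraints) for n in range(1, len(word) + 1)]


if __name__ == "__main__":
    # Each prompt would be generated and inspected before moving on to the next one.
    for prompt in incremental_prompts("PANICHI"):
        print(prompt)
```

In practice, each prompt would be sent to the image model (for example through the OpenAI Python SDK's `client.images.generate(model="dall-e-3", prompt=...)`), and the next letter would only be added once the current result looked right.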

⸻

Results and Conclusions

After several iterations, we successfully generated a red flag with the word “PANICHI” clearly readable, with correct letter spacing and no significant distortions.

Key Takeaways:
• Generating text in images with AI is challenging, but using an incremental approach and a detailed prompt can significantly improve results.
• Human feedback is crucial in correcting errors and guiding the model towards greater precision.
• A step-by-step approach, rather than attempting to generate the full word immediately, is an effective strategy to improve readability.
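The human feedback described in step 3 and in the takeaways can be made a little more systematic by comparing the intended word with what a reviewer actually reads in each generated image. The sketch below assumes the reviewer types the text they see (an OCR tool could play the same role, but none is shown here); `spelling_report` is a hypothetical helper built on Python's standard `difflib` module.

```python
import difflib


def spelling_report(target: str, observed: str) -> list[str]:
    """List the differences between the intended text and the text read in the image."""
    issues = []
    matcher = difflib.SequenceMatcher(a=target, b=observed)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # these letters were rendered as intended
        if tag == "replace":
            issues.append(f"expected {target[i1:i2]!r}, saw {observed[j1:j2]!r} at position {i1}")
        elif tag == "delete":
            issues.append(f"missing {target[i1:i2]!r} at position {i1}")
        elif tag == "insert":
            issues.append(f"extra {observed[j1:j2]!r} before position {i1}")
    return issues


if __name__ == "__main__":
    # A reviewer reads "PANCHI" on the flag instead of "PANICHI".
    for issue in spelling_report("PANICHI", "PANCHI"):
        print(issue)
```

Each reported issue points at the letters to address in the next prompt revision, which mirrors the analyze-then-refine loop of step 3.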

⸻

Next Steps

• Test alternative prompts to see if we can further enhance text quality.
• Experiment with different fonts and writing styles to optimize legibility.
• Extend the experiment to other use cases, such as logos or road signs.
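For these next steps, a small prompt grid could make the font and layout experiments systematic and let the same template be reused for other subjects such as road signs. The style and layout values below are hypothetical examples that have not been tested, and `prompt_variants` is an illustrative helper, not part of any library.

```python
from itertools import product

# Hypothetical variation axes; these values are examples, not tested results.
FONT_STYLES = ("bold sans-serif", "serif", "stencil")
LAYOUTS = ("centered", "along the top edge")


def prompt_variants(text: str, subject: str = "a red flag") -> list[str]:
    """Return one prompt per (font style, layout) combination."""
    return [
        f'{subject} with the word "{text}" written in a {font} font, {layout}, '
        "with uniform spacing between letters"
        for font, layout in product(FONT_STYLES, LAYOUTS)
    ]


if __name__ == "__main__":
    for prompt in prompt_variants("PANICHI"):
        print(prompt)
    # The same grid applied to another use case mentioned above.
    for prompt in prompt_variants("STOP", subject="a road sign"):
        print(prompt)
```

Keeping every other clause fixed while varying one axis at a time makes it easier to attribute any change in legibility to the font or layout choice.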

⸻

Conclusion

This case study demonstrates that, despite the current limitations of text generation in images, it is possible to achieve significantly improved results through an iterative process and precise prompt engineering.

polepole · March 8, 2025, 9:56pm

Hi, welcome to the community!

polepole · March 8, 2025, 11:19pm

It is interesting, because we know that:

DALL·E is not currently designed to produce text, but to generate realistic and artistic images based on your keywords or phrases.

https://help.openai.com/en/articles/6781228-how-can-i-generate-text-in-my-image

But sometimes even long phrases come out correctly.
