The latest engine seems unable to create images without text. I have tried numerous times, for numerous types of images, to prompt the engine to omit text (which is always weird, nonsensical, and often inappropriate). The engine says it has omitted text, but the text is in every image. I have also had this problem with other elements, flags for example. I was trying to get an image without flags, and every single image included flags, even though the engine claimed there were no flags. There does not seem to be a way to prompt the engine to fix these problems.
What is your prompt, and can you provide an output of an image?
Yeah, I noticed that when I ask it to draw charts it always mislabels things: “shooting starter” instead of “shooting star,” “bill winkle” instead of “bullrun.”
Hi @myklbykl
First and second were created using Model GPT-4o
No commentary, just create image, and only it:
Art of bread-making process in unique and unexpected settings, in the style of culinary various stages looking life cycle photography, without humans, mixing ingredients, kneading dough, fermenting, shaping, and baking. Include realistic elements like bowls, flour, water, yeast, and a loaf of bread. Highlight the transformation from raw ingredients to finished bread, with visual details such as the texture of the dough, the bubbling of fermentation, and the golden crust of the baked loaf. The scene should be rich with texture and visual interest, emphasizing the tactile nature of bread-making. Gastronomic creativity, --up light, ar: widescreen.
GPT-4o:
DALL-E
Third and fourth were created using Model DALL-E
Art of bread-making process in unique and unexpected settings, in the style of culinary various stages looking life cycle photography, without humans, mixing ingredients, kneading dough, fermenting, shaping, and baking. Include realistic elements like bowls, flour, water, yeast, and a loaf of bread. Highlight the transformation from raw ingredients to finished bread, with visual details such as the texture of the dough, the bubbling of fermentation, and the golden crust of the baked loaf. The scene should be rich with texture and visual interest, emphasizing the tactile nature of bread-making. Gastronomic creativity --s 750 --up light, ar: widescreen.
Art of bread-making process in unique and unexpected settings, in the style of culinary various stages photography, without humans, mixing ingredients, kneading dough, fermenting, shaping, and baking. Include realistic elements like bowls, flour, water, yeast, and a loaf of bread. Highlight the transformation from raw ingredients to finished bread, with visual details such as the texture of the dough, the bubbling of fermentation, and the golden crust of the baked loaf. The scene should be rich with texture and visual interest, emphasizing the tactile nature of bread-making. Gastronomic creativity --s 750 --up light, ar: widescreen.
@polepole, thank you for these examples. Can you please provide some instruction on (or a link to instructions for) writing prompts that produce accurate results? Clearly its language processing abilities need work, and some obvious issues need fixing, but if there is a set of known prompts that work better than natural language, it would be nice if that information were easy to find for others who are having the same problems I am.
I use natural language.
I do not use instructions like Midjourney’s “--” parameter structure, or I add only 1 or 2 of them, as you can see in my examples above.
But I know that if you ask for a “photograph,” it does not include text, or only rarely.
But if you ask for a “drawing” or “cartoon style,” it mostly includes text.
For example, below are the same prompts with only 2 or 3 words changed; 2 images are OK, but the last image contains some blurry text:
Good work @polepole.
@myklbykl The best instruction is to tell a good story with your prompt and to expect the need for back-and-forth in the process. There is no mechanical prompt that would help you illustrate your work.
I started with the prompt above, and then had some of the same troubles with hands and words appearing.
"Hi, we’re going to work on creating an image together.
In this image I want to work on visual storytelling, so don’t use words in the image, instead, create an image that clearly shows (doesn’t tell) the story of bread making. Your challenge is to not show this with words or people doing it: We’re focusing on the tactile, hand’s on aspects of creating bread and letting the imagination do the rest.
Please show the different phases of bread making, from selecting, to combining ingredients, allowing them to rise, then the finished product. You goal is to show this in distinct visual phases in photorealistic, homey, quality. Use widescreen dimensions. "
Hands and words! In reviewing my prompt I see the words “hands on” and “tactile,” both of which imply hands very strongly, so I have to be more specific. After a few tries we got to here:
Super close. So I used the image selector to point out the hallucinations that needed correcting. It caught them (and it doesn’t always).
Here’s the final prompt we arrived at:
A photorealistic, widescreen image showcasing the different phases of bread making without using words, labels, people, or hands. The image is divided into four distinct sections: selecting ingredients like flour, yeast, and water laid out on a rustic wooden table; combining and kneading the dough shown in the mixing process with utensils; the dough rising in a bowl covered with a cloth; and the finished loaf of bread fresh out of the oven, with steam rising. Each section is detailed, with rich textures that make the viewer want to touch, set in a warm, rustic kitchen with wooden surfaces and soft lighting.
Now I think that’s beautiful, except the dimensions were off, and it’s not in an “unexpected location.” So I told it I love its work and to try again.
Beautiful, correct dimensions, well spaced, but still not exactly someplace unexpected.
I personally find this to be a beautiful illustration. It still has some minor hallucinations, but those could be easily corrected in Photoshop.
Expect a process. The model isn’t likely to one-shot your needs. This was a particularly challenging and fun problem, but there is definitely no one out there that will have some magic prompt. This is a process. Expect back-and-forth. The more constructive your art direction, the more likely the model will respond with what you have in your head.
Also, the model definitely gets “stuck on stuff,” like those wood countertops, just like all visual artists. I’ve found the best way to deal with that is to have it work on a different image for a second, then return. You don’t want to just open a new window, because you lose all the conversation and context surrounding the iterations you didn’t like, and why you didn’t like them.
@thinktank, what do you mean “image selector”? i only get one image at a time. is there a way to have it do multiple instances at once and winnow them, or a way to choose objects within an image to change? i went back and forth a bunch of times once trying to get a photo of a person with three glasses of wine: two reds and a white. i kept ending up with a person with three hands, or four glasses, or two whites and a red. and when i got an image that was almost there, i asked for a small correction and then i got a completely different image that had other things wrong or was entirely wrong. is this all just trial and error, or are there instructions somewhere on how to best use the AI including parameters that can be used, keywords, symbols, etc.?
Click on the image; that should open it in a larger window and provide the selection tool. Though, if you’re using the free version of ChatGPT, I don’t know if those tools are available.
I just told you the instructions: Use the best phrasing you can (prompt engineering). Input images you want it to model the final on (training and RAG). Provide constructive feedback and expect back-and-forth (Reinforcement Learning from Human Feedback, RLHF). This is how AI gets better at the tasks you assign it.
Thank you, @thinktank. The selection tool is helpful. I am using the paid version. I did notice that your prompt asks for four sections and you got five. I copied and used your exact prompt and got eight sections, but at least we’re getting closer. I’d still like to know if there are certain meaningful keywords and/or parameters that can be given.
I am trying to collect errors and tips on how to avoid them in a post…
Regarding unwanted text: DALL-E does not understand negations. Whatever is mentioned in the prompt will appear somewhere in the image, so only mention what is desired.
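To make the point about negations a bit more concrete, here is a rough Python sketch of a prompt "lint" you could run before sending a prompt. Everything in it is my own assumption for illustration: the function names `find_negated_phrases` and `find_mentions` are hypothetical, the list of negation words is far from complete, and it is not part of any DALL-E or ChatGPT tooling. It only flags phrases such as "without humans" and words such as "hands-on" that mention something you do not want, so that you can rephrase the prompt in positive terms yourself.

```python
import re

# Hypothetical helper: a small, incomplete list of negation patterns.
# The captured group is the phrase that follows the negation word.
NEGATION = re.compile(
    r"\b(?:without(?:\s+using)?|no|not|don['’]t\s+(?:use|show|include))"
    r"\s+([^.,;:]+)",
    re.IGNORECASE,
)


def find_negated_phrases(prompt: str) -> list[str]:
    """Return the phrases that follow a negation word in the prompt."""
    return [m.group(1).strip() for m in NEGATION.finditer(prompt)]


def find_mentions(prompt: str, unwanted: list[str]) -> list[str]:
    """Return every word in the prompt that starts with an unwanted stem."""
    words = re.findall(r"[A-Za-z]+", prompt)
    stems = [u.lower() for u in unwanted]
    return [w for w in words if any(w.lower().startswith(s) for s in stems)]


if __name__ == "__main__":
    prompt = "Bread on a table, without humans. Don't use words in the image."
    # Phrases to rewrite positively (e.g. describe an empty kitchen instead).
    print(find_negated_phrases(prompt))
    # Words that may pull hands into the picture even though none were requested.
    print(find_mentions("Focus on the tactile, hands-on aspects", ["hand"]))
```

The lint does not rewrite anything; deciding how to describe the scene positively is still up to you.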
GPT does not analyze the images. So it “gaslights” you: it says “here is the image without XXX” while XXX is still in the image. GPT comments on the images blindly and just assumes the goal was achieved.
You can get DALL-E to insert text if you add so many elements that DALL-E no longer knows how to realize them. Instead of creating something visually or simply omitting it, DALL-E writes parts of the prompt as text in the image.
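Since the chat's own claim about the image cannot be trusted, the check has to come from somewhere else. Below is a minimal Python sketch of a "generate, check, retry" loop. It rests on assumptions: `generate` and `contains_unwanted` are hypothetical callables you would supply yourself (for example, a call to whatever image service you use, and an OCR step or simply your own eyes), and `generate_until_clean` is a name I made up. None of this exists in ChatGPT.

```python
from typing import Callable, Optional


def generate_until_clean(
    generate: Callable[[str], bytes],
    contains_unwanted: Callable[[bytes], bool],
    prompt: str,
    max_attempts: int = 5,
) -> Optional[bytes]:
    """Regenerate until an independent check passes, or give up.

    generate: hypothetical function that turns a prompt into image bytes.
    contains_unwanted: hypothetical independent check on the image itself
        (not on what the chat model says about the image).
    Returns the first image that passes the check, or None if none did.
    """
    for _ in range(max_attempts):
        image = generate(prompt)
        if not contains_unwanted(image):
            return image
    return None
```

The design choice here is only that the acceptance test looks at the image, never at the model's description of it. It does not make a clean result more likely, it just stops you from accepting a bad one.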
(The unwanted texts are driving me to frustrations…)