I’ve managed to refine a prompt that generates an image of a glass filled with a given type of juice, one image per juice.
Here’s the prompt:
“Create a photorealistic image of a single, clear glass with a matte finish, placed centrally against a white background. The glass should be evenly filled with {}, without any overflow. The glass should take up exactly one-fourth of the image’s width and its height should be proportional, maintaining the same dimensions across all images. The photograph must be taken from directly above, with a camera angle fixed at 45 degrees, ensuring the glass’s rim forms a perfect circle in the center of the frame. Soft, uniform lighting should accurately capture the texture and color of the {}, with no shadows around the glass.”
Each {} is replaced with the type of juice, taken from a list of products I’ve curated.
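As a minimal sketch of that substitution, assuming Python: the template below is a shortened stand-in for the full prompt, and the `JUICES` list and `build_prompts` helper are hypothetical names used only for illustration.

```python
# Shortened stand-in for the full prompt above; both {} slots get the same juice.
PROMPT_TEMPLATE = (
    "Create a photorealistic image of a single, clear glass, placed centrally "
    "against a white background. The glass should be evenly filled with {}, "
    "without any overflow. Soft, uniform lighting should accurately capture "
    "the texture and color of the {}."
)

# Hypothetical product list, for illustration only.
JUICES = ["orange juice", "apple juice", "beetroot juice"]


def build_prompts(template, products):
    """Return one prompt per product, filling every {} slot with that product."""
    slots = template.count("{}")
    return [template.format(*([product] * slots)) for product in products]


prompts = build_prompts(PROMPT_TEMPLATE, JUICES)
for p in prompts:
    print(p)
```

Keeping everything except the juice name byte-for-byte identical at least rules out the prompt text itself as a source of variation between images.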
When running this through the API, I unfortunately do not get consistent images. The glass is sometimes randomly shaped, the photo is taken too close or from too far away, and the texture can be very different.
I am generating these for my ecommerce store. I have compared my results to a competitor who does this exact same thing, but all of their images are symmetrical. They also use DALLE-3, so I’m not sure whether the problem is my prompt. The reason I believe they use it is that their images look extremely close to the ones produced by DALLE-3.
Of course I tried some methods, like using a reference picture before the image is generated, but DALLE-3 does not support that.
Appreciate your reply again, but I would like to show you something. Do you have Discord by any chance? Then I can give you more information on why this may be possible.
So it will never create 100% accurate things like a person would do who is creating images by hand / with a camera, or whatever.
It is amazing technology; I can’t even work without it after barely one month of a paid subscription.
But you have to “learn” and “understand” what it can do and what it can’t.
Now 9 out of 10 images are (for me) really usable, because I lowered my threshold (not in terms of quality, but in terms of “what do you expect from a tool like this”).
From what I have seen, the “competitor” is not using Dall-E and maybe not even AI at all.
But those are my personal findings, based on the (example) image meta-data and visual look and feel.
The biggest obstacle to creating “consistent / symmetrical” images is that the Dall-E API does not support the gen_id.
It does show the parameter, but you can’t set it. So every new image is different from the image before.
When you try the same thing via the chat interface of GPT, you can set the reference_image_id to the gen_id with much better “same looking” results.
Example
For example, a basic plate with “stuff” on it.
All three of those images use the exact same prompt (apart from the “stuff”) and the same reference_image_id.
As you can see, the plate / shadow / material / etc… is “almost” the same. But using the API, this is not possible (because of the lack of the reference_image_id).
Solution
Render an empty plate
Render plates with stuff
Mask the “stuff” in Photoshop
Merge the “stuff” with the empty plate
This way you have the exact same plate / shadow, but with different “stuff”.
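The merge step boils down to a per-pixel blend through the mask. Here is a minimal sketch in plain Python, using tiny hand-made pixel grids instead of real renders. The `composite` helper and the sample data are made up for illustration; Photoshop does the equivalent with a layer mask.

```python
def composite(base, layer, mask):
    """Blend `layer` over `base` per pixel: mask 255 keeps the layer, 0 keeps the base."""
    out = []
    for base_row, layer_row, mask_row in zip(base, layer, mask):
        row = []
        for b, l, m in zip(base_row, layer_row, mask_row):
            a = m / 255  # mask value as an opacity between 0.0 and 1.0
            row.append(tuple(round(a * lc + (1 - a) * bc) for lc, bc in zip(l, b)))
        out.append(row)
    return out


# Tiny 2x2 stand-ins (RGB tuples): a white "empty plate" render
# and a second render whose plate differs slightly and carries red "stuff".
empty_plate = [[(255, 255, 255), (255, 255, 255)],
               [(255, 255, 255), (255, 255, 255)]]
plate_with_stuff = [[(200, 0, 0), (250, 250, 250)],
                    [(240, 240, 240), (200, 0, 0)]]
# Hand-drawn mask: 255 where the "stuff" is, 0 everywhere else.
stuff_mask = [[255, 0],
              [0, 255]]

merged = composite(empty_plate, plate_with_stuff, stuff_mask)
print(merged)
```

Only the masked pixels come from the second render, so the slightly different plate in it never reaches the result. With real image files, an image library such as Pillow offers the same operation (`Image.composite`).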
Yeah, to go along with @Foo-Bar's excellent advice, you want to craft the prompt so that most of it stays the same except for the “focus”, i.e. what’s on the plate in that example.
I have heard the DALLE3 team is hard at work on improving and that more control is on their wishlist too.
And it’s not a “bad” thing to do manual corrections, I think.
After all, Dall-E is a tool, not a solution.
Sometimes it creates an image in 2 seconds and then I spend about 2 hours making adjustments to it (changing composition, removing parts, adding parts, making things sharper / more blurry, etc…).
But if I had to create the image from scratch, it would cost me 2 days…
So for me, this is an excellent tool, boosting my productivity and creativity to space and beyond.