Referencing a previous image is similar to reusing the random seed, but without letting you see or share the actual seed, and without that seed being repeatable across sessions.
The gen_id itself carries no information about your previous prompt text.
How do I explain it? The meaning isn't stored in the picture itself. With RL, assume it's the weighting of words that occurs during training; in the end, responding to those words becomes a learned decision, and then a behavior.
For example, DALL-E cannot produce a lowercase "e" just from the word "lowercase," because that connection was never learned, or it doesn't know the word at all and produces only "E." To create an image shaped like "e," DALL-E would have to see it often and repeatedly during training until it is memorized. Understanding what the word gen_id actually means is just as important.
This is the actual specification of the text2im method that ChatGPT sends to the dalle recipient behind the scenes:
| Name | Method Description | Purpose of Argument |
| --- | --- | --- |
| size | The size of the requested image. Use 1024x1024 (square) as the default, 1792x1024 if the user requests a wide image, and 1024x1792 for full-body portraits. Always include this parameter in the request. | To specify the dimensions of the generated image based on the user's needs. |
| n | The number of images to generate. If the user does not specify a number, generate 1 image. | Now will only generate 1 image regardless. |
| prompt | The detailed image description, potentially modified to abide by the dalle policies. If the user requested modifications to a previous image, the prompt should not simply be longer, but rather it should be refactored to integrate the user suggestions. | To provide the exact details or concept that the image should represent or be based on. |
| referenced_image_ids | If the user references a previous image, this field should be populated with the gen_id from the dalle image metadata. | To reuse a random sampling seed that normally gives diversity to each image. |
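For illustration, here is a hypothetical sketch of what such a payload might look like, written as a Python dict. The values, including the gen_abc123 id, are made up for the example, not a captured call:

```python
# Hypothetical sketch of a text2im payload as ChatGPT might assemble it.
# Parameter names match the table above; all values are illustrative.
payload = {
    "size": "1024x1024",  # default square; 1792x1024 wide, 1024x1792 full-body
    "n": 1,               # effectively always 1 now
    "prompt": (
        "A photorealistic white lily on the left and an orange Clivia "
        "on the right, soft morning light"
    ),
    # Only present when the user asks to alter a previous image;
    # the gen_id comes from the earlier image's metadata.
    "referenced_image_ids": ["gen_abc123"],  # illustrative placeholder
}
```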
DALLE does not learn.
ChatGPT does not learn, except by seeing your prior chat in a single session.
ChatGPT is told to rewrite your inputs to make them descriptive and filtered.
ChatGPT will change the prompt language, and will also reuse the last image's seed if you specify that you just want alterations to a previous image in the session instead of a new one.
By using your knowledge of how it ultimately works, you can guide DALL-E, a sub-function that ChatGPT can access, toward more on-topic creations to share.
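A rough sketch of that alteration-vs-new-image decision, assuming the behavior described above. This is not OpenAI's actual code; every name here is hypothetical, and the real prompt rewrite is done by the language model itself:

```python
# Rough sketch of the decision described above. All names are hypothetical.
ALTERATION_CUES = ("above image", "previous image", "flip it", "same image")

def wants_alteration(request: str) -> bool:
    """Crude stand-in heuristic: does the user point back at an earlier image?"""
    return any(cue in request.lower() for cue in ALTERATION_CUES)

def rewrite_descriptively(request: str) -> str:
    """Placeholder for ChatGPT's descriptive, policy-filtered rewrite."""
    return request  # in reality, the model itself rewrites the wording

def build_payload(user_request: str, last_gen_id=None) -> dict:
    payload = {
        "size": "1024x1024",
        "n": 1,
        "prompt": rewrite_descriptively(user_request),
    }
    if last_gen_id and wants_alteration(user_request):
        # Reuse the previous image's seed so only the requested details change.
        payload["referenced_image_ids"] = [last_gen_id]
    return payload
```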
Image 1 was generated using the description from my original words; Image 2 was from the second request (the original text was ‘flip the above image’). Initially, I was happy because it confirmed what you said was correct. However, being cautious, I checked the description and realized that ChatGPT had covertly made adjustments, explaining to DALL-E what “flip” meant.
Er…
I tried again. This time my request was "first, draw an image with white lilies on the left and orange Clivia on the right. Then tell DALL-E to draw a flipped version, without further explanation." The result was Image 3, for which ChatGPT provided two images.
Here are some images from the second graphic novel I'm making. It's a free-to-view graphic novel; I only do this as a hobby and for my nieces.
I use DALL-E 3, Photoshop, InDesign, and other Adobe apps. I was a working artist/illustrator for over 35 years but switched careers due to burnout. Now I use DALL-E 3 to create with my imagination without getting caught up in the process of making art and the time consumption involved.
I am currently unable to use the API, so I cannot try referencing a gen_id myself. As for the issue you mentioned about GPT not explicitly using gen_id, I have encountered similar situations with other AI services as well. I've found that no matter how I modify the request, it seems to be in vain, which may reflect some limitations of current AI technology.
What I found would be tedious to do, but at least being able to create different shapes within a session is not the same as breaking DALL-E's limitations. What can be done doesn't just appear and then disappear within the session, although for now you have to do it repeatedly and shorten the prompt.
I know it's not easy to get DALL-E to use lowercase on my own, but this will benefit others, not just me. Even though I found the approach, I still need to practice with other letters. Flipping should make the job easier.
Great tips, everyone, and thanks for sharing. I'm glad to have come across this thread, as some of the things discussed here I had no clue about.
I've been trying to learn how to prompt over the past few months.
Lots of time was spent researching the lighting techniques of the film and TV masters, and then translating those into a detailed prompt explaining light sources and colors, light bounces from areas of the room, and the color of each light source. I believe this helped tremendously in conveying the style I want in the prompt without actually saying "in the style of foobar artist" etc., or mentioning a trademark or any genre/art movement by name. OK, I did a few times say "in 10% fusion style of Memphis Design," as it made some things really "pop" a bit more.
Most prompts ran 2,500+ characters, detailing everything. All included a written "display of emotion," so that if you need a sincere shot of relationships, friendship, or just simple emotions on display, you can get a more intimate feeling.
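For illustration only, here is one way such a lighting- and emotion-heavy description might be assembled; the scene and wording are mine, not the poster's actual prompt:

```python
# Illustrative only: sketch of a lighting-focused prompt fragment,
# in the spirit described above. The scene and wording are made up.
lighting = (
    "Key light: warm 3200K practical lamp camera-left, bouncing off a "
    "cream wall to fill the right side of the face. "
    "Rim light: cool window daylight from behind, edging the hair. "
    "Soft ambient spill from a television off-frame, low intensity."
)
emotion = "A sincere, quiet moment between two old friends; relieved, gentle smiles."
prompt = f"Two friends at a kitchen table at dusk. {emotion} {lighting}"
print(prompt)
```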
P.S. When I wrote this whole thing, I thought I could attach a few pics. I can only post one image at a time for now per the new-user rule; I hope to share some more images later on.
Interesting. An existing GPT didn't show this ability. I didn't leave it to chance, though: on a new chat, I gave significant procedural instruction on an existing-image task. We get the double image of the goofy dalle version, and feedback:
However, it is hard to tell whether I merely goaded the AI into hallucination, or into repeating back its prompt. It quit the task instead of continuing the "while" loop. It sees that there are two images, but it is only provided the image names on its mount point.