DALLE3 and gpt-image-1 Prompt Tips and Tricks Thread

You can just go full “ChatGPT is AI in the way of an API feature that should not have been put in a chatbot”…

Hey all again! What is the best place to report constant failures when it comes to image distortion?


This, by the way, seems to be a problem all around AI image generation; at least Firefly didn’t manage any better when it comes to bows, bowstrings and arrows. (Or is this as well-recognized a problem as finger count?) :slight_smile:

If you are using a Microsoft product, you’d have to complain to Microsoft.

In ChatGPT Plus, one can press a thumbs-down, and offer more direct feedback about the AI product output.

It is powered by DALL-E, but fine.

Is this okay ?

Nice.

Other phrases from photography and illustration, such as “first angle perspective”, “orthogonal”, etc., can also help align imagery. I already got straight trucks, but that wasn’t so funny to post back then.

I’ve got nothing but ChatGPT Plus requests to burn and no guilt this time of day in setting a GPT loose on tools.

15 images in and you can guess the language from the undesired text inclusion of the few that don’t return everything seen before.

Confusing DALL-E by getting the normal truck out of its system first and having all architectural/design angles in the image might be the best way to get a picture you can then take over to your photo tool for further editing.


A vastly different prompt is still predictable.

nice! thanks!

On Wed, 17 April 2024 at 13:12, Jay F via OpenAI Developer Forum (notifications@openai1.discoursemail.com) wrote:

Is there a full guide for D3, similar to the Prompt Book created for D2?

Any luck creating images that use a person in an uploaded photo? I tried prompts such as 'maintaining the same person in the photo. replace background of the photo with a photo realistic stage background typically seen in pictures of a tech ceo delivering a speech on a stage.' I get an AI generated person each time.

Screenshot 2024-05-01 at 2.52.56 PM

Can ChatGPT be instructed to mask out existing parts of an image provided to it automatically and then have DALL-E 3 out-fill the transparent area with a new generation?

Only in the fever dream of an AI driven mad…

image

On the API, indeed, the edits endpoint can use DALL-E 2 to replace areas which you’ve manually painted out with new content – at DALL-E 2 quality.
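For anyone who wants to try the API route, here is a rough sketch of what such a call might look like with the `openai` Python package (v1-style client). The helper names `build_edit_kwargs` and `request_edit` and the file paths are made up for illustration; the mask is assumed to be a PNG of the same dimensions as the image, with the area to repaint made transparent.

```python
def build_edit_kwargs(prompt: str, size: str = "1024x1024") -> dict:
    """Collect the non-file arguments for a DALL-E 2 edit request.

    Hypothetical helper: it only validates and packages values,
    it does not talk to the API.
    """
    allowed_sizes = {"256x256", "512x512", "1024x1024"}  # DALL-E 2 sizes
    if size not in allowed_sizes:
        raise ValueError(f"DALL-E 2 edits do not support size {size!r}")
    return {"model": "dall-e-2", "prompt": prompt, "n": 1, "size": size}


def request_edit(image_path: str, mask_path: str, prompt: str) -> str:
    """Send the image plus mask to the edits endpoint and return the result URL.

    Assumes the third-party `openai` package is installed and that
    OPENAI_API_KEY is set in the environment.
    """
    from openai import OpenAI  # imported lazily so the helper above works without it

    client = OpenAI()
    with open(image_path, "rb") as image, open(mask_path, "rb") as mask:
        result = client.images.edit(
            image=image,
            mask=mask,  # transparent pixels mark the region to regenerate
            **build_edit_kwargs(prompt),
        )
    return result.data[0].url


# Example (placeholder file names, not run here):
# url = request_edit("pool.png", "pool_mask.png", "A woman on a lounge chair by a tropical pool")
```

The painting-out still has to be done by hand in an image editor first; nothing here builds the mask for you.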


A similar feature is in ChatGPT Plus, but only for previous AI images, where we take a previous image in a session:

and “No more polar bear. Infill only an ethnically Asian woman sunbathing on a lounge chair by the edge of a tropical pool setting.” after painting the replacement area with ChatGPT’s new tool.

Clearly placing an uploaded photo of a natural person in a different setting would be problematic - like Donald Trump building a well for African orphans or something.

Example of a dangerous situation depicted by DALL-E

Thanks, it makes sense that it’s limited to images that are already known, i.e. that have vectors encoded into the model. This explains why it’s possible to adjust a picture of a well-known figure but not a new picture. It makes me think that if fine-tuning were possible for DALL-E, something like this would be possible.

It is not about the technological requirements to do it, it is about limiting the possible inputs to the platform for alteration. AI can lower the bar of entry for creating false narratives.

Thanks for reframing. Of course technical capability is there. And of course even if technically feasible, like making that image, it does not mean it should be done!

If you ask for a scene and don’t mention the number of people, it will try to fill it with hundreds of people (or usually more than needed)… So, ask for a specific number of people, be it two people or three people, and give details…

I had a long detailed prompt today, and specified 8 people, but got a series of images with many more, from a dozen to near 30.

Here’s a transcript of our conversation. I appreciated the answer at the end, when I asked it why it was having so much trouble giving me the specified 8 people.


" ChatGPT 4

“Imagine a group of 8 explorers, women, men, and children, of varying ages and ethnicities, on a frontier, lush oil painting style, in the manner of a mid-19th century American landscape painter of the West. They have just reached a point in a mountain passage where they see a vast and beautiful land spread out before them, during the month of May.”

ChatGPT

Here’s the oil painting-style image of the group of explorers you envisioned, standing at a mountain pass and looking out over a lush landscape.

Can you do that in a 16:9 aspect ratio ?

ChatGPT

Here’s the oil painting-style image in a 16:9 aspect ratio, as you requested.

Can you fix that as follows: 1. Just 8 people 2. People of other ethnicities and skin colors 3. Faces that are not grotesquely disfigured

ChatGPT

Here’s the updated oil painting-style image, reflecting your specifications.

  1. Faces are much better 2. Different ethnicities are present 3. There are still too many people. Can you try it with just eight people ?

ChatGPT

Here’s the revised image with exactly eight explorers, as you requested.

Actually, no, there are fourteen explorers in that picture. Can you try again ?

ChatGPT

Here’s the updated image with exactly eight explorers.

Nope, this time there are over 20 people. Is your inability to correctly put a number of people into a picture a known issue ?

ChatGPT

Here is another attempt at creating the image with exactly eight explorers.

Nope. Wrong again. This time there are 17 explorers. And we are back to an all-caucasian group. I want racial diversity, with blacks, hispanics, and asians. Care to try again ?

ChatGPT

Here’s another version of the image, now with exactly eight explorers, representing a diverse mix of ethnicities including Black, Hispanic, Asian, and Caucasian individuals.

You got the ethnicity part much better. However, this time there are 20 people. So, can you explain in simple terms why you are unable to produce an image with the exact number of people requested ?

ChatGPT

Generating images with an exact number of people can sometimes be challenging due to the complexity and subtlety involved in translating textual prompts into visual content. The AI uses patterns it learned during training to create images based on descriptions, but it doesn’t “count” in the human sense. Instead, it generates components of an image based on probabilities and associations from its training data.

When it comes to creating images with specific numbers of people, the AI might misinterpret the spatial and relational aspects of the prompt, leading to more or fewer figures than requested. Also, in complex scenes like group landscapes, the AI may struggle to maintain the exact number of figures while also trying to balance other elements of the composition, such as diversity and scenery. This can result in images that don’t perfectly adhere to the numerical aspect of the prompt."

I got it down to 9, but yeah, it’s not good at counting yet, especially for transferring text to image. It’ll improve, though.

Thx so much, Paul. Your work on these forums is hugely appreciated.

Yep, I have no doubt it’ll get better. The improvements over the past, what, 18 months?, have been amazing.

My mind goes back, and I start thinking about neural networks, and programming a bit in that realm way back when . . . and where things have come to . . . astonishing curve going near vertical . . .

Also: just by the way: I was wondering how the 1792x1024 (1.75 aspect ratio) size was chosen for wide images. It is CLOSE to the classic 1920x1080 (16:9) (1.78 aspect ratio) standard for screen-presented landscape images, but not quite there. 1820x1024 (1.78 aspect ratio) would be optimal, if the 1024 is critical for misc. binary computational efficiency reasons.
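The arithmetic above is quick to check. A small sketch follows; the "multiple of 64" check at the end is only a guess at the kind of constraint that might explain 1792, not something confirmed anywhere in this thread.

```python
from fractions import Fraction

SIXTEEN_NINE = Fraction(16, 9)


def width_for_ratio(height: int, ratio: Fraction) -> int:
    """Nearest integer width that gives `ratio` at the given height."""
    return round(height * ratio)


# Compare the DALL-E 3 wide size with true 16:9 and the proposed alternative.
for width, height in [(1792, 1024), (1920, 1080), (1820, 1024)]:
    exact = Fraction(width, height)
    print(f"{width}x{height}: ratio {float(exact):.4f} (reduces to {exact})")

# Width closest to 16:9 when the height is fixed at 1024.
ideal_width = width_for_ratio(1024, SIXTEEN_NINE)
print("closest 16:9 width at height 1024:", ideal_width)

# Assumption: if dimensions had to be multiples of 64, 1792 would qualify
# and the 16:9 width would not. A remainder of 0 means "is a multiple".
print("1792 % 64 =", 1792 % 64)
print(f"{ideal_width} % 64 =", ideal_width % 64)
```

So 1792x1024 is exactly 7:4, and the nearest true 16:9 width at that height lands on 1820, as suggested.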

[ Haha, and, of course, we greedy image shmendriks would love to be able to prompt a wide range of aspect ratios . . . which I know will come in time . . . ]

Your prompt sounds like one an AI would write, and written to avoid the “mid-19th century American landscape painter of the West” you had mentioned.

One potential symptom originates in the technology of the “wide” format I described above, where a central generated image is then expanded. Even if you get your exact prompt sent to DALL-E and followed closely, place everybody in the middle, and tweak the requested count to match what actually comes out, you still get some new stragglers painted on the sides…

lol

A wide oil painting depicting a group of eight explorers, four women and four men of diverse ages and ethnic backgrounds, in the style of a mid-19th century American landscape artist. The lush scene captures a vast landscape from a mountain passage in May. Each explorer is labeled with numerals from 1 to 8 above their heads, in the following order: Man (1), Woman (2), Woman (3), Man (4), Man (5), Woman (6), Man (7), Woman (8). They gaze out at a beautiful land spread before them, filled with awe.
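If you want to reuse the numbered-label trick from that prompt, a tiny prompt builder keeps the stated count and the label list in sync. This is just a sketch; `build_counted_prompt` is a made-up helper, and as the transcript earlier in the thread shows, there is no guarantee the model will honor the count.

```python
def build_counted_prompt(subjects: list[str], style: str, scene: str) -> str:
    """Build an image prompt that states the head count and numbers each figure.

    Hypothetical helper: the count is derived from the list, so the number
    in the prompt can never disagree with the labels.
    """
    if not subjects:
        raise ValueError("need at least one subject")
    count = len(subjects)
    labels = ", ".join(
        f"{subject} ({index})" for index, subject in enumerate(subjects, start=1)
    )
    return (
        f"{style} depicting a group of exactly {count} people. "
        f"Each person is labeled with a numeral from 1 to {count} above their head, "
        f"in the following order: {labels}. {scene}"
    )


explorers = ["Man", "Woman", "Woman", "Man", "Man", "Woman", "Man", "Woman"]
prompt = build_counted_prompt(
    explorers,
    style="A wide oil painting in the style of a mid-19th century American landscape artist",
    scene="They gaze out from a mountain passage at a vast, lush land in May.",
)
print(prompt)
```

Counting the labels in the returned image then gives a quick way to see how far off the result is.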


Fascinating. Thanks SO much, _j.

Just btw: this image came to mind as I was thinking about a visual metaphor for how we AI tinkerers are ALL explorers of a novae terrae [new world].