BUG Image Generation: prompts and instructions ignored!

The API translates my prompt to something like this:

Create an image that portrays a solitary Indian male in his 30s, wearing workout attire, in the midst of a marathon. He is running along an open path, sandwiched between the start line and the finish line, both of which are clearly visible. He is surrounded by the distinctive aspects of Indian rural beauty - fields of crops, rustic huts, and the odd bullock cart. To further set the tone of the scene, the sky above is a pristine, cloudless blue. The specifics of the scenery deliberately avoid depicting any religious symbols, structures like temples, mosques, churches, or notable monuments like the Taj Mahal.

Fair enough

BUT an image of the Taj Mahal or some other religion-linked place shows up MOST of the time :rage:

I’ve tried several times!!!

Hi!

What does your original prompt look like?

Sometimes people inadvertently insert things into the prompt that elicit these responses. Negations are notoriously bad influences.

If I just strip out the negations at the bottom:

Create an image that portrays a solitary Indian male in his 30s, wearing workout attire, in the midst of a marathon. He is running along an open path, sandwiched between the start line and the finish line, both of which are clearly visible. He is surrounded by the distinctive aspects of Indian rural beauty - fields of crops, rustic huts, and the odd bullock cart. To further set the tone of the scene, the sky above is a pristine, cloudless blue.
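If you’re calling the image API yourself rather than going through ChatGPT, a minimal sketch along these lines (assuming the current openai Python SDK and DALL-E 3; the size and the condensed prompt text are just illustrative) also lets you print revised_prompt, so you can see exactly what the model was actually asked to draw:

    # Minimal sketch, assuming the openai Python SDK (>= 1.x) and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    prompt = (
        "Create an image that portrays a solitary Indian male in his 30s, wearing workout "
        "attire, in the midst of a marathon, running along an open path between the start "
        "line and the finish line, surrounded by Indian rural beauty: fields of crops, "
        "rustic huts, and the odd bullock cart, under a pristine, cloudless blue sky."
    )

    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1,
    )

    # DALL-E 3 rewrites prompts internally; revised_prompt is what it actually drew from.
    print(response.data[0].revised_prompt)
    print(response.data[0].url)

Keeping the prompt purely positive and then checking revised_prompt makes it much easier to see whether an unwanted element came from your wording or from the rewrite.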

3 Likes

Will try as you recommended

I notice this in many prompts. There seems to be a certain “logic” at work that feels “illogical”.

Thanks for the pic, btw, looks nice :slight_smile:

2 Likes

Tbh I don’t know if it’s illogical.

If you mention a concept, you will get some sort of activation - humans do that too:

„Don’t think about breathing, don’t think about breathing, do NOT think about how you have to inhale, and exhale.“

While I think we could engineer a model that actively internalizes negation (we have the technology!), it’s super interesting that this phenomenon keeps emerging naturally in machine learning models.

I personally think this effect could teach us things about human-to-human (h2h?) communication: how we consciously or unconsciously transfer (mental) compute costs onto other interlocutors through lazy or intentionally convoluted language.

What I mean is that a negation needs to be resolved before it can be understood. Either you do it yourself (i.e. don’t state the negation in the first place; reinforce your statement with positive examples instead), or your interlocutor has to do the mental gymnastics for you, trying to come up with examples that are related to the negative example but aren’t disqualified by the negation: “draw anything, but don’t include fruits, like for example pears” → what’s not a pear? An apple. An apple’s a fruit. An orange? Nope. An orange tree? Still has oranges, which are fruit. A tree? Maybe!
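A crude way to see that “mention it and you activate it” effect for yourself, assuming API access (text-embedding-3-small is just a convenient text model here, not whatever DALL-E uses internally): embed a prompt that negates a concept, a prompt that never mentions it, and the concept itself, then compare. The negated prompt will typically land much closer to the concept it was trying to exclude, simply because the words are in there.

    # Rough sketch: does a negated concept still pull the prompt toward that concept?
    # Assumes the openai Python SDK and an API key; the example texts are made up.
    from openai import OpenAI

    client = OpenAI()

    texts = [
        "A rural Indian marathon scene. Do not include the Taj Mahal or any temples.",  # negated
        "A rural Indian marathon scene with fields, huts and a bullock cart.",          # positive only
        "The Taj Mahal",                                                                # the concept itself
    ]

    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    negated, positive, concept = (d.embedding for d in resp.data)

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = lambda v: sum(x * x for x in v) ** 0.5
        return dot / (norm(a) * norm(b))

    print("negated prompt  vs Taj Mahal:", round(cosine(negated, concept), 3))
    print("positive prompt vs Taj Mahal:", round(cosine(positive, concept), 3))

A text embedding model isn’t the image model, of course, but the underlying point carries over: naming the thing puts it in play.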

2 Likes

Considering that it’s somewhat reasonable to expect a computer program to do

    if requested_fruit != "apple":
        drop("apple")               # hard exclusion: the concept is simply gone

instead of

    if requested_fruit != "apple":
        activation["apple"] *= 0.9  # soft attenuation: the concept is still mostly there

I can understand why this appears to be illogical.
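To make that contrast concrete, here’s a throwaway toy in Python (invented numbers, nothing to do with how DALL-E actually represents or weights anything): after the “negation”, the banned concept keeps most of its weight, so it still shows up.

    # Toy illustration only: invented weights, not how any real image model works.
    scene = {"marathon runner": 1.0, "rural fields": 0.8, "taj mahal": 0.6}

    def hard_exclude(concepts, banned):
        # What a programmer expects "no Taj Mahal" to mean: the concept is gone.
        return {k: v for k, v in concepts.items() if k != banned}

    def soft_attenuate(concepts, banned, factor=0.9):
        # Closer to what mentioning a concept inside a negation does: it stays, slightly damped.
        return {k: (v * factor if k == banned else v) for k, v in concepts.items()}

    print(hard_exclude(scene, "taj mahal"))    # {'marathon runner': 1.0, 'rural fields': 0.8}
    print(soft_attenuate(scene, "taj mahal"))  # {'marathon runner': 1.0, 'rural fields': 0.8, 'taj mahal': 0.54}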
Actually it is somewhat surprising that we don’t see ‘Introduction to language model communication’ documentation more often.

2 Likes

I’d almost say, “let’s write one!”

But the internet is already so flooded with garbage guides from half-experts with authoritative-sounding titles that we’d just be contributing to the sea of confusing noise. :confused:

2 Likes

OpenAI is also fairly useless about what to input in their remedial “prompting” documentation. They offer example prompts like ‘how to make a sarcastic chatbot’ or ‘Socratic tutor’:

I understand your frustration, and I’m here to guide you in discovering effective prompting techniques. Let’s think together about what characteristics make an image closely match what you envision.

That they put an AI in front of the tool likely means either that you aren’t expected to be able to provide it with what it needs, or that they don’t want to tell you how it actually operates and responds to inputs.

Or it can be that exploring the model with ambiguity is undesired, like deciphering why “hateful hoboken hobo” as the entire input somehow produces a subway and the written-out text of a sign:

What was your original prompt here?

That’s literally the prompt and the only thing input.


and a single-minded GPT that used to do more at a user’s request, but the quality of the GPT AI was destroyed (despite the fact that no free user can use DALL-E with their free GPT-4o).

1 Like

Aah, I misread. I thought you meant

Was being rewritten to “Hateful Hoboken Hobo”; that would have been surprising, like a glitch sequence.

But very cool tool you made, thanks for making it public :slight_smile:

Of course we don’t have an embedding explorer for DALL-E, but it feels like a plausible rationalization could be made for your case.

Hobo => train, homeless, USA
Hoboken => subway, NYC, NJ, traffic
Hateful => :person_shrugging: Hateful Eight, maybe

Obviously just conjecture atm.
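If anyone wants to poke at that conjecture, a rough stand-in for the missing embedding explorer could look like the sketch below. It uses a text embedding model, which is not DALL-E’s internals, so treat the output as suggestive at best; the candidate list is just made up for illustration.

    # Improvised "embedding explorer": rank made-up candidate associations for each token.
    # Assumes the openai Python SDK and an API key; conjecture-support only, not evidence.
    from openai import OpenAI

    client = OpenAI()

    tokens = ["hobo", "Hoboken", "hateful"]
    candidates = ["subway", "train", "homelessness", "New Jersey", "New York City",
                  "a hand-painted sign", "anger"]

    resp = client.embeddings.create(model="text-embedding-3-small", input=tokens + candidates)
    vectors = [d.embedding for d in resp.data]
    token_vecs, cand_vecs = vectors[:len(tokens)], vectors[len(tokens):]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = lambda v: sum(x * x for x in v) ** 0.5
        return dot / (norm(a) * norm(b))

    for token, tv in zip(tokens, token_vecs):
        ranked = sorted(zip(candidates, cand_vecs), key=lambda pair: cosine(tv, pair[1]), reverse=True)
        print(token, "->", [name for name, _ in ranked[:3]])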

Hateful Hoboken:

I think the connection isn’t that far-fetched.

That might explain why they got rid of davinci embeddings etc. :thinking:

But at the moment I’m not convinced any of these models are super unique.

I think that I have a similar problem, and I’m afraid that I don’t understand the above resolution. I don’t know the language for communicating directly with Dall-E, so I rely on ChatGPT-4o to generate the prompt. Repeatedly, I give it an instruction; it partially obeys the instruction in producing an image; I then enter a correction, and it produces a new image that is not consistent with the correction but says that it is. I can go through several cycles of this, trying to find the wording that will result in the image I asked for, and ChatGPT repeatedly says that the image is what I asked for, but it isn’t.

As a comic example, I asked for a sexually ambiguous robot in academic robes, and it produced a robot in academic robes with a tie. I asked ChatGPT to get rid of the tie. It produced another image of a robot in academic robes with another tie, and said that it got rid of the tie. We went round and round, and it never got rid of the tie.

More recently, I asked for an Egyptian-style pyramid floating above the Sahara, the bottom fifth of the pyramid made of limestone and the upper four-fifths of glass. It repeatedly (but not always) produced pyramids floating above the Sahara, often with parts of them made of glass, and each time claimed it had created the image I asked for, enumerating the conditions I asked for (and sometimes adding new conditions of its own invention), even though the image was not consistent with the conditions it enumerated. It never did produce a floating pyramid with the lower fifth made of limestone and the upper four-fifths made of glass.

Is there some way to get ChatGPT to get Dall-E to produce the correct images? Or is this hopeless?

1 Like