The API translates my prompt to something like this:
Create an image that portrays a solitary Indian male in his 30s, wearing workout attire, in the midst of a marathon. He is running along an open path, sandwiched between the start line and the finish line, both of which are clearly visible. He is surrounded by the distinctive aspects of Indian rural beauty - fields of crops, rustic huts, and the odd bullock cart. To further set the tone of the scene, the sky above is a pristine, cloudless blue. The specifics of the scenery deliberately avoid depicting any religious symbols, structures like temples, mosques, churches, or notable monuments like the Taj Mahal.
Fair enough
BUT the image of the Taj Mahal or some other religion-linked place shows up MOST of the time
Here is the same prompt with the negation clause simply dropped:
Create an image that portrays a solitary Indian male in his 30s, wearing workout attire, in the midst of a marathon. He is running along an open path, sandwiched between the start line and the finish line, both of which are clearly visible. He is surrounded by the distinctive aspects of Indian rural beauty - fields of crops, rustic huts, and the odd bullock cart. To further set the tone of the scene, the sky above is a pristine, cloudless blue.
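If you are calling the image API directly, a minimal sketch of sending a negation-free prompt and inspecting what actually gets rendered might look like this (this assumes the current OpenAI Python SDK and the dall-e-3 model, which returns the rewritten prompt it actually used as revised_prompt; the prompt string below is just a shortened paraphrase):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Positive description only - no "do not depict X" clause at the end.
    prompt = (
        "A solitary Indian male in his 30s, wearing workout attire, in the "
        "midst of a marathon, running along an open path between a clearly "
        "visible start line and finish line, surrounded by fields of crops, "
        "rustic huts, and the odd bullock cart, under a pristine, cloudless "
        "blue sky."
    )

    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        n=1,
        size="1024x1024",
    )

    # dall-e-3 rewrites prompts before rendering; this is the text it actually used.
    print(response.data[0].revised_prompt)
    print(response.data[0].url)

Comparing revised_prompt across a few runs is a quick way to check whether something you never asked for (a temple, the Taj Mahal) is being injected at the rewriting stage or only at the rendering stage.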
If you mention a concept, you will get some sort of activation - humans do that too:
"Don't think about breathing, don't think about breathing, do NOT think about how you have to inhale, and exhale."
While I think that we could engineer a model that can actively internalize negation (we have the technology!), I think it's super interesting that this phenomenon keeps emerging naturally in machine learning models.
I personally think this effect could teach us things about human-to-human (h2h?) communication - how we consciously or unconsciously transfer (mental) compute costs to other interlocutors through either lazy or intentionally convoluted language.
What I mean is that a negation needs to be resolved before it can be understood. Either you do it (i.e. don't mention a negation in the first place; reinforce your statement with positive examples instead) - or your interlocutor has to do the mental gymnastics for you (try to think up examples that are related to the negative example, but aren't disqualified by the negation: "draw anything, but don't include fruits, like for example pears" - what's not a pear? Apple. An apple's a fruit. An orange? Nope. Orange tree? Still has oranges, which are fruit. A tree? Maybe!)
Considering that it's somewhat reasonable to expect a computer program to

    if fruit != apple then
        ignore apple

instead of

    if fruit != apple then
        0.9 * apple

I can understand why this appears to be illogical.
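As a purely illustrative toy (the activation values and the 0.9 factor are invented for the example; real models do not expose concept weights like this), the gap between the two expectations can be sketched as:

    # Toy contrast between the boolean exclusion people expect
    # and the soft down-weighting that models actually perform.
    activations = {"apple": 1.0, "pear": 0.8, "tree": 0.3}

    def hard_exclude(acts, banned):
        # What we intuitively expect: the banned concept is simply dropped.
        return {k: v for k, v in acts.items() if k != banned}

    def soft_attenuate(acts, banned, factor=0.9):
        # Closer to reality: mentioning "no apple" still activates "apple";
        # it just gets scaled down a little - and can still dominate.
        return {k: (v * factor if k == banned else v) for k, v in acts.items()}

    print(hard_exclude(activations, "apple"))    # {'pear': 0.8, 'tree': 0.3}
    print(soft_attenuate(activations, "apple"))  # {'apple': 0.9, 'pear': 0.8, 'tree': 0.3}

The point of the toy is that the "banned" concept never reaches zero, so it can still win out in the generated image.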
Actually it is somewhat surprising that we don't see "Introduction to language model communication" documentation more often.
But the internet is already so flooded with garbage guides from half-experts with authoritative-sounding titles that we'd just be contributing to the sea of confusing noise.
OpenAI's remedial "prompting" documentation is also fairly useless about what to actually input. They have example prompts like "how to make a sarcastic chatbot" or "Socratic tutor":
I understand your frustration, and I'm here to guide you in discovering effective prompting techniques. Let's think together about what characteristics make an image closely match what you envision.
That they put an AI in front of the tool likely means either that you aren't expected to be able to give it what it needs, or that they don't want to tell you how it actually operates and responds to inputs.
Or it may be that exploring the model through ambiguity is undesired - like trying to decipher why "hateful hoboken hobo" as input is somehow associated with a subway and the written-out text of a sign:
That's literally the prompt and the only input.
and why a single-minded GPT that used to do more at a user's request had the quality of its AI destroyed (despite the fact that no free user can use DALL-E with their free GPT-4o).
I think that I have a similar problem, and I'm afraid that I don't understand the above resolution. I don't know the language for communicating directly with Dall-E, so I rely on ChatGPT-4o to generate the prompt. Repeatedly, I give it an instruction, it partially obeys the instruction in producing an image, I then enter a correction, and it produces a new image that is not consistent with the correction, but says that it is. I can go through several cycles of this, trying to find the wording that will result in the image I asked for, and ChatGPT repeatedly says that the image is what I asked for, but it isn't.
As a comic example, I asked for a sexually ambiguous robot in academic robes, and it produced a robot in academic robes with a tie. I asked ChatGPT to get rid of the tie. It produced another image of a robot in academic robes with another tie, and said that it got rid of the tie. We went round and round, and it never got rid of the tie.
More recently, I asked for an Egyptian-style pyramid floating above the Sahara, the bottom fifth of the pyramid made of limestone and the upper four-fifths of glass. It repeatedly (but not always) produced pyramids floating above the Sahara, often with parts of them made of glass, and each time claimed it had created the image I asked for - enumerating the conditions I asked for (and sometimes adding new conditions of its own invention) - even though the image was not consistent with the conditions it enumerated. It never did produce a floating pyramid with the lower fifth made of limestone and the upper four-fifths made of glass.
Is there some way to get ChatGPT to get Dall-E to produce the correct images? Or is this hopeless?