BUG Image Generation: prompts and instructions ignored!

The API translates my prompt to something like this:

Create an image that portrays a solitary Indian male in his 30s, wearing workout attire, in the midst of a marathon. He is running along an open path, sandwiched between the start line and the finish line, both of which are clearly visible. He is surrounded by the distinctive aspects of Indian rural beauty - fields of crops, rustic huts, and the odd bullock cart. To further set the tone of the scene, the sky above is a pristine, cloudless blue. The specifics of the scenery deliberately avoid depicting any religious symbols, structures like temples, mosques, churches, or notable monuments like the Taj Mahal.

Fair enough

BUT the image of the Taj Mahal or some other religion-linked place shows up MOST of the time :rage:

I’ve tried several times!!!

Hi!

What does your original prompt look like?

Sometimes people inadvertently insert things into the prompt that elicit these responses. Negations are notoriously bad influences.

If I just strip out the negations at the bottom:

Create an image that portrays a solitary Indian male in his 30s, wearing workout attire, in the midst of a marathon. He is running along an open path, sandwiched between the start line and the finish line, both of which are clearly visible. He is surrounded by the distinctive aspects of Indian rural beauty - fields of crops, rustic huts, and the odd bullock cart. To further set the tone of the scene, the sky above is a pristine, cloudless blue.
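
If you're hitting the API directly, a minimal sketch of sending the positively-phrased prompt (assuming the current OpenAI Python SDK and the dall-e-3 model - adjust for whatever you're actually calling) would look something like:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Describe what should be in the scene; avoid "do not include X" clauses.
    prompt = (
        "Create an image that portrays a solitary Indian male in his 30s, "
        "wearing workout attire, in the midst of a marathon. He is running "
        "along an open path between a clearly visible start line and finish "
        "line, surrounded by Indian rural beauty: fields of crops, rustic "
        "huts, and the odd bullock cart, under a pristine, cloudless blue sky."
    )

    result = client.images.generate(model="dall-e-3", prompt=prompt,
                                    size="1024x1024", n=1)
    print(result.data[0].url)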

3 Likes

Will try as you recommended

I notice this in many prompts. There seems to be a certain “logic” at work that feels “illogical”.

Thanks for the pic, btw, looks nice :slight_smile:

2 Likes

Tbh I don’t know if it’s illogical.

If you mention a concept, you will get some sort of activation - humans do that too:

„Don’t think about breathing, don’t think about breathing, do NOT think about how you have to inhale, and exhale.“

While I think we could engineer a model that actively internalizes negation (we have the technology!), it’s super interesting that this phenomenon keeps emerging naturally in machine learning models.

I personally think this effect could teach us things about human-to-human (h2h?) communication - how we consciously or unconsciously transfer (mental) compute costs to other interlocutors through either lazy or intentionally convoluted language.

What I mean is that a negation needs to be resolved before it can be understood. Either you do it yourself (i.e. don’t mention the negation in the first place; reinforce your statement with positive examples instead) - or your interlocutor has to do the mental gymnastics for you, trying to come up with examples that are related to the negated example but aren’t disqualified by the negation: “draw anything, but don’t include fruits, like for example pears” → what’s not a pear? An apple. An apple’s a fruit. An orange? Nope. An orange tree? Still has oranges, which are fruit. A tree? Maybe!

2 Likes

Considering that it’s somewhat reasonable to expect a computer program to do

    if fruit != apple then
        ignore apple          # hard exclusion: the concept simply never appears

instead of

    if fruit != apple then
        0.9 * apple           # soft attenuation: the concept is only slightly dampened

I can understand why this appears to be illogical.
Actually it is somewhat surprising that we don’t see ‘Introduction to language model communication’ documentation more often.
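
To make the contrast concrete, here’s a toy sketch in plain Python (nothing DALL-E-specific - the concept weights and the 0.9 factor are entirely made up) of hard exclusion versus the soft attenuation the model seems to do:

    # Invented concept weights, purely for illustration.
    concept_weights = {"field": 1.0, "hut": 0.8, "bullock cart": 0.7, "taj mahal": 0.6}

    def hard_exclude(weights, banned):
        """What we intuitively expect: the banned concept is simply gone."""
        return {c: w for c, w in weights.items() if c not in banned}

    def soft_attenuate(weights, banned, factor=0.9):
        """What seems to happen instead: mentioning the concept activates it,
        and the negation only dampens that activation slightly."""
        return {c: (w * factor if c in banned else w) for c, w in weights.items()}

    banned = {"taj mahal"}
    print(hard_exclude(concept_weights, banned))    # 'taj mahal' removed entirely
    print(soft_attenuate(concept_weights, banned))  # 'taj mahal' still present at 0.54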

2 Likes

I’d almost said, “let’s write one!”

But the internet is already so flooded with garbage guides from half-experts with authoritative-sounding titles that we’d just be contributing to the sea of confusing noise. :confused:

2 Likes

OpenAI is also fairly useless about what to input in their remedial “prompting” documentation - they offer example prompts like ‘how to make a sarcastic chatbot’ or ‘Socratic tutor’:

I understand your frustration, and I’m here to guide you in discovering effective prompting techniques. Let’s think together about what characteristics make an image closely match what you envision.

That they put an AI in front of the tool likely means either that you aren’t expected to be able to provide what it needs, or that they don’t want to tell you how it actually operates and responds to inputs.

Or it can be that exploring the model with ambiguity is undesired, like deciphering why a subway scene and the written-out text of a sign are somehow associated with “hateful hoboken hobo” as the input:

What was your original prompt here?

That’s literally the prompt and the only thing input.

[image: the generated result]

That, plus a single-minded GPT that used to do more at a user’s request, before the quality of the GPT AI was destroyed (even though no free user can use DALL-E with their free GPT-4o).

1 Like

Aah, I misread. I thought you meant your original prompt was being rewritten to “Hateful Hoboken Hobo” - that would have been surprising, like a glitch sequence.

But very cool tool you made, thanks for making it public :slight_smile:

Of course we don’t have an embedding explorer for DALL-E, but it feels like a plausible rationalization could be made for your case.

Hobo => train, homeless, USA
Hoboken => Subway, NYC, NJ, traffic
Hateful => :person_shrugging: the Hateful Eight, maybe

Obviously just conjecture atm.
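
We obviously can’t probe DALL-E’s internal space, but as a very rough proxy you could rank a few candidate concepts against the phrase with a text embedding model (a sketch assuming the OpenAI embeddings endpoint with text-embedding-3-small; the candidate list is just my guess):

    from openai import OpenAI
    import numpy as np

    client = OpenAI()

    phrase = "hateful hoboken hobo"
    candidates = ["subway", "train", "homeless person", "New Jersey",
                  "street sign", "city traffic"]

    # Embed the phrase and the candidates in one request.
    resp = client.embeddings.create(model="text-embedding-3-small",
                                    input=[phrase] + candidates)
    vectors = [np.array(item.embedding) for item in resp.data]

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Rank candidate concepts by similarity to the phrase.
    ranked = sorted(zip(candidates, vectors[1:]),
                    key=lambda cv: cosine(vectors[0], cv[1]), reverse=True)
    for concept, vec in ranked:
        print(f"{concept}: {cosine(vectors[0], vec):.3f}")

Whatever comes out is only a text-embedding proxy, not how DALL-E actually composes images, so take the ranking with a grain of salt.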

Hateful Hoboken:

I think the connection isn’t that far-fetched.

That might explain why they got rid of davinci embeddings etc. :thinking:

But at the moment I’m not convinced any of these models are super unique.