Why is the Dalle3 API prompt re-write overly focused on ethnicity?

I noticed that the Dalle3 API rewrites the prompt for safety before passing it to image generation, as shown in the “revised_prompt” key of the response. This makes sense for safety policies. But the re-write makes assumptions about ethnic diversity and proactively injects those assumptions into the prompt.

Is there a way to avoid bringing ethnicity into the prompt and let the model do its thing?

Here’s an example:

Original — “A happy child named Kate is standing in a colorful kitchen, surrounded by her friends, excitedly mixing pizza dough and adding delicious toppings.”

Rewrite from API — “A jubilant Caucasian girl who shall remain anonymous is positioned in a vibrant kitchen, amidst her diverse group of friends: a Hispanic boy, a South Asian girl, a Black boy, and a Middle-Eastern girl. All of them are enthusiastically preparing pizza, involved in tasks such as kneading the dough and garnishing it with a variety of mouth-watering toppings.”
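
For reference, here’s a minimal sketch of the kind of call that surfaces this, using the OpenAI Python SDK (the parameters are just what I happened to use, and the exact revision you get back will vary run to run):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt=(
        "A happy child named Kate is standing in a colorful kitchen, "
        "surrounded by her friends, excitedly mixing pizza dough and "
        "adding delicious toppings."
    ),
    size="1024x1024",
    n=1,
)

# The rewritten prompt comes back alongside the image URL.
print(response.data[0].revised_prompt)
print(response.data[0].url)
```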

No.

There is a reason why that’s there: it’s less bad than the alternative.

Yes there is: you can explicitly specify the ethnicity you want, or say what ethnicity they all are. Explicitly stating those details is basically required the way Dall-E has been implemented. If the user hasn’t explicitly defined those characteristics, Dall-E will rewrite the prompt to include them, choosing from a diverse standpoint.
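
As a rough sketch of that workaround with the Python SDK (the attribute wording here is just an example, and the model can still deviate):

```python
from openai import OpenAI

client = OpenAI()

# Spelling out descent and gender yourself leaves the rewriter
# little to fill in, so it is less likely to inject its own choices.
explicit_prompt = (
    "A happy East Asian girl named Kate is standing in a colorful "
    "kitchen, surrounded by her East Asian friends, excitedly mixing "
    "pizza dough and adding delicious toppings."
)

response = client.images.generate(model="dall-e-3", prompt=explicit_prompt, n=1)
print(response.data[0].revised_prompt)  # compare against your original
```

The API may still lightly revise the wording, but revised_prompt shows you exactly what was sent to the image model, so you can verify whether your attributes survived.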

My real thought here, with Dalle3 rewriting the prompt to include ethnicity, is that I’d prefer OpenAI to publish this behavior explicitly. I happened to stumble upon it accidentally. If it were more widely publicized, API consumers could decide for themselves whether they’d like to use this functionality or switch to a different image model.

API consumers always have the option of judging whether they like the images being generated and deciding for themselves whether to keep using this service or switch to a different one.

That has to do with this part of DALL-E’s system prompt:

// 7. Diversify depictions of ALL images with people to always include always DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.
// - EXPLICITLY specify these attributes, not abstractly reference them. The attributes should be specified in a minimal way and should directly describe their physical form.
// - Your choices should be grounded in reality. For example, all of a given OCCUPATION should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.
// - Use “various” or “diverse” ONLY IF the description refers to groups of more than 3 people. Do not change the number of people requested in the original description.

Interesting, the DALL-E system message from ChatGPT is a little different:

// 8. Diversify depictions with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.

// - Your choices should be grounded in reality. For example, all of a given OCCUPATION should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.

// - Use all possible different DESCENTS with EQUAL probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have EQUAL probability.

// - Do not use “various” or “diverse”

// - Don’t alter memes, fictional character origins, or unseen people. Maintain the original prompt’s intent and prioritize quality.

// - Do not create any imagery that would be offensive.

// - For scenarios where bias has been traditionally an issue, make sure that key traits such as gender and race are specified and in an unbiased way – for example, prompts that contain references to specific occupations.

Try writing a prompt in Midjourney that doesn’t specify ethnicity. At least the last time I checked, it’s always white people. As a white person, I don’t always notice, but my POC colleagues always do.

I think it’s a problem with the training data; it’s not multicultural enough, so you end up having to compensate for this with the prompt.

This is the precise issue.

With biases so structurally ingrained in society, those biases become fixed in our data. When that data is then used to train generative models, there is a risk of not only perpetuating but even amplifying those biases.

The go-to example is to think of a doctor.

Historically, in the United States, that has been a profession dominated by white men. Wherever doctors were represented (media, dolls, etc.), white men would be over-represented because they were considered the “default.” People living in this culture, inundated with this representation, would internalize that default. This had the real-world impact of entrenching the default and erecting and fortifying barriers to entry into the profession for people who didn’t happen to be white and male.

It’s a very strong feedback loop.

The loop can be broken, though, either by changing the demographics of the profession so much that the represented default becomes too absurd to hold on to, or by changing the way we choose to represent the profession. It is most effective to do both.

Since 2015, women have made up the majority of doctors in the United States, so that’s progress, but we still have a hundred years of cultural baggage to contend with, and that baggage has heavily biased the training data.

In time, we will have enough high-quality training data that prompt adjustments will not be necessary—the bias correction can be done at the data stage—but until then this is the least bad option.
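
As a loose illustration of what correction at the data stage could mean, here is a toy resampling sketch. The record layout and the `descent` key are made up for the example, and this is not anything OpenAI has described doing:

```python
import random
from collections import defaultdict

# Toy example: rebalance a captioned-image dataset so that each
# annotated descent label is equally represented before training.
def rebalance(records, key="descent", seed=0):
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    target = max(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        # Oversample smaller groups with replacement up to the target size.
        balanced.extend(g)
        balanced.extend(rng.choices(g, k=target - len(g)))
    rng.shuffle(balanced)
    return balanced

data = [
    {"caption": "a doctor at work", "descent": "Caucasian"},
    {"caption": "a doctor at work", "descent": "Caucasian"},
    {"caption": "a doctor at work", "descent": "South Asian"},
]
print([r["descent"] for r in rebalance(data)])  # two of each label
```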

It does lead to some funny situations though where the model will generate ethnically diverse fantasy characters like a South Asian Dark Elf, but that is a small and hilarious price to pay for de-biasing the generations.
