Discrepancy Between Moderation API and DALL-E 3 Content Violation

Hi,

I am experiencing an issue where the Moderation API marks the content as safe (no violations flagged), but DALL-E 3 still rejects the same prompt as a content violation. This discrepancy is significantly slowing down our image generation process.
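
Here is roughly what I am doing (a simplified sketch; the prompt text and parameters are placeholders, and I am assuming the rejection surfaces as a BadRequestError):

```python
from openai import OpenAI, BadRequestError

client = OpenAI()

prompt = "..."  # one of our illustration prompts goes here

# Step 1: the text moderation endpoint says the prompt is fine.
moderation = client.moderations.create(input=prompt)
print(moderation.results[0].flagged)  # False for these prompts

# Step 2: the same prompt goes to DALL-E 3, which sometimes rejects it anyway.
try:
    image = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1,
    )
except BadRequestError as e:
    # DALL-E 3 applies its own filter and can refuse the request here.
    print("Rejected by DALL-E 3:", e)
```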

Is there a specific keyword or set of keywords I can use to avoid content violations in DALL-E 3? Any advice on how to align the Moderation API results with DALL-E 3 would be greatly appreciated.

Maybe if you share 3–5 examples of prompts that are marked as safe but then get a Dall-E warning, we’ll be able to help you out.

My content is mostly related to expecting mothers and young children. For example, an illustration of a pregnant woman in her third trimester in Vietnam. The woman is shown applying cool compresses and using hypoallergenic moisturizers to soothe itchy, raised patches on her belly, thighs, and arms. She is also depicted taking an oatmeal bath and wearing loose, soft clothing to manage her symptoms. The background includes a modern home environment in suburban Vietnam.

Sometimes this content passes through DALL-E 3 without any issues, but other times it raises a content violation. When this happens, I slightly change the wording and then it works. I am also worried that OpenAI might ban my account over these violations, even though the Moderation API marks the same content as safe.

You need to understand that the moderation endpoint is for text, while Dall-E has its own content filters related to images.

What you’re describing here is a pretty textbook example of something that is definitely not going to trigger the moderation endpoint but is at high risk of being rejected by Dall-E.

Do you believe my content contains any violation? Most of the time, I slightly change the wording while keeping the same meaning, and then DALL-E 3 accepts it and the image is successfully generated. Why is that? Is there an API similar to the Moderation API for image generation that I can use before passing a prompt to DALL-E?

It doesn’t matter what I think.

The types of images you are generating are close enough to content that Dall-E prohibits that, depending on the specifics of the prompt, they are going to trigger a warning.

Between the descriptions of rubbing lotion on bellies and thighs and taking a bath, it shouldn’t come as a surprise that the image generation requests sometimes fail.

The best advice I can offer here is to give the system enough context that it understands the images are intended to be entirely non-sexual in nature.

Asking for things like “line art,” a “clinical depiction,” or “suitable for inclusion in an informational pamphlet for a women’s health center” will probably help the model understand the innocent nature of your request.
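
As a rough sketch (the framing text below is just an example of that kind of context, not an official workaround), you could prepend it to the prompt before sending it to Dall-E:

```python
from openai import OpenAI

client = OpenAI()

# Example framing only; the exact wording is an illustration, but extra
# clinical / educational context tends to make the intent clearer.
FRAMING = (
    "A clinical, non-sexualized line-art illustration, suitable for an "
    "informational pamphlet from a women's health center: "
)

def generate_with_context(prompt: str):
    """Prepend non-sexual, educational framing before calling DALL-E 3."""
    return client.images.generate(
        model="dall-e-3",
        prompt=FRAMING + prompt,
        size="1024x1024",
        n=1,
    )
```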


That’s a good suggestion. Thanks.

100% this. Dall-E guardrails are there to keep you from going over the cliff, but they are placed well away from the edge in case your prompt bumps up and over the rail sometimes. I think that, since responses are not deterministic, there is no hard line you can predictably push a prompt up to without going over. It’s a gray area, and landing anywhere in that area can get flagged.
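
In practice that means retrying a rejected prompt, maybe with a small rewording, is about the only programmatic mitigation, since the same prompt can pass on another attempt. A rough sketch, assuming the rejection comes back as a BadRequestError whose body mentions content_policy_violation:

```python
import time
from openai import OpenAI, BadRequestError

client = OpenAI()

def generate_with_retries(prompt: str, attempts: int = 3):
    """Retry DALL-E 3 generation, since identical prompts can pass or fail
    between runs. The content_policy_violation check is an assumption about
    how the refusal is reported in the error body."""
    for attempt in range(attempts):
        try:
            return client.images.generate(
                model="dall-e-3", prompt=prompt, size="1024x1024", n=1
            )
        except BadRequestError as e:
            if "content_policy_violation" not in str(e):
                raise  # some other problem; don't keep retrying
            time.sleep(2 ** attempt)  # brief backoff before the next attempt
    return None  # consistently rejected; reword the prompt instead
```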
