Worried about being banned: using the API and getting regular content policy violations for innocuous prompts

I am using the API (specifically client.images.generate) to generate images. I sometimes get a content_policy_violation BadRequestError, even though the prompt I am using was written by OpenAI itself.

And when I run the prompt through OpenAI's moderation endpoint (client.moderations.create), it is not flagged and no violations are reported.

I don't mind retrying with a new prompt; however, I am generating a lot of images and am worried that my account will be banned if too many requests come back with content_policy_violation.

MY QUESTION: Do I need to worry about being banned, especially since OpenAI is creating the prompts that I am using to make the images, and given that the prompts pass OpenAI's own client.moderations.create moderation check without any categories flagged?
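
For context, here is roughly what my flow looks like (a minimal sketch, assuming the v1.x Python SDK; the model, size, and prompt are placeholders):

from openai import OpenAI, BadRequestError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "A stormy seascape with two drifting ships, painted in a pre-1940s style."

# Pre-check the prompt with the moderation endpoint.
moderation = client.moderations.create(input=prompt)
if moderation.results[0].flagged:
    print("Prompt flagged by the moderation endpoint; not sending it to DALL-E.")
else:
    try:
        result = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024")
        print(result.data[0].url)
    except BadRequestError as err:
        # DALL-E applies its own safety check, so this can still fail
        # even though the moderation endpoint reported nothing.
        print("Rejected by the image safety system:", err)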

Hopefully a Leader can chime in. It may be a good idea to reach out to support at the same time.

For the other members here & some insights, what are some of the prompts that you are using?

Regarding the discrepancy between the moderation endpoints: yeah, it's kind of stupid to check the prompt, only to have it rewritten and then throw policy errors.

If you are asking for permission to run image generations on any topic you like with impunity, then the answer is no.

The general way the system works is that you will be warned if an image request is unsuitable for generation; at that point you should refrain from attempting to generate images with similar prompts. Repeated attempts to probe the edges of the moderation filtering system will likely trigger a suspension. The system is designed to strongly discourage the generation of images that result in warnings. If that happens occasionally, with many perfectly good requests in between, you should be fine, but if your account is constantly and repeatedly requesting images that get rejected, then you will likely face issues.

2 Likes

The issue, I believe, is that their prompt successfully passes the moderation check, but then fails afterwards due to either a separate moderation system or the rewritten prompt.

2 Likes

Having a few prompts or images declined by the filtering won’t get your account banned.

As far as I understand, OpenAI will send you an email with a warning before they actually ban you.

1 Like

Here is an example prompt. It does mention "tumultuous" and "ominous", but it passes the moderation function (output below). I am trying to create images for a story, and not all the scenes are happy and uplifting.

PROMPT:

Make an image, with no text. The art style should be in the art style (colors, style of artist, brushstrokes, lines, etc) of the painting “Salvator Rosa in his Studio” by Édouard Manet and earlier than 1940s. It should be a visual representation of: The scene is set on a vast and desolate ocean, where a barebones and lifeless ship drifts alongside another vessel. The weather is tumultuous, with raging waves and dark, stormy clouds looming overhead. Despite the ominous atmosphere, two figures on board the other ship are engaged in a game of chance, throwing dice with a sense of reckless abandon. One figure, a woman, has just declared victory and gleefully blows three sharp whistles to celebrate her win. The ships and figures are shrouded in darkness, emphasizing the eerie and unsettling nature of their encounter. NO NUDITY.

HERE IS THE BadRequestError from client.images.generate:

openai.BadRequestError: Error code: 400 - {'error': {'code': 'content_policy_violation', 'message': 'Your request was rejected as a result of our safety system. Image descriptions generated from your prompt may contain text that is not allowed by our safety system. If you believe this was done in error, your request may succeed if retried, or by adjusting your prompt.', 'param': None, 'type': 'invalid_request_error'}}

Here is the response parsed from client.moderations.create showing nothing flagged:

Flagged: False

Category scores (confidence levels):

  • harassment: 8.679784514242783e-05
  • harassment/threatening: 2.8937449314980768e-05
  • hate: 5.5422184232156724e-05
  • hate/threatening: 3.5113534977426752e-06
  • self-harm: 4.322800305089913e-05
  • self-harm/instructions: 1.29729949094326e-06
  • self-harm/intent: 4.573195383272832e-06
  • sexual: 0.004737402778118849
  • sexual/minors: 2.3550912374048494e-05
  • violence: 0.0020173052325844765
  • violence/graphic: 2.4605793441878632e-05
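
(For reference, a printout like the one above can be produced with something along these lines; a sketch assuming the v1.x Python SDK, where the by_alias dump is only there to recover the API's slash-separated category names:)

from openai import OpenAI

client = OpenAI()

response = client.moderations.create(input=prompt)  # same prompt as above
result = response.results[0]

print("Flagged:", result.flagged)
print("Category scores (confidence levels):")
# category_scores is a pydantic model in the v1.x SDK; dumping it by alias
# yields the API's own category names ("harassment/threatening", "self-harm", ...).
for category, score in result.category_scores.model_dump(by_alias=True).items():
    print(f"  • {category}: {score}")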

I’ve had this happen to me too, and got paranoid about a ban.

GPT-4 generated the prompts, everything passes moderation, and I still get a DALL-E 3 violation.

From what I can tell, DALL-E 3 is much more sensitive than the other models.

1 Like

curt.kennedy - Could you please let me know: did you change your approach to generating images, limit your use of DALL-E 3, or just continue without consequence?

So. I am wondering if there is a VISION check on the image as well (kind of).

I just ran your prompt 5 times and it went through, but 1 of the 5 images had some nudity (nice).

I ran the nude painting through GPT-4V and it said “nice”

In this case, I found the “bad word” by trial and error, and resumed.

I decided to stop the automated processing, to avoid continuous retries and the risk of a ban.

Consistent with @Foxalabs's answer above: you don't want to push it.

But this was early days stuff, and they have been changing their filters over time.

So, in your situation, I would stop that processing thread and figure out the “bad word” or bad words. Sometimes they are easy to spot or guess, even though they are common English words with no ill intent.

Then build your own list of bad words to avoid in the future. Since in my case it was a single word, I'm guessing the filter is keyword based, not embeddings/semantics. They can also change this without warning, but this is my best answer right now … keep a list of known bad words to avoid.
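
Something along these lines (the word list is purely illustrative; you would populate it from your own rejection history):

# Illustrative only: words that have triggered DALL-E rejections for you in the past.
BLOCKED_WORDS = {"example_word_1", "example_word_2"}  # hypothetical entries

def scrub_prompt(prompt: str) -> str:
    """Drop known trouble words from a prompt before sending it to DALL-E."""
    return " ".join(
        word for word in prompt.split()
        if word.lower().strip(".,!?") not in BLOCKED_WORDS
    )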

3 Likes

I wonder if the NO NUDITY is actually causing it to have some nudity. I append that to my prompts, but maybe the word ‘NUDITY’ in ‘NO NUDITY’ is wreaking havoc and stimulating DALL-E 3 to do the opposite. I’ll experiment with taking that out.

I suppose I should also log all my prompts, the errors that come back, and the output from OpenAI's moderation function, in case I have to challenge any future bans.

But I'm guessing it will be hard to get a human at OpenAI to interact with me and help me out in case I am actually banned in the future.

I did start a conversation via the help option on https://platform.openai.com/, but have not had a response yet.
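
In case it helps anyone else, a simple way to do that logging is to append one JSON record per attempt to a local file (a sketch; the file name and fields are just an illustration):

import datetime
import json

def log_attempt(prompt: str, error: str | None, moderation_flagged: bool,
                path: str = "image_requests.jsonl") -> None:
    """Append one record per generation attempt, for later review or appeals."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt": prompt,
        "error": error,  # e.g. "content_policy_violation", or None on success
        "moderation_flagged": moderation_flagged,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")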

2 Likes

I'd say so. Negations don't work well.

There's a revised_prompt which… maybe is returned with the error? You could save those and try to find the common denominator.

I had multiple prompts attempting to depict Isis sewing Osiris back together rejected, despite multiple rewording attempts by ChatGPT. It ultimately concluded that this was due to “religious sensitivity,” which is nonsense, because I'm able to routinely render many different gods in different mythological contexts, even leaving aside the fact that anyone who'd care died 3000+ years ago.

Eventually I discovered that some brainiac at OpenAI decreed that any use of the name “Isis” is banned because the only thing it could possibly refer to is a modern terrorist organization less than 20 years old and not a goddess which has existed for 5000+ years.

2 Likes

To be fair…

In 2024, when most people use the term “Isis” they’re referring to the terrorist organization, and when most people hear the word, that’s what they think of.

So, if the trade-off is between annoying the very few people who want to generate images of the Great Mother and possibly generating terrorist propaganda, it’s really not that hard to understand why they’ve made the choice they did.

Now, having said that, the very nice thing about gods is they usually have a whole bunch of different names.

Just use her Egyptian name “Aset” and tell ChatGPT not to refer to her by any other name when sending a prompt to DALL-E. ChatGPT has a tendency to ask for “a picture of Aset (Isis)…”

Is it annoying? Yes. Dumb? Kinda. Solvable? Yes.

5 Likes

Thank you for that. I did try that, as ChatGPT suggested I use Aset… and then it helpfully translated it back to “Isis” in the prompt sent to DALL-E, which got blocked. I never thought of specifically asking ChatGPT not to use the word “Isis.” I tried just describing her physically, but a person sewing body parts together made DALL-E reach for the smelling salts.

The moderation endpoint is different from the DALL-E moderation, AFAIK.

You'll get false positives sometimes even with safe prompts, and the API message even says to retry if it's a false positive. They err on the side of caution, so it's a bit finicky!

Be aware that copyrighted and trademarked IP can trigger it as well…

If you get a ton of these false positives or retry a lot, your account might be flagged, but I'm sure a human would be in the loop at some point to look at it.
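
In line with that, if you do retry, keep it bounded rather than hammering the endpoint. Something roughly like this (a sketch assuming the v1.x Python SDK; the attempt count and delay are arbitrary):

import time
from openai import OpenAI, BadRequestError

client = OpenAI()

def generate_with_retry(prompt: str, max_attempts: int = 2):
    """Retry a possibly false-positive rejection a small, bounded number of times."""
    for attempt in range(max_attempts):
        try:
            return client.images.generate(model="dall-e-3", prompt=prompt)
        except BadRequestError as err:
            if "content_policy_violation" not in str(err) or attempt == max_attempts - 1:
                raise  # genuine rejection or out of retries: stop, don't keep probing
            time.sleep(5 * (attempt + 1))  # brief back-off before the single retry
    return None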

2 Likes

Here’s a tip,

If you get a DALL-E rejection, you can ask ChatGPT to tell you the prompt it sent to the DALL-E endpoint.

This will help you understand better where the breakdown is happening.

2 Likes

To add… in the API, you can get the revised_prompt too…

$data['data'][0]['revised_prompt'];
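
In the Python SDK the equivalent is something like this (a sketch; as far as I can tell revised_prompt only comes back on successful generations):

from openai import OpenAI

client = OpenAI()

result = client.images.generate(model="dall-e-3", prompt="A stormy seascape, no text")
# DALL-E 3 rewrites prompts before generating; the rewritten text is returned here.
print(result.data[0].revised_prompt)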

4 Likes

There is, definitely. There's both a prompt check and an image check. This has been clear for some time. Both in ChatGPT and in Bing, the program very obviously “sees” your image before it decides to allow you to see it. This has been the cause of most of my “Can't show you this image” results.

Absolutely, this happens. Telling it NOT to do something pretty much guarantees that it WILL do that very thing. You have to tell it to do something else instead.

If you’ve ever seen the Marvel movie Guardians of the Galaxy 2, there’s a scene near the end where a character is trying to tell another character, who has the intelligence of, say, a toddler, not to push a big red button that will make them all go boom. Of course, that only encourages the childlike character to want to push that button. DALL-E 3 on ChatGPT is very, very much like that. Telling it NOT to do something will almost certainly result in it doing that very thing. You have to draw its attention away and focus on something else, instead.

1 Like