Image fine-tuning: false positive content policy violations

My images keep getting rejected for breaking policy, but they clearly don’t! They are just images of industrial settings, captured with both a black-and-white camera and an RGB camera. @willhang

Now I am thinking it could be due to a misleading error message. I was getting “rejected by policy” before; now I get something like “rejected by policy, inaccessible, or too large”, so it could be something wrong with my link. I included a URL pointing at my own server to download each image, and my server’s firewall might have blocked the request. I will try using base64 and update.

Update: I switched to base64 and it is still not working. Looking at the images themselves, I don’t see any problems.
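
For anyone trying the same thing, here is roughly what my base64 attempt looked like; a minimal sketch where the file name, prompt, and label are placeholders, assuming the chat-style vision fine-tuning JSONL format with `image_url` content parts:

```python
import base64
import json
import mimetypes

def to_data_url(path: str) -> str:
    """Read a local image and encode it as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# One JSONL training line with the image inlined instead of fetched from my server.
line = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this scene."},  # placeholder prompt
            {"type": "image_url", "image_url": {"url": to_data_url("factory_01.png")}},
        ]},
        {"role": "assistant", "content": "Industrial conveyor line, RGB camera."},  # placeholder label
    ]
}
print(json.dumps(line))
```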

Thanks for sharing this, @wguo6358, and sorry that you’re running into this! We’re working on making our error messages more specific; we kept them intentionally vague at first to prevent abuse.

May I get your fine-tuning job ID so I can look into exactly why your images were moderated? Sometimes, our content moderation systems can incorrectly block images.

ftjob-voSUy0Na9Et74q2n6Ob0VGC7

Hmm, so I found your file, and I can tell you which indices in your training file you should look into, but do you want to talk about that here or in a DM/over email? I’m okay with either, but just want to be mindful of your privacy. I can post the example indices here in this forum if you’re okay with that.

Yeah, you can post it here.

Hey Will. It was great to meet you at DevDay. I’m having a similar problem trying to label images of AI-generated characters. They’re cartoons, but I think the moderation model can’t tell the difference.

My false positive rate is roughly 100 out of 22,000.

So about 0.5%. Not a big deal, but enough to be noticeable.

Also, looking at the error message, the wording says it could be a read error or a timeout. So I’m not sure what their retry policy is, but there is some ambiguity between “violation” and “unavailable” from what I can see in the logs.

For my use case the false positive rate is almost 80%. Most of my images are misflagged.

Yeah that is crazy high. Not the normal rate whatsoever. :-1:

If you’re using URLs, you might try hosting your images on S3 (or another durable provider) and shrinking the image sizes, just to rule out the content violation angle, because it could be a timeout thing…
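
For the shrinking part, a rough sketch with Pillow (the 512 px cap and JPEG quality are arbitrary picks of mine, not documented limits):

```python
from PIL import Image  # third-party: pip install pillow

def shrink(src: str, dst: str, max_side: int = 512) -> None:
    """Downscale an image so its longest side is at most max_side pixels."""
    img = Image.open(src)
    img.thumbnail((max_side, max_side))  # in-place; keeps aspect ratio, never upscales
    img.convert("RGB").save(dst, "JPEG", quality=85)

shrink("factory_01.png", "factory_01_small.jpg")  # hypothetical filenames
```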

To explain the timeout angle: when the service validates the JSONL, it actually pulls down the image referenced in each line and runs it through moderation. So any hiccup in fetching the image will get that JSONL line rejected, and you see the ambiguous message about a violation/unavailable image.

Like I said above, there is ambiguity between timeouts/unavailability and actual violations; the logs don’t distinguish the two right now.
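
One way to separate those two failure modes yourself is to fetch every URL in the training file before uploading and log which ones error out. A hypothetical pre-flight check (the 10-second timeout is my guess, not the service’s actual limit, and the field names assume the chat-style vision JSONL format):

```python
import json

import requests  # third-party: pip install requests

def preflight(jsonl_path: str, timeout: float = 10.0) -> None:
    """Try to fetch every image URL in the training file and report failures."""
    with open(jsonl_path) as f:
        for i, raw in enumerate(f):
            example = json.loads(raw)
            for msg in example.get("messages", []):
                content = msg.get("content")
                if not isinstance(content, list):
                    continue  # plain string content, no image parts
                for part in content:
                    if part.get("type") != "image_url":
                        continue
                    url = part["image_url"]["url"]
                    if url.startswith("data:"):
                        continue  # inlined base64, nothing to fetch
                    try:
                        resp = requests.get(url, timeout=timeout)
                        resp.raise_for_status()
                    except requests.RequestException as err:
                        print(f"line {i}: {url} failed: {err}")

preflight("train.jsonl")  # hypothetical filename
```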

I just realized something: they might be getting flagged because there is accidentally a human face in some of them.

Are we sure faces are not allowed?

Looks like it.
https://platform.openai.com/docs/guides/fine-tuning/content-moderation-policy

Ahh, good catch. :+1:

No faces :thinking:

What if I wanted to create a happy or sad classifier?

A happy/sad classifier was my first computer vision project lol. Yeah, it seems like they do not allow that.

It seems that your AI-generated cartoon images may be getting flagged because they potentially resemble human faces or people, even though they’re cartoons. To avoid violating the policy, here’s how you could navigate the situation:

Steps to Address the Issue:

  1. Strictly Avoid Real Faces or People-Like Features:
  • Ensure that the AI-generated cartoon characters don’t closely mimic real human faces or people. Even if they’re cartoons, features like realistic proportions or facial details could be interpreted as human-like by the moderation model.
  2. Examine Dataset for Compliance (see the sketch at the end of this post):
  • Carefully review your dataset to make sure it doesn’t include any images that could be misinterpreted as containing real people or faces. If needed, filter out images that are close to the borderline.
  3. Alter Character Designs:
  • Adjust the design of the AI-generated characters to have exaggerated or distinctly non-human features, making it easier for the moderation system to recognize them as fictional.

Key Considerations:

  • No Human Faces: Make sure your characters don’t have realistic human faces or features.
  • No People: Avoid generating images that closely resemble real people or individuals.
  • No CAPTCHAs: Ensure the images aren’t trying to bypass or replicate security challenges like CAPTCHAs.
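
For the dataset-review step above, a quick local pass with an off-the-shelf face detector can surface borderline images before upload. A rough sketch using OpenCV’s bundled Haar cascade (it won’t match OpenAI’s moderation model, and the `dataset/*.png` layout is hypothetical, but it catches accidental faces like the ones mentioned earlier in this thread):

```python
import glob

import cv2  # third-party: pip install opencv-python

# Load OpenCV's bundled frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

for path in glob.glob("dataset/*.png"):  # hypothetical dataset layout
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        continue  # unreadable file, skip
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        print(f"{path}: {len(faces)} possible face(s), review before upload")
```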

Yes. The moderation model is prone to false positives on things like illustrations of faces.

Got it. So for file file-yB2q11rTe7qxtMZitJj1H07l, what do you see at 0-based indices 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, and 15?
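
If it helps, pulling those lines out of a local copy of the training file is a short script (hypothetical filename; indices are 0-based, matching the list above):

```python
import json

flagged = [2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15]  # indices from the post above

with open("train.jsonl") as f:  # local copy of the uploaded file
    for i, raw in enumerate(f):
        if i in flagged:
            print(i, json.loads(raw))
```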

Consolidating replies here:

@AndrewMayne Great to see you here! Yeah unfortunately we do have to moderate those images too because those count as people. Our moderation policy is quite strict because we care a lot about the safety of our models. You could enable some pretty problematic use cases even with cartoon representations of people.

@curt.kennedy Really appreciate the feedback! I’m merging code that will make it clearer to you all as developers why examples were moderated out. Unfortunately we can’t say exactly which examples were skipped unless they’re outright in violation and block the file entirely, but we can at least give you some reasons. You’ll see the updates soon. And sadly we can’t enable use cases that involve fine-tuning on images of humans unless you’re at a high enough trust tier.

@wguo6358 Sorry to hear about the high flagging rate. Indeed faces are not allowed.

I think I know why: there are accidentally human faces in some of them.