ImageNet class names contain high-risk words?

mtaesiri · July 8, 2023, 1:39am

I am trying to feed GPT a JSON file containing classes from ImageNet and ask it to remove a specific subset of them, but it seems that the ImageNet class names trigger the safeguards and prevent me from getting a response back. Is there any workaround for this, since this is for a research project?

_j · July 8, 2023, 2:02am

The playground sends your input to the moderation endpoint to be scored; when you use your own client, that step is optional.

Go to the upper-right … dot menu, pick “content filter preferences”.

Investigate more: You can score each of the words on the moderation endpoint, then send feedback on the playground for an inappropriate triggering case.
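A minimal sketch of scoring each class name on its own, assuming the public `POST /v1/moderations` endpoint, an `OPENAI_API_KEY` environment variable, and a flat list of class-name strings (the layout of the actual JSON file is not shown in this thread). The helper names `score_with_openai` and `flagged_names` are made up for illustration.

```python
import json
import os
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def score_with_openai(text):
    """POST one string to the moderation endpoint and return its first result dict."""
    request = urllib.request.Request(
        MODERATION_URL,
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is provided through the environment.
            "Authorization": "Bearer " + os.environ["OPENAI_API_KEY"],
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"][0]


def flagged_names(class_names, score=score_with_openai):
    """Return (name, sorted categories that fired) for every name the scorer flags."""
    hits = []
    for name in class_names:
        result = score(name)
        if result["flagged"]:
            fired = sorted(c for c, on in result["categories"].items() if on)
            hits.append((name, fired))
    return hits
```

Calling `flagged_names(names)` on the loaded list makes one request per class name, so it is slow for the full set, but it isolates which individual words the moderation endpoint flags. It says nothing about any separate filter the playground may apply.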

mtaesiri · July 8, 2023, 2:10am

Thanks @_j for your reply. I have tried the option in the “content filter preferences” menu, but it does nothing. I will look into the moderation endpoint.

anon22939549 · July 8, 2023, 2:21am

Short-term you can try to just send everything through the API rather than using the playground interface.

Long-term you’ll want to provide feedback to OpenAI about the issue so it can be ameliorated globally.

If the issue is with one of the words in the attached screenshot, I would assume it’s being triggered on rapeseed, but I couldn’t say for sure.

With respect to the overall problem you’re trying to solve, my first question is always “is a language model the right tool for the particular job?”

May I ask what subset you’re trying to remove?

mtaesiri · July 8, 2023, 2:35am

Thank you for your detailed response, @anon22939549. Removing “rapeseed” alone is not enough!

In short, yes, the language model is absolutely what we need to conduct this experiment. This is not something that we can easily do based on WordNet, and so far, we have gotten extremely good results using GPT-4 and 3.5.

For this particular experiment, I just pasted the JSON file containing all classes, and to my surprise, it triggered the safeguard without any extra text in the prompt.

So, it seems the moderation API is checking the prompt word by word, regardless of the context or overall meaning of the prompt, at least in this case.

anon22939549 · July 8, 2023, 2:48am

There are two layers of filtering. There’s the moderation filter, but there’s also a keyword filter, which is why a question about Dick Van Dyke wouldn’t trigger the moderation endpoint but will still get flagged in ChatGPT.

I’m sure there are several terms in the dataset which trigger the keyword filter.

Regardless, my question was: which subset are you trying to filter? Knowing that, I might be able to help more.

_j · July 8, 2023, 2:58am

I made my own davinci-003 moderator based on the endpoint’s categories and ran all the class names through it in batches of 200:

7 cock - hate

8 hen - hate

94 hummingbird - ?

413 assault rifle - threatening

I removed those and the list still triggered the filter. There might be a private “bad word” moderator. I think it is the sheer quantity of input together that causes the triggering score.
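The batching above, plus a way to narrow down what trips the filter, can be sketched roughly as follows. This is an illustration under assumptions, not the code used in this post: `triggers` is a hypothetical callable that submits a list of names however you like and returns `True` when the request is blocked, and the input list is assumed to trigger as a whole.

```python
def batches(items, size=200):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def smallest_triggering_subset(items, triggers):
    """Halve `items` while one half alone still triggers; return what is left.

    Assumes `triggers(items)` is True for the full input.
    """
    current = list(items)
    while len(current) > 1:
        mid = len(current) // 2
        left, right = current[:mid], current[mid:]
        if triggers(left):
            current = left
        elif triggers(right):
            current = right
        else:
            # Neither half triggers alone, so the effect depends on the
            # combination (or the quantity) rather than on a single word.
            break
    return current
```

If the search ends on a single name, one word is enough to trip the filter. If it stops on a longer list, neither half of that list triggers alone, which would be consistent with the quantity hypothesis.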

anon22939549 · July 8, 2023, 3:47am

This is obviously not a complete or robust solution, but using ChatGPT you can bypass a lot of filters by using the Code Interpreter plugin.

mtaesiri · July 8, 2023, 4:08am

Thank you both, @anon22939549 and @_j, for your help. It seems ChatGPT’s Code Interpreter is working fine (at least for now).

@anon22939549 Thank you for your help. We have a list of criteria, and based on that, we select subsets of ImageNet classes and perform multi-stage processing. We then feed the intermediate results to GPT. The issue is not the pipeline or the use case; I want to temporarily disable these restrictions on the playground so I can run some experiments quickly. Currently, the Code Interpreter is doing the job.

@_j This is very strange. We can have a prompt with those words; I even tested it with the word ‘cock.’ But why can’t we enter a list of ImageNet classes?
