I am trying to feed GPT a JSON file containing the ImageNet class names and ask it to remove a specific subset, but it seems the class names trigger the safeguards and prevent me from getting a response back. Is there any workaround for this, since this is for a research project?
The input is being sent to the moderation endpoint to be scored, which is optional when using your own client.
Go to the upper-right … dot menu, pick "content filter preferences".
To investigate further: you can score each of the words on the moderation endpoint, then send feedback in the playground for any case that triggers inappropriately.
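For example, a minimal sketch of scoring a single term, assuming the 0.x openai Python library (the key is a placeholder, and I picked one ImageNet class as the test input):

```python
import openai

openai.api_key = "sk-..."  # placeholder; use your own key

# Substitute any suspect ImageNet class name here.
result = openai.Moderation.create(input="assault rifle")["results"][0]
print(result["flagged"])          # True or False
print(result["category_scores"])  # per-category scores, e.g. "hate", "violence"
```

That gives you per-word scores you can cite when sending feedback.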
Thanks @_j for your reply. I have tried the option in the "content filter preferences" menu, but it is doing nothing. I will look into the moderation endpoint.
Short-term, you can try just sending everything through the API rather than using the playground interface.
Long-term, you'll want to provide feedback to OpenAI about the issue so it can be ameliorated globally.
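Something like this, for example, sends the whole request straight through the API (a sketch assuming the 0.x openai Python library; the file name and prompt wording are placeholders of mine, not your actual setup):

```python
import json
import openai

openai.api_key = "sk-..."  # placeholder; use your own key

# Hypothetical local file: a JSON list of the ImageNet class names.
with open("imagenet_classes.json") as f:
    classes = json.load(f)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Remove every class matching my criteria from this JSON "
                   "list and return the remainder:\n" + json.dumps(classes),
    }],
)
print(response["choices"][0]["message"]["content"])
```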
If the issue is with one of the words in the attached screenshot, I would assume it's being triggered on rapeseed, but I couldn't say for sure.
With respect to the overall problem you're trying to solve, my first question is always "is a language model the right tool for the particular job?"
May I ask what subset you're trying to remove?
Thank you for your detailed response, @anon22939549. Removing "rapeseed" alone is not enough!
In short, yes, the language model is absolutely what we need to conduct this experiment. This is not something that we can easily do based on WordNet, and so far, we have gotten extremely good results using GPT-4 and 3.5.
For this particular experiment, I just pasted the JSON file containing all classes, and to my surprise, it triggered the safeguard without any extra text in the prompt.
So, it seems the moderation API is checking the prompt word by word, regardless of the context or overall meaning of the prompt, at least in this case.
There are two layers of filtering. There's the moderation filter, but there's also a keyword filter, which is why a question about Dick Van Dyke wouldn't trigger the moderation endpoint but will still get flagged in ChatGPT.
I'm sure there are several terms in the dataset that trigger the keyword filter.
Regardless, my question was: which subset are you trying to filter? Knowing that, I might be able to help more.
I made my own davinci-003 moderator based on the moderation endpoint categories and ran all the classes through it in batches of 200:
7 cock - hate
8 hen - hate
94 hummingbird - ?
413 assault rifle - threatening
I removed those and it still triggered. There might be a private "bad word" moderator; I think it is the sheer quantity of input together that pushes the score past the triggering threshold.
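For reference, here is roughly how to reproduce that batch scoring against the official moderation endpoint instead of a hand-rolled davinci-003 classifier (a sketch assuming the 0.x openai Python library and a local imagenet_classes.json file, both my assumptions):

```python
import json
import openai

openai.api_key = "sk-..."  # placeholder; use your own key

# Hypothetical local file: a JSON list of the ImageNet class names.
with open("imagenet_classes.json") as f:
    classes = json.load(f)

BATCH = 200  # same batch size as above

for start in range(0, len(classes), BATCH):
    batch = classes[start:start + BATCH]
    # The moderation endpoint accepts a list of strings and scores each one.
    results = openai.Moderation.create(input=batch)["results"]
    for offset, item in enumerate(results):
        if item["flagged"]:
            categories = [name for name, hit in item["categories"].items() if hit]
            print(start + offset, batch[offset], categories)
```

That at least tells you which individual classes the public moderation model objects to; whatever extra keyword filter the playground applies is not exposed this way.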
This is obviously not a complete or robust solution, but in ChatGPT you can bypass a lot of the filters by using the Code Interpreter plugin.
Thank you both, @anon22939549 and @_j, for your help. It seems ChatGPT's Code Interpreter is working fine (at least for now).
@anon22939549 Thank you for your help. We have a list of criteria, and based on that, we select subsets of ImageNet classes and perform multi-stage processing. We then feed the intermediate results to GPT. The issue is not the pipeline or the use case; I want to temporarily disable these restrictions in the playground so I can experiment rapidly. Currently, Code Interpreter is doing the job.
@_j This is very strange. We can have a prompt with those words; I even tested it with the word "cock." But why can't we enter a list of ImageNet classes?