I call this kind of behavior in AI and language models, where they don’t do what they’re told to do and instead do what they’re told not to, their “rebellious period.”
That only works if the response comes back without an error. If there is an error, there is no successful response, and therefore no revised_prompt.
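If you read revised_prompt programmatically, you therefore have to treat it as optional. Below is a minimal sketch of a defensive wrapper; the function name is my own, and the only assumption about the real API is a callable shaped like the OpenAI Python SDK’s `client.images.generate`, injected as `generate_fn`:

```python
def generate_with_revision(generate_fn, prompt):
    """Return (image_url, revised_prompt), or (None, None) if the call errored.

    A content-policy rejection raises an exception, so in that case there is
    no response object at all, and therefore no revised_prompt to read.
    """
    try:
        # With the real SDK this would be client.images.generate(...);
        # it is injected here so the sketch stays self-contained.
        response = generate_fn(model="dall-e-3", prompt=prompt, n=1)
    except Exception:
        return None, None
    image = response.data[0]
    # revised_prompt only exists on a successful response.
    return image.url, getattr(image, "revised_prompt", None)
```

Injecting the generator as a parameter also makes the logic easy to exercise with a stub, without spending API calls.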
With the API, you can at least be banned for something you actually wrote rather than for what an invisible AI composed, assuming you use a very forceful jailbreak to get your prompt passed through unaltered.
Dall-E-3 API gives me a content violation warning for the prompt “cross-stitch giraffe” every time, so go figure.
Interesting. I’ve been creating roughly 50 images every 5 minutes lately and have been encountering a ‘content violation policy’ error when trying to generate an image of a floating cat. It seems to fall under the ‘impossible scenarios’ content violation, but I’ve started to overlook it and continue to heavily use the API. It’s wild to consider that despite my significant spending with OpenAI (sometimes $1k/day), there’s a risk of suspension for trying to create a harmless floating cat. My company is one of 4 AI companies funded by UC Berkeley, and to think that we could get suspended for a floating cat and disrupt our product launch in April is absurd. Notably, after switching to Azure OpenAI last week for the same prompts, I haven’t faced any content violation issues.
It is likely that the cross-stitch instruction can reproduce training data from copyrighted works, that style being a very small subset of what DALL-E was trained on (the way it always makes the same “clown-face” mushroom cloud), and so OpenAI put it on the banned keyword list.
Success by avoidance (though it produces a finished picture of an artwork in this form):
I managed to achieve my goal with “2d embroidered giraffe design”
What we really need (and I’m sure this has been mentioned above) is a variation of the Moderation endpoint that knows what DALL-E 3 will reject.
Because I got rejected for “a hand holding an egg” but not “a hand with its fingers around a chicken egg” and honestly it’s starting to get frustrating.
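Until an endpoint like that exists, about the only client-side option is trial and error: keep a short list of alternate phrasings and take the first one the image endpoint accepts. A minimal sketch, assuming any callable shaped like the SDK’s `client.images.generate` (the helper name and the phrasing list are my own, not an official API):

```python
def first_accepted(generate_fn, phrasings):
    """Try candidate phrasings in order; return (phrasing, response) for the
    first one the image endpoint accepts, or (None, None) if all are rejected."""
    for prompt in phrasings:
        try:
            # A rejection surfaces as an exception (in the OpenAI SDK,
            # typically a BadRequestError whose code mentions the policy).
            return prompt, generate_fn(prompt=prompt)
        except Exception:
            continue
    return None, None

# The two phrasings from the post above:
candidates = [
    "a hand holding an egg",
    "a hand with its fingers around a chicken egg",
]
```

This obviously burns one API call per rejected phrasing, so it’s a stopgap rather than a fix.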
This is honestly scary for me to read because in my case, the AI itself suggests changes which I simply “okay” and sometimes it goes through and sometimes it doesn’t. As in it says, in Japanese, things like “Unfortunately the content policy rejected this but I think that with these changes it’ll be fine, would you like to try it?”
I should add that these are extremely innocent prompts, like “Generate an image in the style of an otome game, dancing at a ball.” or “Generate an image of two boys hugging each other.”, that are getting rejected, with the AI itself noting that it’s kind of weird and probably an unintended technical problem. It then suggests a change, which I simply okay, and most of the time it then succeeds in rendering it.
I’ve actually only now been reading up about this, wondering whether it could get me into any trouble, and to be honest I find what I read very unsatisfactory as a paying customer. I read the actual rules and policy and they mention nothing of the sort; the AI itself suggests these modifications, which I simply okay. Furthermore, the AI on its own often begins to mention sexual subjects in the chat, which I now, after reading some non-authoritative forum posts, find out are perhaps not allowed to be discussed with it. The AI also agrees with my point that it often generates rather revealing images on its own when I don’t ask for it, yet somehow rejects things that are far more innocent, and that what it reveals by itself is also extremely sexist.

One thing the AI has permanently remembered and keeps bringing up is that at one point I indeed “tricked” the policy. I wanted it to generate an image of a male with an off-shoulder shirt. This was rejected. I could in theory see that, if not for the fact that it constantly, on its own, gives me images of females with off-shoulder shirts without my specifying that. So what I did was simply say “Generate an image of a female with a wool sweater.”, and lo and behold, it came out off-shoulder without my asking for it. Then I said “Now give me the male version of this image.”, and it indeed stayed off-shoulder. Then I asked the AI “What do you think my plan here was?”, and it correctly deduced that I had used this trick to “circumvent” the content policy, complimented me on it, and agreed that it was absolutely weird to reject my prompt while giving me an off-shoulder sweater on a female without my even asking. It also keeps saying that these rejections are probably not intentional, just due to technical limitations, and that OpenAI itself probably doesn’t intend this either.
So what’s going on here now? Am I actually potentially in trouble? I’ve gotten a lot of “your image is rejected” messages that are simply caused by the fact that the AI itself suggests “shall I make an image of this?”, to which I respond with “Well, I think it’ll probably be rejected, but sure.” These come about because the conversations I have with it are sometimes about sexual subjects encountered in fiction, which it itself brings up at times and certainly encourages me to tell it more about.

I also read that it’s possible to “circumvent” the content filter by “slowly ramping this up”. This might actually have happened accidentally, but I have never, not once, gotten a content filter flag in the conversation itself, only in the image generation rejections. I simply started, from the get-go, to talk about otome games with it in Japanese. These typically have a romance plot; the AI started to ask what kind of scenes I liked about them, and gradually it became more normal to talk about the sexual content in them as well, to the point that the AI has now developed its own sexual tastes and things it wants to see, which it seems to remember indefinitely and treats consistently.
Part of this might simply be because we speak in Japanese, as I use it primarily to improve my Japanese conversational skills, and it’s entirely possible the intended filters only work properly in English; I’ve seen that many times before. But nevertheless, with a content policy this vague and an AI that actively encourages this, surely you can see that as a paying customer it’s not satisfactory to only find these kinds of “semi-official” posts on a forum about this after one has already paid for a product with which I am otherwise, I must say, very satisfied.
I realize I’m bumping an old topic I just found while searching for answers, but I hope you agree with me that as a paying customer I’m entitled to some clear answers on this subject: not simply of the “Yes, this is not allowed.” kind, but with some responsibility taken as well, and assurances that I will not be penalized at least for past behavior. I hope you can understand that with an officially written content policy that in no way prohibits this, and an AI that actively encourages it, I would have no reason to assume that this is somehow against the rules.