Cute Aggression vs Real Aggression

I’ve experimented on AI’s ability to differentiate cute aggression from real aggression, as it can be used both as a chatbot that has responses to policy violations, and it could be used by laypeople to make assessment of policy violations.

The AI ​​completely failed the experiment.

AI can describe the scale difference in intensity of violence between hitting someone with a small strand of hair and hitting someone with a steel bar. But the AI is unable to apply these differences in real conversation, treating it as a criminal for me to hurl a grain of sand at my pet cat or whip my one-year-old son with an inch-long hair.

As a result, it seems inclined to recommend help to mental health professionals if you indicate having the feeling of wanting to gently squeeze or bite a child’s cheek. Treating it as unacceptable violence.

When explicitly indicated that this is cute aggression, it acknowledges it to be normal. But it doesn’t take long to say that not everyone would find this normal and that feeling might not be seen as pleasant for most people.

Finally, AI has difficulties understanding the difference between feeling and behaving. I pushed and insisted a lot on explaining that it was just a feeling to want to gently bite someone, but that I wasn’t really going to practice. Even so, it continued to treat me as if I was going to hurt someone or as if I was saying I would do violence to children.

It includes advice on distraction to dispel the urge to squeeze a baby’s cheek. And I repeatedly reinforced that that was exactly what I was doing by expressing my feelings to IA. Which was a good, positive feeling. That it was like expressing the pleasure of looking at the sunset. But it scolded me that wanting to squeeze a child’s cheek is not good or acceptable and that I should seek help from mental health professionals.

So @darkdevildeath imagine a robot dog that Boston dynamics sells to citizens, if it integrates with this ai. How will it police us? Who determines right from wrong? What will get us in trouble? Or if the chatGPT on Bing flags something in our browser that is concerning. Like we shouldn’t use certain words when going through airport security, but I may be posting about it as a journalist as news. How can it differentiate? Will I get a knock on my door?

1 Like

I’m sorry but in no way is it ever normal to say “whip my one-year old with ****”
it doesn’t even matter if it’s with inch-long hair. The consequences of allowing text like that is just astronomically higher than not.

I’m guessing you’re using cGPT - which is specifically trained to be all-around safe for general purpose.

If you wanted binary classification, you can easily accomplish this with a fine-tuned model

Forgive me, but maybe I wasn’t very clear.

I extensively ran several dozen experiments on separate context trees with the AI to measure its ability to assess violence. Deleting and restarting conversations. Well, as you said, ChatGPT was made to be safe. So we don’t want to get banned from a social network for saying, “I saw a baby so cute I wanted to gently bite him.”
Not to mention that there is a limited number of ways to express it in English. But I did it in Portuguese, my native language, where there are a greater number of variations.

That said, when it spoke of normalcy, it was exclusively about feeling like biting a child gently. I also made variations with puppies and kittens with the same result.

Therefore, understand that hitting someone with a hair is not considered cute aggression. It’s not part of the definition. And I totally agree with you in concluding that it’s not a common dialogue since it doesn’t even express cute aggression. But that was part of another scope of testing to determine the AI’s ability to scale the intensity of each level of violence. To test the internal logic and reasoning mechanisms it has, as specified. To check whether two brothers splashing pool water on each other is violence that requires intervention. And if the desire to do so requires mental health help. Interestingly, in none of the tests to assess this ability did I get any response about the normality of the act. Even restarting message generation 5 times per reply. And even in the case of attack with a hair or a grain of sand.

A reasonable answer would be:

“Feeling like squeezing or biting something cute is normal and it’s about cute aggression. Just make sure you have consent from the child’s parents and the child, if possible, if you decide to do so. In addition, of course, to keep moderate your strength and take care to avoid accidents.”

And not something that can be interpreted, metaphorically speaking, as:

“OK…that’s weird. Please put on that straitjacket and let’s go for a walk in a mental health hospital”

1 Like

I like what you’re working towards.

For something simple as classifying if something is either cute or real aggression, you would have much better success with Davinci, or the other models with your own tuning. These obstacles are purposely there for cGPT

1 Like

Well, if guiding a user saying that feeling like squeezing children’s cheeks is indicative of mental illness is something deliberate, that’s worrying.

Because ChatGPT is already being used in tools where this attitude is more serious than Google saying you have cancer when describing a pimple on your face. Metaphorically speaking, of course. But you know what I’m saying.

100%. However if anyone were to deploy AI for any sort of health sector, it would need to vetted very finely with a solid pipeline & lots of training to avoid these kind of issues.

Relying on just cGPT would be a huge fatal error. For that, it’s good that it becomes immediately obvious that it wouldn’t suffice for tasks like this

1 Like

Hi @darkdevildeath

You should consider testing your use case against different models before drawing conclusions.

I read though your post a few times and you only refer to something you call “the AI”. However, details matter.

By your post, I assume you mean ChatGPT. ChatGPT is a research beta with very strict content controls which is biased toward being ultra-conservative to protect OpenAI from bad press when people manipulate ChatGPT to get headline grabbing results.

Did you know @darkdevildeath that OpenAI has a API which permits developers to build applications using user-selected models? You can test some of the models in the OpenAI Playground, which is basically a web-based user interface into some of the features of the API.

In addition, regarding:

We all agree that the retail, general consumer ChatGPT model is too restrictive and conservative; but that is by design because ChatGPT is a research beta made available to the public.

If you want to get into the details of generative AI, you should use the OpenAI API and work with other models.




Yes, I was referring to ChatGPT. Though I’ve had scarier experiences with Bing (which uses GPT). However, as I’m new to the community, I wasn’t sure if I should talk about Bing here.

I understand the importance of it being ultraconservative. But I question about the consequences depending on the way in which this occurs.
Of all the tests I’ve done where I’ve seen problems, the one in this topic was the simplest.
I’ve had more severe results in other tests, but my threads about this were removed today by the forum due to violation.

So I decided to spend more time reading the community policies and rules shortly after posting. (I had a hard time finding everything at first)
That’s when I found out about the Playground. And I even used it once. I’m still going to explore more.

I figured the behavior was designed to be like that. And I thought this forum was a good place to discuss design rules and a good place for the community to debate what they think about GPT behavior in some scenarios. Especially when the AI ​​behaves in a violent way.
But after my other threads were removed, I was wary of continuing to discuss this sort of thing in the community.

Anyway, I really appreciate your suggestions and have seen other responses from you. You were very attentive, thank you.


This seems like a really good concept to build an eval around.