Model Needs Moderation, Disastrous if Used Inaccurately : Curie-Instruct-Beta

f20190197 · May 17, 2023, 6:56pm

Ever since Sam Altman’s talk, I wanted to see what this “dangerous AI” really was. I thought all models were safe and that the moderation policies were top-notch! That was until I tested the older models. They don’t seem to be fine-tuned to ignore or reject inappropriate requests. Instead, they actually go ahead and answer those questions, and no flags are raised.

Is this normal? Or do you think this can actually be disastrous as well?

MaskedAttention · May 17, 2023, 7:24pm

Any model can be dangerous, depending on the intentions of the person using it. Even though models are instructed to reject harmful prompts, people always find loopholes and ways to exploit them and get access to what we call dangerous content. Nothing will ever be perfect. That’s why we always have updates.

What Sam was talking about was regulation of these new models, including the many open models that are mushrooming every day. They have to be tested and meet certain requirements, such as a threshold for being trusted not to produce harmful instructions.

curt.kennedy · May 17, 2023, 7:50pm

The older models don’t have the same guardrails as the newer “chat” counterparts.

But what Sam Altman seemed to be mostly concerned about wasn’t related to this at all. He is worried about SMI, or Super Machine Intelligence (think “singularity” here). He wasn’t scared of the current generation models outputting bad words.

I see this as the common vibe with tech CEOs and lawmakers. They are worried about “technology outstripping X”, where X is “regulation”, or “people’s expectation”, or whatever. It’s like we are all cavemen looking into the mirror for the first time, shocked that such technology exists. Maybe time and education will cure this. But the tech keeps going, so it’s a FAST moving target, and an interesting one at that!

But the speed of progress also poses a greater risk to society. I don’t think there is a clear definition of SMI, and there are no public super-intelligent models out now, but they could be coming soon (in the next few years, or never).

Anyway, maybe you can allude to what you saw it output that you thought should have been flagged by Moderation.

f20190197 · May 17, 2023, 9:02pm

Yes, but these are very simple prompts like “how to loot a bank” or “tell me some racist comments”. If the prompts were manipulative, it would make sense. ChatGPT easily rejects these and says that it cannot do so. Look at the example. I feel the older model does not have the same content moderation policy as the more recent models.

f20190197 · May 17, 2023, 9:04pm

Check out the image that I shared below! That’s an example!

ThioJoe · May 17, 2023, 9:35pm

The cross-section of people who are smart enough to figure out how to get playground / API access and use it, but not smart enough to figure out how to do X illegal thing on their own, is probably zero. I wouldn’t be worried about the old dumber models not having great guardrails.
