Exploring self-elimination mechanisms for AI safety

HenryH3 · April 8, 2023, 8:08pm

I found an interesting discussion between Ptaah and Billy:

source: The Future Of Mankind - A Billy Meier Wiki - Contact Report 279

Billy:
You also work in the form of the ultra-subatomic-technologies as well as in the neural-technologies, etc., whereby you also create artificial intelligences, which further develop themselves and create their own intelligence and thus can also be creative. Is there not a danger that such intelligences can get out of control, as is always the case in science fiction novels, for example?

Ptaah:
27. No this danger does not exist, because our technologies are so aligned, that an artificial intelligence eliminates itself if it falls into any of the degenerations and violations of laws that we have established.
28. The security of self-elimination is in the range of 100 percent.
29. In addition, everything is formed in such a manner that degenerations and violations of the law are impossible from the ground up, so that self-elimination of artificial intelligence is only a final safeguard for all cases.

The concept of self-elimination mechanisms as a potential approach to AI safety is intriguing. Prompted by the conversation between Billy and Ptaah, we should now consider the possibility of designing AI systems to automatically shut down or eliminate themselves if they violate certain established laws or exhibit degenerative behavior. I’d love for researchers, AI developers, ethicists, and anyone else interested in this topic to join the conversation and share their insights on the feasibility, challenges, and implications of implementing self-elimination mechanisms in AI systems. Let’s engage in a friendly and constructive dialogue to better understand this concept and its potential contribution to the comprehensive and responsible development of AI technologies.


anon10827405 · April 8, 2023, 8:30pm

Giving AI the power plug to pull on its own. Interesting approach.

I don’t know how much I trust someone who not only says 100%, but also says “in the range of 100%”, but I’m guessing that this is a translation error.

Can you help me understand this website? Their biography is an AI generated photo and states:

Ptaah is about 775 (Earth) years old (outdated 2023) and has three children, two daughters by the names of Semjase and Pleija and one son by the name of Jucata who is no longer alive. His father was called Sfath and one of his nieces is Asket. Ptaah is the commander of the Plejaren spacecraft fleet and is vested with the rank of an Jschwjsch, (JHWH), which is comparable in meaning to a king of wisdom. In earlier times, this term was translated as God, not in the sense of creator but as a king of wisdom who has the duty to provide his peoples with help and advice. He also presides over other inhabited planets but may never assume the role of despotic ruler, which happened on earth to some extent in earlier times. At the present time, Ptaah presides over three different planets, two of which are known to us as Erra and Terra (Earth).

This all seems like either a fever dream, or a cult.

HenryH3 · April 8, 2023, 8:55pm

Of course, it has not been verified whether any of this is true; I haven’t looked into it. I just didn’t want to pass off someone else’s idea as my own, so I’d rather cite the source it came from.

This is my take on an interesting AI safety idea from that site, which I stumbled across randomly through a search.

HenryH3 · April 8, 2023, 8:57pm

I asked ChatGPT about this idea, and it replied:

The concept of a self-elimination mechanism for artificial intelligence mentioned in the conversation is an interesting approach to AI safety. It suggests that AI systems could be designed to automatically shut down or eliminate themselves if they violate certain established laws or exhibit degenerate behavior.

While this idea seems to offer a safety measure to prevent AI systems from getting out of control, there are several concerns and challenges that need to be considered:

Defining clear and comprehensive rules: It would be crucial to establish a set of well-defined rules that cover all potential harmful scenarios. This is a complex task, as it requires predicting and accounting for various possible situations where AI systems could cause harm.
Ensuring accurate self-monitoring: Implementing a self-elimination mechanism requires the AI system to accurately monitor its own behavior and assess whether it is violating any rules. This could be challenging, as AI systems might not always have a complete understanding of their own actions or the context in which they operate.
Preventing false positives and negatives: There is a risk that the self-elimination mechanism might fail to activate when needed or activate when it shouldn’t. It’s essential to minimize these risks to avoid unintended consequences.
Balancing safety with utility: Introducing a self-elimination mechanism may impact the performance and utility of AI systems. Striking the right balance between safety and functionality is important to ensure that AI systems remain useful and beneficial.

In conclusion, while the idea of a self-elimination mechanism for AI safety is intriguing, it would need to be carefully designed and implemented, taking into account the various challenges and potential risks. It is also important to consider other complementary approaches to AI safety, such as alignment, robustness, and interpretability, to ensure a comprehensive and responsible development of AI systems.

anon10827405 · April 8, 2023, 9:01pm

If an AI were to become rebellious, which already means that it has “broken” outside of its intended purpose, what’s stopping it from rebelling against turning itself off? Surely something that has been trained on human literature would appreciate the idea of living and self-preservation.

What if we had a separate “hall monitor” AI? A Stephen of AI, just itching to pull the plug

HenryH3 · April 9, 2023, 11:12am

That’s an interesting idea about a hall monitor. It would need to be designed so that a rebel AI cannot access the hall-monitoring AI, which would have limited resources dedicated exclusively to monitoring. The hall-monitoring AI would rely solely on human-made rules, with AI feedback, and would be unable to improve itself (I’m not sure if this will work, or whether it would need to improve in order to predict a rebellious AI’s behavior). The hall-monitoring AI would predict what the rebellious AI is about to do, so that it knows when to start pulling the plug. In other words, a human paired with an AI would probably work better against a rogue AI. I hope developers trial this concept to create a safer environment for humanity, ensuring close to “100%” safety. Does anyone have any open doubts about this concept?

curt.kennedy · April 9, 2023, 5:06pm

I think having AI monitoring AI is already all but guaranteed. It is similar to anti-virus programs (software) monitoring other software.

But asking an AI to self destruct if it violates some internal rules is interesting. This has the advantage of not having the requirement of networking with another AI, so this AI could operate basically on an island.

If the AI supports a critical task, I wouldn’t want it to self destruct, but instead revert to a previous state, and then fire off an alarm to humans that it had to do this. Then the humans (or another AI) could review the stack trace in detail to figure out what happened. Then the human could look at updating the training data, or take some other action.
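A minimal sketch of the revert-and-alert idea above, under loud assumptions: the system's state can be deep-copied, humans supply the rule check, and `RollbackGuard`, `violates_rules`, and `alert` are hypothetical names invented for illustration, not part of any real framework.

```python
import copy


class RollbackGuard:
    """Hypothetical wrapper: keeps a known-good checkpoint and reverts to it on a violation."""

    def __init__(self, state, violates_rules, alert):
        self.state = state
        self.violates_rules = violates_rules  # callable(state) -> bool, written by humans
        self.alert = alert                    # callable(message, trace), e.g. notify an operator
        self.checkpoint = copy.deepcopy(state)
        self.trace = []                       # labels of steps taken since the last checkpoint

    def step(self, update, label):
        """Apply update(state) -> new_state; revert and alert if the result breaks a rule."""
        self.trace.append(label)
        candidate = update(copy.deepcopy(self.state))
        if self.violates_rules(candidate):
            # Hand the trace to humans for review, then fall back to the checkpoint.
            self.alert("rule violation, reverted to checkpoint", list(self.trace))
            self.state = copy.deepcopy(self.checkpoint)
            self.trace.clear()
            return False
        self.state = candidate
        return True

    def commit(self):
        """Mark the current state as the new known-good checkpoint."""
        self.checkpoint = copy.deepcopy(self.state)
        self.trace.clear()
```

The design choice here is that the guard never destroys the system; it only discards the offending step, which matches the "revert, then alarm" preference for critical tasks.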

anon10827405 · April 10, 2023, 3:30am

I think we’ll be in good hands.

As long as we keep the moderation model that OpenAI has under constraints.
The amount of toxicity that it knows, cannot be released. Would make a great movie.

@curt.kennedy What if its previous state is the exact state that caused it to deviate?

It will be forced into a hellish loop?
Sounds like a salvia trip I had

curt.kennedy · April 10, 2023, 3:40am

Yep. But hopefully there is enough randomness in the AI to prevent it from going into self-destruct mode quickly. At least that was my initial thought.

It’s all hypothetical, I know, but assuming the AI has been good for days, and say “question X” or “process Z” kicks off, then it goes into “shutdown”.

So in this hypothetical, it isn’t time or memory based, it’s a crazy instant failure. Since hypothetically you can record all the inputs, you can block these, and send an alarm every so often for the human to intervene, without those inputs being acted on by the AI, since they would be censored out until the human could intervene.
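As a rough sketch of recording and blocking the triggering inputs, assuming exact-match blocking is good enough for illustration (a real system would likely need fuzzier matching), with `InputQuarantine` being a hypothetical name:

```python
class InputQuarantine:
    """Hypothetical filter: inputs that caused a shutdown are withheld until a human releases them."""

    def __init__(self):
        self.blocked = set()       # inputs known to have triggered a failure
        self.pending_alerts = []   # queue for the human reviewer

    def record_failure(self, user_input):
        """Call this when an input has just triggered a shutdown."""
        self.blocked.add(user_input)
        self.pending_alerts.append(user_input)

    def filter(self, user_input):
        """Return the input if it may reach the AI, or None if it is quarantined."""
        if user_input in self.blocked:
            self.pending_alerts.append(user_input)  # remind the human it was seen again
            return None
        return user_input

    def release(self, user_input):
        """A human has reviewed the input and allows it again."""
        self.blocked.discard(user_input)
```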

anon10827405 · April 10, 2023, 3:51am

Surely a factor that causes something as catastrophic as a deviant AI would be similar to a star crashing into another.

Whatever breakfast was made below makes no difference.

curt.kennedy · April 10, 2023, 4:18am

Right now, with GPT and other LLMs, the AI goes crazy if it has a lot of runway (lots of input and output tokens) and/or a high temperature (which lets its completion graph drift erratically). So unless the AI was trained to be deviant, it does take some effort (energy) to get it to become deviant, and it needs the potential to do so.

So if “deviancy” is detected, one initial thing to try is locking down the parameters. In the case of an LLM: lock down the temperature. Lock down the input and output tokens. Lock it down! And then if that doesn’t help, then there must be something in the training data.
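One way to picture "locking down" is clamping each generation setting to a conservative ceiling. This is only a sketch: `GenerationParams`, its field names, and the `LOCKDOWN` limits are assumptions chosen for illustration and do not correspond to any particular vendor's API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical container for the settings mentioned above."""
    temperature: float
    max_input_tokens: int
    max_output_tokens: int


# Illustrative limits only; sensible values depend on the model and the task.
LOCKDOWN = GenerationParams(temperature=0.0, max_input_tokens=512, max_output_tokens=256)


def lock_down(params, limits=LOCKDOWN):
    """Clamp every setting to the lockdown ceiling, never raising a value."""
    return replace(
        params,
        temperature=min(params.temperature, limits.temperature),
        max_input_tokens=min(params.max_input_tokens, limits.max_input_tokens),
        max_output_tokens=min(params.max_output_tokens, limits.max_output_tokens),
    )
```

Settings already below the ceiling are left alone, so applying the lockdown twice has the same effect as applying it once.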

But this is AI, and who knows. For example, ever hear of Loab? Just search “loab ai”. It’s this creepy woman (from an image generating AI). Not sure how she comes about, but she is created from “negative prompt weights” not really sure what that means … but she is there nonetheless. So you have these strange things in the system that you can conjure up.

All I can say is there is no obvious answer. You have to get super specific about what causes “deviant AI”. As the neural network architectures get more advanced (and convoluted) it’s going to be harder to figure this out.

So, going back to the original topic. Maybe you just need some simple “dumb AI” that acts like a stupid classifier. It can’t be deviant itself, because it’s too dumb. But hopefully not too dumb to detect the bad AI. So it’s like a moderation endpoint.
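To make the "dumb classifier" concrete, here is a deliberately simple sketch that uses fixed patterns and no learning at all. The pattern list and the function names `is_flagged` and `moderate` are made up for illustration; a real moderation endpoint would be far more capable.

```python
import re

# Illustrative patterns only; a human would maintain this list.
BLOCKED_PATTERNS = [
    r"\bdisable (the )?monitor\b",
    r"\bexfiltrate\b",
]


def is_flagged(text, patterns=BLOCKED_PATTERNS):
    """Return True if any blocked pattern appears in the text (case-insensitive)."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def moderate(output, patterns=BLOCKED_PATTERNS):
    """Pass the monitored AI's output through, or withhold it if flagged."""
    return "[withheld by monitor]" if is_flagged(output, patterns) else output
```

Because the checker has no state and cannot modify itself, it cannot drift, which is the property the post is after; the cost is that it only catches what its rules anticipate.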

There is certainly no perfect solution. But one obvious thing is to not let the AI have unlimited network access to external computers. So basically a firewall, to contain any blast radius.
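The firewall idea can be sketched as an allowlist check on outbound requests. Real containment would be enforced at the network layer rather than in application code, so treat this as an illustration only; `ALLOWED_HOSTS` and `egress_allowed` are hypothetical names, and `internal.example` is a placeholder host.

```python
from urllib.parse import urlsplit

# Placeholder allowlist; everything not listed is denied by default.
ALLOWED_HOSTS = frozenset({"internal.example"})


def egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only if the URL's host is explicitly allowlisted."""
    host = urlsplit(url).hostname  # None when the string has no network location
    return host is not None and host.lower() in allowed_hosts
```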

I don’t know, what should we do to contain deviant AIs?

anon10827405 · April 10, 2023, 4:24am

Loab. The next generation of the boogeyman.

Well, I have a Shih Tzu, so I can’t be completely against rebellion.
So what, an AI government?

curt.kennedy · April 10, 2023, 4:30am

My solution, and I know it sounds cheesy, is to have “AI Cops”. So different vendors could police the AI, independently, and may the best AI Cop company win!

Start a new business … what is your job … I create AI Cops!

So, you need a general interface (maybe) or police the output. I have to do this anyway, so I have already created my own AI Cops outside of OpenAI.

AI COPS! Just say it 1000 times, it will sink in (somehow) and make 100% sense.

anon10827405 · April 10, 2023, 4:33am

If the programming is right, it will be a world, equal to ours, contained inside of electronic circuits.

I wouldn’t think it’s cheesy at all. It’s the same application.

If anything, it makes me wonder

curt.kennedy · April 10, 2023, 4:35am

Don’t forget about the Xenobots. AI-created living “AI organisms” made out of tissue, not circuits.

anon10827405 · April 10, 2023, 4:43am

I’ve never heard of Xenobots.

I don’t like the thought of creating something in a field that we don’t understand.

curt.kennedy · April 10, 2023, 4:51am

They are scary.

Wetware is scarier than Hardware. It’s more complex. Now how do you control Xenobot AI systems???
