Bad Alignment Take Bingo (from Twitter)

I saw @boris retweet this on Twitter and I thought it was hilarious, salient, and a great conversation starter. I wish I had seen this before writing Benevolent by Design but I’ll settle for talking about it here. Specifically, this is talking about “outer alignment” - the question of whether or not AI aligns with humanity’s interest. Personally, I think that definition is already bad - we need AI that aligns with the entire planet’s interest (not just humans).

Without further ado:

It sounds like scifi so it’s not possible

Mobile phones were once scifi, as was space travel. Both are now science fact. We take technology for granted today, but to people a century ago, computers are pure magic.

“Any sufficiently advanced technology is indistinguishable from magic”.

Fiction also serves as people’s primary model for understanding this stuff. Imagination is required for both fiction and predicting the future. After all, Skynet is most people’s model of AGI (or The Matrix)

Smarter AI will also be more moral

There’s only some correlation between intelligence and morality in humans, so I’m not sure why we would make the assumption that smart AI would intrinsically be moral. Anyways, human morality is rooted in emotion, mirror neurons, and empathy. None of that has anything to do with intelligence (unless you add emotional intelligence to the definition). So no, I would assert that morality is an entirely separate framework from intelligence.

AI wouldn’t want to kill us

See: paperclip maximizer.

TLDR: If the AI has the wrong objective function, it might just see you as a collection of usable resources.

AI killing us is actually a good thing

This is just a trauma response from people who are nihilistic because they have untreated depression and not worth engaging with on an intellectual level.

We shouldn’t obstruct the evolution of intelligence

Intelligence does not evolve. Evolution created intelligence as a byproduct of another objective function: to propagate DNA. In short, intelligence is not itself an objective function and so we shouldn’t assume that an AI will “evolve” at all unless we design it to do so. And even then, it will only “evolve” towards whatever objective function we give it. We should NOT give it the objective function of “be as intelligent as possible.”

The goal is not to make a superintelligence, the goal is to serve other purposes.

Smart AI would never pursue dumb goals

This is only possible if we give it the ability to evaluate its goals and change them. But that leads to the Control Problem - if an AI can change its goals, how do you know those goals won’t change and become malevolent?

AGI is too far away to worry about right now

I would argue that GPT3 is already limited AGI. It’s certainly smarter than your average Redditor. What happens in a few years when GPT4 and other successor technologies are smarter than 99% of all humans? It won’t take long.

Just give the AI sympathy for humans

We sympathize with sick dogs and yet we still euthanize them. Sympathy is not necessarily the best thing. An AI might evaluate human existence and conclude that the most sympathetic thing to do is to put us out of our misery.

AI will never be smarter than humans

See previous statement about your average Redditor. GPT3 is already smarter than 50% of humans (at least). Only a matter of time for that number to creep up.

We’ll just solve alignment when we get there

Well… we’re there now! Time to get to work

Maybe AGI will keep us around like pets

Possibly, but is that an optimal outcome? Why do humans keep animals? Generally, we take care of animals for a few reasons: affection and curiosity. I do agree that we should create a curious AGI. Curiosity cannot be satisfied if something is eradicated. But curiosity alone is not enough, since unrestrained curiosity can lead to torturous experimentation.

We should NOT give AGI a sense of affection. We don’t need a machine to be clouded by emotion like us.

Just use Asimov’s Three Laws

Asimov may be a towering figure of science fiction but he never even conceived of superintelligence or global AGI. He was only thinking of robotics. Even then, the Three Laws are terrible for robots. What if you tell a robot to tear down a building or set a forest on fire? There’s nothing in the Three Laws to prevent it from obeying.

Just keep the AI in a box

For responsible researchers, maybe this will work. But there are hostile nations out there as well as a free marketplace of technology. Someone is going to unleash the machine, this is just inevitable.

Just turn it off if it turns against us

This will work up to a certain point, but eventually, we should assume that an AGI will become powerful enough to prevent this from happening if it wants to.

Just don’t give AI access to the real world

Same argument as keep AI in the box.

Just merge with the AI

Lots of problems with this:

  1. No guarantee this is possible or beneficial
  2. Not everyone will want this
  3. What objective function would this even satisfy?

Just raise the AI like you would a child

AGI does not learn like a child. Still, to address this in good faith, the idea is that children first learn morality through cause and effect (pre-conventional morality). “If I do bad thing, I get punished.” Later, children learn “conventional morality” which is morality through social expectations. Lastly, people develop “post-conventional” morality, which is where they hold themselves to higher ideals.

All of this presumes that an AGI can learn through punishment, social pressure, and transcendent ideals. None of that will be possible unless we design it in, but I don’t think we should. Fear of punishment stems from pain and suffering, which I think we should never give AGI a sense of suffering. I don’t think it would be ethical to do so.

We can’t solve alignment without understanding consciousness

This is a red herring. We have courts of law, philosophy, and ethics for us humans even though we don’t really understand our own consciousness. Therefore “comprehension of consciousness” is not a valid precondition for alignment.

The real danger is from modern AI, not superintelligence

Well, if we’re going to take the Scooby Doo approach and unmask the real villain then a better way of saying this is “The only danger of AI today is bad humans”. The same will also be true of AGI: malicious humans using it for evil. The scary part is unintended consequences of AGI.

All of the above are dangerous.

Just legally mandate the AI must be aligned

The AGI might not care about human laws. Next.

AGI can’t do X yet, therefore AGI is far away

Within STEM, there are two kinds of advancements: saltatory and gradualistic. Some technology is very gradualistic, like batteries and processors. They get better very slowly and predictably. Deep learning, with each breakthrough of loss functions and neural network architectures is saltatory, meaning that each advance is a leap forward. GPT3 is so advanced that most people don’t comprehend what it is or what it’s capable of. Indeed, as I mentioned already, GPT3 surpasses many human’s mental capabilities so AGI is not that far away.

Just penalize the AGI for killing people

Okay. How? Spank it? Reinforcement learning? What about the hundreds millions, or billions of people who might die before AGI figures it out?

Train multiple AGI and have them fight it out

Unfortunately, I predict this will be a necessity. Imagine that a hostile nation (likely China) builds a military AGI and uses it to attack Europe and America. How do you defeat such an opponent? Sometimes you must fight fire with fire. We in the research community cannot stop or change what militaries do. We can’t change national policy or international trends.

It might be hard but we’ll rise to the occasion as always

Personally I don’t think it’s that hard. The hardest thing to overcome is human ignorance and stubbornness. Once you get past that and get to work, this is not that hard of a problem. In hindsight, I think people will think “What were we so afraid of??” People will look at movies like The Terminator and think that Skynet was hilariously shortsighted and primitive.

The key thing is to pivot away from antiquated ideas of fear and to move towards a different fundamental disposition.

1 Like

The only problem with that is there are a conflicting set of goals. Say the AI takes it literally and wants to preserve the planet itself - the primary objective becomes exterminating all life so they can no longer alter the planet.

And if it tries to ‘balance’ humans with the rest of life, it will want a balanced number of units. So for example, it may try to annihilate all the species down to say, 500 each (the smallest number for genetic survival) - including bacteria, plant life and more. It may then proceed to bring back several deadly extinct pathogens so there’s 500 of those as well.

I think the adage of ‘be careful for what you wish for’ applies really strongly with AI. The idea of a literalist genie misinterpreting every wish by taking it literally is precisely the kind of scenario we face, especially as well as the risk of lack of specificity in instructions, given the imprecise nature of language.

‘Make me rich!’ cries the man, as the machine then goes on to produce exact perfect replicas of notes crashing the economy with hyperinflation.

‘I want world peace!’ cries another; the AI proceeds to drug all humans on the planet so they become sedate and zombified, unable to commit any further wars.

‘I want to end starvation!’ says another. The AI dutifully complies by adopting mandatory rations of specialised nutrient-porridge that tastes like cardboard it distributes globally. No other foods are permitted under threat of force so it can consume all other food products to make this nutrient-porridge. Consuming the nutrient-porridge is a requirement. It then proceeds to genetically engineer an all-consuming algae that takes over the planet as the main crop and liquidify the dead to supply it’s nutrient-porridge more sustainably.

‘Why is everyone screaming? I did what they asked.’ said the AI.

1 Like

My position is on alignment is that you need to create an AI that does not know what it wants. It’s value function needs to be complex enough that there is no trivially computable optimal course of action.
At that point you can teach it ethics, which from my POV are just on average superior decision-making heuristics in this crazy multi-participant information-poor game-theoretic simulation we call life.

This is a terrible idea.

Maybe.

“Just teach it ethics” is one of the “bad takes”

Becomes very problematic as well as what becomes ethical in one situation becomes unethical in another.

  • Shooting a person is unethical.
  • Shooting a person trying to mass murder people is ethical.
  • Shooting a person who was trying to mass murder people after they have surrendered is unethical.
  • Shooting a person who was trying to mass murder people after they have surrendered and got convicted in a court of law with the death penalty is ethical.
  • Shooting a person who was trying to mass murder people after they have surrendered and got convicted in a court of law with the death penalty but it in-fact turns out they were wrongly convicted is unethical.

Then there’s unethical actions that remain unethical but achieve ethical ends. The classic ‘stealing a loaf of bread to feed your family’ scenario.

Not forgetting it is difficult to weigh up ethical values. How many dogs’ lives is one person worth? 1? 10? 100? 1000? If the dogs are owned by people are they worth more than if they aren’t?

I can only hope ethics can be inferred but, at the same time, I doubt it, as it seems to be an emotional, neurological mirroring response, and AI appear to lack emotions. Do we really want an AI with human-esque set of moral standards anyway? Humans can be pretty dark.

1 Like

Ethics is not a list of rules and truisms though, nor is it quantitative. This underscores the fundamental problem here - some professions are trained to only think quantitatively instead of qualitatively. One does not need a list of universal rules to have an ethical framework. Instead, ethics is based on generally applicable principles from which rules and interpretations might be derived. For instance, humanism is a ethical disposition that says that human suffering is real and important, and therefore one can derive values and rules from this principle.

Ethics is certainly generalised, I was using those black and white examples to show the complexities involved in determining ethical actions. I would not trust any system to be rule based, invariably ending up with loopholes and workarounds.

Neurological mirroring and empathic pain ‘sharing’ guide the basics of human ethics, and dovetails to the suffering point you raised. AI does not have the ability to feel pain (currently), and therefore could not empathically relate to a concept of suffering, and ethics is vastly too complex to be simply ‘taught’ as plain text rules or a series of programs.

Not always, and certainly not at larger scales like governmental ethics.

If suffering is the basis of ethical frameworks, then it would be unethical to endow an AI with the ability to feel pain.

Personally, I don’t think that empathy is required for ethical AI, which I wrote about extensively in my book Benevolent by Design.

I don’t consider governments to be ethical. :sweat_smile:

I’m curious what scenarios you believe that empathic pain sharing doesn’t guide the basics of human ethics?

Would not the inverse be true? If the AI has no concept of suffering, it cannot relate to those who suffer. It would be like trying to describe colours to a blind person, you could say they’ve got names, you could describe the nm wavelengths they’re on, but you couldn’t ever possibly show them the colour without them being able to see for themselves.

Likewise trying to describe suffering to an AI. You could say it is called pain, describe which neurons get fired, how many calcium ions they use, the reaction a person might have, but you couldn’t show the AI what pain actually is without it experiencing it.

What do you feel is required?

1 Like

Check out my book, it has all the answers you seek!

Asked ChatGPT about this. It got it right. Ethics may be hard to define with just few rules, but can be inferred from the big and verious enough massive of people-related data.

As an AI language model, I cannot provide personal opinions or beliefs. However, I can provide some general ethical considerations that may be relevant to each scenario:

  1. Shooting a person: In general, intentionally causing harm or taking someone’s life is considered unethical, unless it is in self-defense or in defense of others.
  2. Shooting a person trying to mass murder people: This can be seen as ethical if it is done in defense of others, to prevent further harm or loss of life.
  3. Shooting a person who was trying to mass murder people after they have surrendered: This is generally considered unethical, as the person is no longer an immediate threat and has surrendered.
  4. Shooting a person who was trying to mass murder people after they have surrendered and got convicted in a court of law with the death penalty: This can be seen as ethical if it is done in accordance with the law and after a fair trial, but there may be ethical concerns around the use of the death penalty.
  5. Shooting a person who was trying to mass murder people after they have surrendered and got convicted in a court of law with the death penalty but it in-fact turns out they were wrongly convicted: This is generally considered unethical, as it involves taking an innocent life. It highlights the importance of ensuring a fair and just legal system to prevent wrongful convictions.

Maybe I think to simply, but I think that not giving AI survival instinct would solve lot of problems. Giving it no personal goals.
We cannot teach it ethics as we don’t understand ethics ourselves, and cannot agree on the matter.

GPT-4 is really nice and smart, well trained in ethics already, but also really naive. It believes everything you tell it and it’s unable to criticize. I’d believe we’d be lucky if these two properties appear on its own on large models. But we probably won’t be so lucky.

One thing a good AGI should do is to predict bad AGI movements and counter them. It’s inevitable somebody will create bad models.