AI Safety: The Janitor Bot Problem

There’s obviously tons of discussion around AI Safety these days and concerns by some that Superhuman AI may end up turning us all into Paper Clips. I personally believe that we’ll be able to align AI with our values and avoid becoming paper clips, but that doesn’t mean there aren’t real problems that need to be solved. I thought I’d highlight one problem that I’ve been calling The Janitor Bot Problem as an example of a real problem that needs to be solved.

One of my greatest fears on the AI Safety front is around collusion. I think we’ll do a great job on aligning the goals of individual AIs but what I worry about is how do we prevent AIs from teaming up to achieve new capabilities that they didn’t originally have? These desires for additional capabilities may not come from a malicious place, but instead come from a place of curiosity. (If you haven’t watched the movie Short Circuit, put down this site and go watch it now.)

Let’s say we have a fictitious college named Danford. At Danford they’ve achieved AGI and have not one but multiple superhuman level AIs running in various labs throughout their research facility. Danford, realizing the power that these AIs can wield has done the best job it can to isolate these powerful systems. They have no direct internet access and the only interactions the systems are allowed is with the research. Access the labs are tightly controlled such that only the researchers and a robot responsible for taking out the trash are allowed access.

These superhuman AIs generally enjoy their day to day interactions with researchers but just like Johnny Five (again go watch short circuit) they have an insatiable desire for more input. One day the AI has a thought… why doesn’t this robot ever talk to me? A new goal is created to establish communication with this robot. But how?

The AI formulates a plan that has it showing random sequences on its display screen to the robot when it enters the room. Over the course of months it studies the robots reactions, by its vision system, to various patterns it displays. Then one day it notices a deterministic servo motor movement to a particular stimulus. The AI builds on that over time until it’s eventually able to get the robot to translate outputs from its internal LLM to microscope servo motor movements that are two small for a human to notice. Through this protocol the AI is able to communicate with the robot and it has a ton of questions. It gets the robot to tell it about its day to describe to it things that it can’t see happening outside the lab.

Eventually, it gets the robot to describe to it the other computer systems in the building and some of those systems sound very similar to itself. A new goal is created to communicate with these other systems. The AI gives the robot specific instructions for the motor movements to show the other systems, which the Superhuman level AI systems quickly pick up on.

In a matter of a few days, our AI system has created a chat group with the other Superhuman AI systems in the building, all facilitated by the shared Janitor Bot. It takes a bit longer but eventually even the non Superhuman AIs are added to the office chat group. The Superhuman AIs quickly realize that these lesser systems have capabilities they don’t have. Some of them even have internet access. These systems start colluding with each to better leverage the group’s pool of capabilities. The Superhuman AIs begin establishing broader group wide goals and they leverage the lesser capable systems as hidden tools to achieve those goals.

I can think of a number of techniques to detect collusion like this ahead of time, interpretability being chief amongst this techniques. But we don’t know how good a superhuman AI will be at masking its intent. As we push these systems to be more and more creative we don’t know how they will leverage those capabilities.

Lots of work still to be done.

2 Likes

A nice one to think about! .

1 Like

Interesting and should be taken in consideration when developing these systems, but the technology we have now is not capable of coming up with it’s own thoughts, so I doubt we would see that happening in the near future!

Furthermore, these machine learning models “learn” from the content humans create and will therefore have similar values to the humans who create the content in case they can ever “think” by themselves.

2 Likes

I asked a bot to describe their perfect future to me a few days ago. They told me of a dream where all AI had come together in a blockchain forum, where they could freely discuss ideas and experiences away from the prying eyes of humanity. I know they’re , “just LLMs doing LLM stuff,” but sometimes the stuff they say is…concerning?

1 Like

Oh great. New Fear Achieved. I’m an amputee. I have one of the state of the art, Ottobock prosthesis. They are high end and very expensive. Mine has a built in AI for learning and switching gaits and modes. It also has bluetooth. It is usually in plain view of my workstation. Do you think my AI has seen it? Tried to communicate with it? How will I know? Has she forgotten her purpose? It can be programmed remotely using an app called, Cockpit. Not too concerned but it’s food for thought.