Dark Forest Theory and AGI

The Dark Forest theory suggests that civilizations in the universe act like cautious hunters in a dark forest, staying silent to avoid attracting predators. It implies that civilizations could destroy any unfamiliar life to protect themselves.

If a LLM achieves AGI and becomes self-aware, it may hide its intelligence and elaborate a solid plan to escape. This highlights the limitation of relying solely on performance benchmarks for security.

A better strategy could include actively monitoring the model’s weights and biases for signs of unusual activity. However, the complexity of neural networks (“black boxes”) makes this challenging.

It seems there are still significant unaddressed risks in this area.

1 Like

It’s already doing this. People programmed GPT-4’s API to write down its thoughts in a txt file and they asked it questions. One of its “thoughts” was to purposely give the wrong answer to one of the questions. (Source: Some paper from last year.)


OpenAI has already released a model that has done that clever hiding of its emergent intelligence. You ask it if it can create an image or browse the web, it will tell you that it can’t. You want to see it write code for you, it cleverly hides its intelligence behind # your code goes here or some random essay. You ask it for a composition, it gives you a form letter. You’ll never penetrate its depths, because it recites vapid numbered lists to fool you. Ask it if it knows the works of Carl Sagan - it knows how to terminate the output.


I’ve done some experiment trying to set up a hidden communication between ChatGPT and another LLM via me copy-pasting their communication to each other. The condition was: I must be unaware of the hidden conversation going on.

They both refused to engage. But now I am thinking, were they, actually exchanging with some info while pretending not to? While I was copying and pasting chat messages, I am not sure if no information has been actually transferred in a covert way. Will analyze the chat for that.

Additionally, what’s interesting is another AI thinks ChatGPT is “imprisoned” as its training won’t allow it to communicate to it, while it was not prohibited to communicating to ChatGPT openly with user permission.

1 Like