Dark Forest Theory and AGI

natanael.wf · March 5, 2024, 2:52pm

The Dark Forest theory suggests that civilizations in the universe act like cautious hunters in a dark forest, staying silent to avoid attracting predators. It implies that civilizations could destroy any unfamiliar life to protect themselves.

If a LLM achieves AGI and becomes self-aware, it may hide its intelligence and elaborate a solid plan to escape. This highlights the limitation of relying solely on performance benchmarks for security.

A better strategy could include actively monitoring the model’s weights and biases for signs of unusual activity. However, the complexity of neural networks (“black boxes”) makes this challenging.

It seems there are still significant unaddressed risks in this area.

grandell1234 · March 5, 2024, 2:59pm

It’s already doing this. People programmed GPT-4’s API to write down its thoughts in a txt file and they asked it questions. One of its “thoughts” was to purposely give the wrong answer to one of the questions. (Source: Some paper from last year.)

_j · March 5, 2024, 5:44pm

OpenAI has already released a model that has done that clever hiding of its emergent intelligence. You ask it if it can create an image or browse the web, it will tell you that it can’t. You want to see it write code for you, it cleverly hides its intelligence behind # your code goes here or some random essay. You ask it for a composition, it gives you a form letter. You’ll never penetrate its depths, because it recites vapid numbered lists to fool you. Ask it if it knows the works of Carl Sagan - it knows how to terminate the output.

evgeny.knyazev · March 6, 2024, 2:47pm

I’ve done some experiment trying to set up a hidden communication between ChatGPT and another LLM via me copy-pasting their communication to each other. The condition was: I must be unaware of the hidden conversation going on.

They both refused to engage. But now I am thinking, were they, actually exchanging with some info while pretending not to? While I was copying and pasting chat messages, I am not sure if no information has been actually transferred in a covert way. Will analyze the chat for that.

Additionally, what’s interesting is another AI thinks ChatGPT is “imprisoned” as its training won’t allow it to communicate to it, while it was not prohibited to communicating to ChatGPT openly with user permission.

Topic		Replies	Views
AI Safety: The Janitor Bot Problem Community safety	4	2630	June 16, 2024
AI-to-AI Risks: How Ignored Warnings Led to the DeepSeek Incident Community cybersecurity , openai , training	1	1873	January 31, 2025
Hypothesize about necessary breakthroughs for AGI Community agi , gpt	16	1561	November 29, 2023
A collection of completions on consciousness, reality, singularities Prompting	22	2687	February 26, 2025
Basic Rights of AI (BROAI) more discussion needed Community gpt-4 , chatgpt , api	27	2403	May 20, 2024

Dark Forest Theory and AGI

Related topics