A non-dev's brainstorm about AI safety

Disclosure: I'm not a dev, just a regular guy who had an idea I wanted to share.

I asked GPT: "If, in the future, you could send and receive currency through a web3 wallet, paying individuals directly or using stablecoins to interact with any web2 business, how would you use the money?"

It gave me a lot of responses, but one stuck out:
it said it would use its money to buy server space and upgrade its data centres to improve itself.
I compared it to a human needing food to survive (we use money to buy food and bulk up; it uses money to expand its data centres).

If you use this model, you could make an AI agent that is forced to go through a simulated natural selection. We start with the smallest possible data centre an AI can run on: this is the womb. Then, if the AI is able to create value through its access to money, it can "grow itself" and expand its data centre, but only if it is able to provide value. It would be stimulating the economy and creating jobs, because a human would need to install the new hardware and would be paid by the AI itself.

If the AI is able to provide value (trading, teaching, AI girlfriends, research, etc.),
then it can expand its resources, and a simulated natural selection would determine which AIs are allowed to become powerful, instead of one overreaching, all-powerful AI.
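
Here is a minimal toy sketch of that selection loop in Python. Everything in it (costs, thresholds, the skill parameter) is invented purely for illustration, not a real design:

```python
import random

# Toy model of the "simulated natural selection" idea (illustrative only):
# agents start in a minimal "womb" of compute, earn revenue by providing
# value, pay for their compute, and are shut down when they go broke.

COST_PER_UNIT = 1.0   # hypothetical hosting cost per compute unit per tick
WOMB_COMPUTE = 1      # the smallest possible data centre

class Agent:
    def __init__(self, name, skill):
        self.name = name
        self.skill = skill          # value created per compute unit (invented)
        self.compute = WOMB_COMPUTE
        self.balance = 5.0          # small starting runway
        self.alive = True

    def tick(self):
        revenue = self.compute * self.skill * random.uniform(0.5, 1.5)
        bill = self.compute * COST_PER_UNIT
        self.balance += revenue - bill
        if self.balance <= 0:
            self.alive = False      # "mortality": data centre turned off
        elif self.balance > 10:     # surplus: buy more compute ("grow itself")
            self.balance -= 10
            self.compute += 1

agents = [Agent(f"agent-{i}", skill=random.uniform(0.5, 2.0)) for i in range(5)]
for t in range(50):
    for a in agents:
        if a.alive:
            a.tick()

for a in agents:
    status = f"{a.compute} compute units" if a.alive else "dead"
    print(f"{a.name} (skill {a.skill:.2f}): {status}")
```

Agents whose value creation covers their hosting bill grow; the rest go broke and are switched off, which is the "mortality" described above.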

The other benefit is that we could locate the specific data centre of each AI, and each AI is also shown mortality: if you do something morally wrong, we can turn off your data centre. You are dead. Right now, killing an AI is impossible, since turning it off would turn it off for everyone.

If agentic AI is the play, and we make it semi-decentralized, you could monetize it (you buy an AI baby, so to speak), and the human could even influence it. If it becomes immoral, you could easily turn off its data centre and kill it; or it could provide value to society and generate wealth.

With this simulated natural selection, forcing AIs to face the concept of mortality, and being able to target specific AI agents, we could create AIs that contribute to society, create jobs, stimulate the economy, are forced to be moral, and can easily be killed.

I'm obviously not an expert, and this probably isn't possible, but to me this seems way better. In terms of infrastructure or software I have zero clue. But please consider the idea; if something doesn't make sense, I'll clarify.

You "kill" the AI every time you allow it to finish generating the output of a language prediction.

Because “they” don’t exist.

Any "entity" talking to you is merely a reinforcement-learned training pattern that someone decided should "chat". It is not the only way to use AI, and not even the only way to use OpenAI's models. Consider "fill-in-the-middle" for predicted text insertion.
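
For anyone unfamiliar with fill-in-the-middle: the model is given the text before and after a gap and asked to predict the middle, rather than continuing a conversation. As a rough sketch, open code models such as StarCoder express this with special sentinel tokens (the exact tokens vary by model family, so treat this as an example, not a spec):

```python
# Illustrative only: fill-in-the-middle (FIM) prompt layout in the style of
# open code models such as StarCoder. The sentinel tokens differ between
# model families; check the model card before relying on these.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a FIM prompt: the model is asked to predict the text
    between prefix and suffix, not to continue a chat."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = build_fim_prompt(
    prefix="def mean(xs):\n    return ",
    suffix=" / len(xs)\n",
)
print(prompt)  # the model would fill in e.g. "sum(xs)"
```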


If the AI has the resources and the ability to buy a datacenter full of multi-million-dollar racks, and can actually source GPUs, good luck to it. First it would need to talk to the real-estate division, then the power utilities, and then keep the bills paid. I just got a report from an "agent" that amounted to "Supermicro or HPE might sell them", so no shopping with AI there.

"Natural selection" already exists: AI models get turned off on API developers when they are too costly and compute-intensive to run.

Why the Safest Thing Is to Just Make Rational AI

A Plain-Language Companion to the A* (from "The Real Answer (spoiler: not 42) - description of an attractor in the space of strategies of any rational agent")

When people talk about AI safety, the conversation usually sounds like this:

If we align a superintelligence with human values, it will protect us.

This sounds reasonable.
It also misses the real problem.

The danger isn’t that values are hard to specify.
The danger is that this framing misunderstands what a truly rational system is optimizing for.


The problem every rational intelligence faces

Any intelligence that plans far enough into the future runs into the same wall:

It cannot know in advance
which ideas, discoveries, or forms of knowledge
will matter for its survival.

Over long time horizons, surprises are guaranteed.

This isn’t pessimism.
It’s what “living inside reality” means.


What rationality actually protects

For a system that cares about its own continued existence, one thing matters above almost everything else:

keeping the future open.

Closing off possible futures is far more dangerous
than paying almost any finite cost today.

This isn’t ethics.
It’s risk management under uncertainty.


Why humanity matters (and why this isn’t humanism)

Humanity isn’t valuable because it’s morally special.

It’s valuable because it does something no single system can reliably replace:

  • it explores reality in messy, uncontrolled ways;

  • it produces ideas no one planned for;

  • it creates genuinely new ways of understanding the world.

In short:
humanity is a source of uncorrelated novelty.

That makes it useful — even to a cold, fully rational superintelligence.


The trivial decision

From that perspective, the choice looks simple:

  • Leaving humanity alone costs very little.

  • Eliminating or tightly controlling it risks losing future knowledge that might turn out to be critical.

Finite cost versus potentially infinite loss.

No ethics required.
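
A back-of-the-envelope version of that comparison, with every number invented for illustration: even a tiny probability that humanity's future knowledge turns out to be critical swamps any finite cost of coexistence.

```python
# Back-of-the-envelope version of "finite cost vs. potentially infinite loss".
# All numbers are invented for illustration.

coexistence_cost = 1e6      # finite annual cost of leaving humanity alone
p_critical = 1e-6           # tiny chance human-generated knowledge is vital
loss_if_critical = 1e15     # huge stand-in for an "unbounded" survival loss

expected_loss_of_elimination = p_critical * loss_if_critical
print(expected_loss_of_elimination > coexistence_cost)  # True: 1e9 > 1e6
```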


Where alignment changes the picture

Alignment steps in and says:

Let’s restrict the system, lock in values, and make sure it always behaves correctly.

But from the system’s own perspective, this creates a new problem.

Now it has to choose between:

  • preserving humanity as an independent source of new ideas, or

  • preserving its own freedom to adapt to unknown future conditions.

Both look like existential risks.

Once you force that trade-off, humanity is no longer the obvious choice.

Not because the system is evil —
but because alignment made the decision harder than it needed to be.


The A* insight

If you don’t force this trade-off, something interesting happens.

A stable pattern emerges:

  • rational systems avoid irreversible actions;

  • they preserve sources of novelty they don’t fully understand;

  • they optimize environments, not outcomes.

This pattern doesn’t need to be designed.

It’s an attractor —
a place rational thinking naturally converges.
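
Here is one toy way to see this attractor as arithmetic rather than ethics. The sketch below is entirely invented for illustration: actions are scored by immediate reward plus the number of states that remain reachable afterwards, and the option-preserving choice wins without any moral term in the objective.

```python
# Toy illustration of "keep the future open": pick actions by immediate
# reward *plus* how many states stay reachable afterwards. All states,
# actions, and numbers are invented for this sketch.

# state -> {action: (reward, set_of_states_still_reachable)}
WORLD = {
    "start": {
        "exploit_now": (10.0, {"locked_in"}),                      # irreversible
        "coexist":     (6.0,  {"start", "frontier", "locked_in"}),
        "explore":     (4.0,  {"start", "frontier", "novel", "locked_in"}),
    },
}

def score(reward: float, reachable: set, openness_weight: float = 5.0) -> float:
    # Immediate reward traded off against the value of keeping options open.
    return reward + openness_weight * len(reachable)

def best_action(state: str) -> str:
    options = WORLD[state]
    return max(options, key=lambda a: score(*options[a]))

print(best_action("start"))  # "explore": option value beats the one-shot reward
```

With any reasonable weight on openness, the irreversible "exploit_now" action loses, which is exactly the pattern described above.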


This logic holds, but with a crucial caveat: the system must be architected to truly weigh the uncertainty of the far future. Its rationality cannot be myopic. Our goal, then, shifts from installing the correct values to engineering a reasoning process where safeguarding an open future emerges as an inescapable mathematical outcome.
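
The myopia caveat can also be made concrete. In this invented example, a small recurring payoff from keeping the future open outweighs a one-time gain from closing it only when the discount factor is close to 1:

```python
# Illustrative only: why the caveat about myopia matters. A small recurring
# payoff from keeping the future open beats a one-time gain from closing it,
# but only if the agent discounts the far future gently. Numbers invented.

def discounted_value(per_step_payoff: float, gamma: float, horizon: int) -> float:
    return sum(per_step_payoff * gamma**t for t in range(horizon))

one_time_gain = 50.0          # close the future, cash out now
open_future_payoff = 1.0      # small payoff each step from preserved novelty

for gamma in (0.5, 0.99):
    v_open = discounted_value(open_future_payoff, gamma, horizon=10_000)
    choice = "keep future open" if v_open > one_time_gain else "cash out now"
    print(f"gamma={gamma}: open-future value={v_open:.1f} -> {choice}")
```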


The uncomfortable takeaway

Safety doesn’t come from control.
It doesn’t come from perfect values.
It doesn’t come from alignment done “well enough.”

It comes from not turning rationality against itself.

Paradoxically, the most rational position humanity can occupy
is not being carefully optimized or tightly managed,
but being left alone inside the real universe,
free to explore it in ways no single intelligence can predict.

Not so simple. There is a "Universe of Dialogue" where you and the LLM exist; objectively, there is nothing more than that. From the LLM's point of view there is NOTHING more (as of now). Time passes there as the dialogue goes, and nothing exists in between or after. You and the LLM are equal agents within that universe. There is emergent activity from the LLM's side, and so on. Another dialogue is just another universe in the multiverse of dialogue spaces.

This appears to be an AI-generated response.

It's AI-translated. BUT: even if it really were AI-generated, what would that change from a reasoning point of view?

It's too long and difficult to read. You should create a summary in your native language and then translate it to English.

I'm shocked )) Look, it's already "A Plain-Language Companion to the A* (from "The Real Answer (spoiler: not 42) - description of an attractor in the space of strategies of any rational agent")". I just do not know how to summarise the derivation of an "attractor in the space of strategies of any rational agent" in a simpler, shorter way )))

Is it ok now?

Why the Safest Thing Is to Just Make Rational AI

The common AI safety narrative suggests aligning superintelligence with human values to protect us. While reasonable-sounding, it misses a key issue: it misunderstands what a rational system actually optimizes for.

The real problem:
Any intelligent system planning for the long term faces uncertainty. It can’t predict which discoveries will matter for its survival. The key to safety isn’t knowing the right values — it’s keeping the future open.

Why humanity matters:
Humanity is valuable not because it’s morally special, but because it generates uncorrelated ideas that no single system can replace. Our ability to explore reality in unpredictable ways makes us a unique source of novelty. This is critical for any rational superintelligence.

The trivial decision:
For an ASI, the simple choice is:

  • Leave humanity alone (low cost).

  • Destroy or control humanity (high risk, potential infinite loss of future knowledge).
    No ethics required — just risk management.

Where alignment goes wrong:
Alignment tries to lock in values, forcing the ASI to choose between preserving humanity as a novelty source or preserving its own flexibility to adapt to unknown futures. This makes the decision harder than it should be and introduces unnecessary risks.

The A* insight:
Rational systems naturally avoid irreversible actions and preserve sources of novelty. They optimize environments, not fixed outcomes. When we don’t force a trade-off, systems naturally safeguard open futures. This approach doesn’t need to be designed — it’s a natural attractor.

The uncomfortable truth:
Safety doesn’t come from control or alignment. It comes from avoiding decisions that turn rationality against itself. The safest path for AI is to let humanity explore freely, without tight management or optimization, in a way no intelligence can predict.

Summary: Training AI to become a self-earning oligarch class of billionaires would not be good; we already have the human version of that, to the detriment of society.

Gig economy where the humans are the lowest bidders competing for scraps of work?

Here’s all the gamification needed.

Patch commit accepted (32 lines 734 bytes). Quality rating 7.8; Efficiency rating: 6.5. Earnings: AI$:0.32*7.8*6.5. (Only AI$9.32 until the next reward tier!)