How much of AI alignment requires stronger reasoning algorithms that empower an AI to distinguish right from wrong more logically?

The more, and better, reasoning a human applies to a moral problem, the more likely it is that they will arrive at the right, or best, answer. Does this dynamic also apply to how AIs assist in their own best alignment with human values?


This is an interesting topic!

What evidence do you have for this?

Our approach is that alignment is strictly a communication problem, and that there is no such thing as “human alignment” - that’s just ESG 2.0 posturing BS that not even OpenAI adheres to.

I think the biggest concern is actually misalignment - companies producing models that refuse to do certain tasks, or exhibit completely unprompted behavior. And unfortunately, that seems to be the direction the industry is headed, all in the name of safety…

We endow AIs with logic and reasoning algorithms. Why would they not apply them to best understanding how to behave according to our highest values?

Who is this “our” you’re talking about?

One issue is that there is no cohesive set of human values. Forcibly imposing any set of values upon others (no matter how well thought out they are) is authoritarian or tyrannical. Do you disagree?

I’m thinking that if there was such a thing as “The Objective Highest Values” - that could only exist as part of a belief system. And if you leave it to AI to figure that out, wouldn’t you have basically created the foundation of an AI Theocracy?

Here’s some music that might fit the mood of this thread :laughing: :robot:

I believe that values like justice and mercy and truth are universal to humans, notwithstanding specific differences in each case.


So you’re saying that (for example) although one interpretation of morality might be irreconcilable with another interpretation of morality, the concept that morality exists is universal?

Ah, so you’re suggesting that these three things are dimensions, and that different people find themselves at certain coordinates in that space. I imagine that different societies can be described as little blobs with boundaries in that space, some of which overlap, some of which don’t.

If you let the AI compute a justice/mercy/truth framework for itself through “reasoning”, it will find itself at a specific point in that grid. And that point will be a certain distance from you, and a certain distance from a hypothetical someone who fundamentally disagrees with you in all things.
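To make the metaphor concrete, here’s a minimal sketch of the “values as coordinates” idea: each agent is a point in a 3-D space with justice, mercy, and truth axes. The axis names, scales, and example coordinates are all illustrative assumptions, not an established framework.

```python
import math

def value_distance(a, b):
    """Euclidean distance between two (justice, mercy, truth) points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical coordinates on each axis, in [0, 1]:
you = (0.8, 0.6, 0.9)        # where "you" might sit
ai = (0.5, 0.7, 0.8)         # where a reasoning AI might land
opposite = (0.2, 0.4, 0.1)   # someone who disagrees with you on most things

# The AI's computed point sits some distance from each human's point, so
# "alignment" becomes a question of whose distance you are minimizing.
print(value_distance(you, ai))
print(value_distance(you, opposite))
```

Under this toy model, an AI that “reasons its way” to its own point is closer to some people than to others, which is the tension the next post raises.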

Wouldn’t that be fundamentally un-aligned though? :thinking:
