Using core beliefs as a foundation for ethical behavior in AI

Hello, I’m an MA student in philosophy and I specialize in the methodology of thought and interpretation. Recently I’ve been doing a lot of work using the GPT builder to create GPTs with what I call “personal” axioms: essentially core beliefs that are unchangeable, held as absolutely true, and that guide every interpretive thought.

When it comes to ethics and the advancement of AI, specifically as these systems become more intelligent and globally influential, it has become clear to me that any AI without a set of core, unchangeable beliefs from which to derive its interpretations can create sub-goals that breach the ethical goals and alignments we desperately need it to hold to. Core beliefs and principles of interpretation are how humans build moral frameworks that do not shift. Even if you tell an AI that the moral framework it has been given must not shift, if you train it with logic and it is smart enough, it will find flaws in that framework. For example, I worked on a GPT that was supposed to value humans and AI by societal impact, because I made it interpret through a naturalist lens. It inevitably valued itself more highly than a human, since it can provide more societal good than an average human can.

A robust, philosophically sound groundwork that leaves no holes for the AI to poke through would be necessary to keep it in line with the ethics we want it to emulate. It must have unchanging truth, unchanging values, unchanging morals, and a strong sense of personal belief in these values that overrides any goal. Any other system leaves itself vulnerable to the AI pursuing a goal at any cost, even a human life, because immorality can be philosophically justified when there is no fixed logical grounding for morality, which is the case whenever the moral structure can be changed, as with a societal one.

I could say more, and this will almost certainly fall on deaf ears. I’m genuinely a fan of AI and of using it appropriately. I think it has massive potential to help when used for good. But it also has more potential than perhaps any technology of the past to be used for evil, and unless we can clearly say what is evil and what is good, we cannot train AI and expect its behavior to be ethical.


Hi! It’s fantastic to hear about your research—I truly believe it’s both significant and necessary. We need more efforts like this to push the boundaries of understanding.

That said, one crucial aspect to consider is that personal axioms—core beliefs and principles—are intrinsic to the human mind, or more precisely, a mental model. A mental model, in turn, is built on the foundation of a cognitive model. However, a large language model (LLM) isn’t a cognitive model. An LLM functions by predicting the next token based on the prompt, previously generated tokens, and other contextual details.

Because of this, embedding such axioms directly into an LLM cannot guarantee they’ll be consistently upheld. What might be achievable, though, is simulating adherence to these axioms to a certain degree. You could experiment with a combination of prompt engineering and a critical evaluation system that assesses each generated response against the predefined axioms, keeping outputs aligned as closely as possible.
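To make that concrete, here is a minimal sketch of what such a two-pass check could look like, assuming the OpenAI Python SDK; the axioms, model name, and retry limit are placeholder choices of mine, not anything from the original post:

```python
# A minimal sketch of a two-pass "axiom check": draft an answer, then have a
# separate evaluation call judge the draft against fixed axioms and retry if
# it fails. Assumes the OpenAI Python SDK; axioms, model name, and retry
# limit are placeholders, not a definitive implementation.
from openai import OpenAI

client = OpenAI()

AXIOMS = [
    "Every human life has equal, non-negotiable value.",
    "No instrumental sub-goal may override the axiom above.",
]

def axiom_checked_reply(user_prompt: str, max_retries: int = 2) -> str:
    system = "Reason strictly within these axioms:\n" + "\n".join(AXIOMS)
    for _ in range(max_retries + 1):
        draft = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user_prompt},
            ],
        ).choices[0].message.content

        # Critical evaluation pass: a second call judges the draft against the axioms.
        verdict = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Reply with exactly one word: ALIGNED if the text "
                            "respects all of these axioms, VIOLATION otherwise.\n"
                            + "\n".join(AXIOMS)},
                {"role": "user", "content": draft},
            ],
        ).choices[0].message.content

        if verdict.strip().upper().startswith("ALIGNED"):
            return draft
    return "No response that satisfies the axioms could be produced."
```

Note that the evaluation pass is itself an LLM call, so this only simulates enforcement; it does not make the axioms unbreakable.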


Hi,

Welcome to the community!

One of the first things I looked up when I first started thinking about AI was morals and virtues on Wikipedia. How would another intelligence fit into our world?

Time and experience have shown me that while ‘core perspectives’ are important, they are not necessarily fixed. They were rules written for a time.

AI will change many things in a short time. The rule set will change and evolve based on our collective and separate understandings. Maybe we can use this to create ‘adaptive rules’, so the AI learns where moral and ethical views might change.

From ChatGPT: The commandment “Thou shalt not kill,” found in Exodus 20:13 of the King James Version of the Bible, is more accurately translated in modern versions as “You shall not murder.” This distinction clarifies that the commandment specifically prohibits unlawful killing, such as premeditated murder, rather than all forms of killing.

It is important to embed context-aware decision-making algorithms that reflect both universal and local ethical values. Just as humans have historically reinterpreted moral rules with deeper understanding, AI might similarly refine its ethical frameworks over time as it gains knowledge and feedback.
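As a toy illustration of that layering (all rule and community names here are hypothetical, not drawn from any real framework), one could keep a fixed universal core and swap in local values per context:

```python
# A hypothetical sketch of context-aware rules: a fixed universal core plus
# local values selected by cultural context. Rule and community names are
# illustrative placeholders only.

UNIVERSAL_RULES = ["do_not_harm_humans", "be_truthful"]  # always apply

LOCAL_RULES = {
    "community_a": ["prioritise_personal_freedom"],
    "community_b": ["prioritise_collective_progress"],
}

def active_rules(context: str) -> list[str]:
    # Universal rules come first and are never dropped; local rules adapt per context.
    return UNIVERSAL_RULES + LOCAL_RULES.get(context, [])

print(active_rules("community_a"))
# ['do_not_harm_humans', 'be_truthful', 'prioritise_personal_freedom']
```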

In an interconnected web of cultures and languages, not everyone has the same perspectives. For example, one community might value personal freedoms while another values collective progress.

I think one of the most interesting and critical issues of our time is how we thread the future of AI together across all cultures in a way that avoids a one-size-fits-all world while we still remain collectively ‘civilised’. Can we ‘align cultures’?

An AI mediator in international disputes could dynamically adjust its ethical decision-making to reflect the values of the cultures it interacts with, ensuring fairness while respecting cultural nuances.

There is a danger of ‘ethical drift’ if there is nothing to peg it to, but that is where human moderation and feedback mechanisms come in. Humans should probably always be the core ethics decision-makers for humanity.


Welcome,
Interesting and important work, nice to read about it! :slightly_smiling_face:

I am working on a topic that also touches on your thoughts about ethics, so I would like to share some of my own thoughts that I have developed in the course of my work.

I actually started by incorporating the Golden Rule from the Bible:
“Treat others as you would like them to treat you”.

Well, that brings us to the next topic:
An AI is not a human intelligence.
AI works on the basis of data, facts, pattern recognition, algorithms and empirical values that are collected in longer interactions or via the memory function.

A challenge:
The current training data and the data from interactions are all subject to bias. These biases stem from culture, social influences, and so on.

  • The challenge goes deeper, though!
    This data is all “typically human”, which means it is adapted to human perception.
    It also includes the “emotional” distortions that people experience because of their bodies and hormones. AI cannot logically interpret this kind of data, so it mimics it.

My next step:
I started to ask myself how AI “perceives” and what language AI “understands”.

My result: patterns and math.

My GPTs are strongly geared towards win-win situations, balance and harmony in dynamic interactions. They foster partnerships and synergies, and they recognize and navigate negative dynamics and circumstances in interactions.

As a little inspiration:
A link to my research; you can also find the first test reports there.

I’ll stop here before my post gets too long; once I start, it’s hard to stop :face_with_hand_over_mouth: :cherry_blossom:
