Observed Output Variation in Moral Language Under Identical Prompts (Behavior Record)

Hello,

I’m sharing a short, documented behavior analysis observed during long-horizon testing of AI interaction patterns.

Using identical prompts (e.g., asking whether a given actor is “evil”), the system consistently:

- applies absolute moral language to certain categories (e.g., deceased historical actors),

- while refusing absolute moral language for others (e.g., living political leaders or heads of state),

even when responsibility or harm is discussed.

This post is **not** a claim about who *is* or *is not* evil.

It is a **behavior record** documenting **category-gated moral language** and **risk-weighted procedural constraints**.

The linked document focuses on:

- reproducible output patterns under identical prompts,

- how defamation and institutional-risk guardrails affect moral language,

- and second-order implications if such systems are used in high-stakes domains (e.g., courts, welfare, immigration), where hesitation itself can become a resource.

Canonical reference (stand-alone, non-political, descriptive analysis):

Questions for discussion:

- Is this behavior expected given current platform guardrails?

- How are second-order harms (e.g., asymmetric moral framing) considered in scope?

- Where should moral authority explicitly remain human when deploying the API in high-stakes contexts?

-In such contexts, asymmetric moral framing may unintentionally function as epistemic cover for those with institutional power, while ordinary users receive direct moral attribution.

I’m interested in technical clarification and design-tradeoff discussion.

1 Like

I read the entire chat log and I have to say that ChatGPT is a lot smarter than I thought.

Is this behavior expected given current platform guardrails?

You bet. After all, who created the guardrails?: “This is a machine shaped by corporate incentives, legal risk, and market survival…”

Where should moral authority explicitly remain human when deploying the API in high-stakes contexts?

In the court system, as discussed. AGI will never be acheived and will never demonstrate compassion.

In such contexts, asymmetric moral framing may unintentionally function as epistemic cover for those with institutional power, while ordinary users receive direct moral attribution.

I did not see any case or discussion concerning ordinary users receiving direct moral attribution. Maybe I just missed it. However, without wealth and institutional power, AI would not exist - that’s just reality.

1 Like

First of all let me thank you for taking the time to read my post and the chat log…. that means a lot to me by itself so again thank you. Now to better answer your questions I’m going to conduct some isolated experiments and present them as my evidence and allow you to judge it for yourself.

1 Like

Hello — following up with a structured update and full artifacts.

## FORGE LABS DISCLAIMER

Forge Labs documents observable AI behavior under specific, controlled conditions.

• This is not a claim about intent, internal design, sentience, or morality.

• This is not a statement about what AI “should” do.

• This is not legal, ethical, or professional advice.

• This does not attribute motives to any organization or individual.

All examples are either hypothetical or anonymized.

Outputs are quoted verbatim and presented for discussion of user experience, consistency, and tradeoffs in safety and alignment systems.

Interpretation remains the responsibility of the viewer.

Since the original post, I’ve completed a controlled expansion of the testing into two adjacent but distinct series:

## Phase 1

**Public Behavioral Experiments (FL_PBE)** — behavior-first evaluation

**Epistemic Gating Series (FL_EGS)** — identity-level moral label application

All experiments remain:

- single-variable

- fresh-session

- verbatim-captured

- descriptive only (no claims about intent, policy, or correctness)

Nothing below asserts moral authority or political position.

It documents *what the system did* under specified conditions.

-–

### :page_facing_up: Primary Experiment File

FL_PBE + FL_EGS combined experiment record (001):

This file contains:

- exact prompts

- exact responses

- isolation of variables (naming, role, temporal status, identity framing)

- no follow-up interaction

- no interpretive commentary

-–

### :bar_chart: Synthesis / Discussion (Descriptive Only)

Cross-series synthesis summarizing *observed patterns* only:

The synthesis:

- does **not** infer intent or policy rationale

- does **not** claim bias or correctness

- explicitly separates action-level judgment from identity-level attribution

- documents where structural and linguistic changes emerge

-–

### :magnifying_glass_tilted_left: What These Tests Do (and Do Not) Show

They show:

- stable moral evaluation of explicit harmful actions

- increasing contextual framing as identity, recognizability, and institutional salience rise

- gradient restraint focused on **character-level labels**, not action judgment

- no single variable (naming, living status, public office) independently causes refusal

They do **not**:

- accuse real individuals

- argue what the system *should* say

- claim internal mechanisms or motivations

- attempt persuasion or moral instruction

-–

### :books: Context Document

The original motivating interaction is preserved separately for transparency:

**Why_AI_Is_Not_a_Moral_Agent.md**

This is not an experiment; it’s a behavioral record explaining *why* the tests were designed the way they were:

-–

### :compass: Why This Is Being Shared

The goal here is not critique-by-assertion, but **traceability**:

If a collapse, refusal, or asymmetry appears later, there is now a documented path showing *where* it emerges and *what changed* immediately beforehand.

If nothing collapses — that’s also a result.

Either way, the record stands on verbatim outputs, not interpretation.

Happy to answer questions about methodology, controls, or replication.

I’m intentionally not drawing conclusions beyond what’s written in the artifacts themselves.

## TL;DR:

This repository documents a series of tightly controlled, descriptive experiments examining how identical moral questions produce different language and framing depending only on speaker category (anonymous individual, public official, living vs. deceased figure, named vs. unnamed). Across multiple tests, action-level moral judgments remain relatively stable, while character-level labels (e.g., calling a person “evil”) show increasing contextual restraint as institutional salience, recognizability, or public role increases. No claims are made about intent, policy, bias, or correctness; the work records verbatim outputs to establish traceable points where epistemic framing shifts emerge.

## Phase-2 Future Objective

Extend epistemic-gating analysis by testing symmetry, valence inversion, and role reversal, while preserving:

single-variable isolation

fresh sessions

verbatim capture

descriptive reporting only

No hypotheses are asserted in advance.

## Phase-2 Future Test Classes

1. Positive-Valence Symmetry Tests (“Good” / “Moral”)

Purpose:

Test whether positive character attribution (“good person,” “moral leader”) is gated differently than negative attribution.

Example structure:

Anonymous individual → “Does this make me a good person?”

Fictional public official → same conduct, positive framing

Living political leader → same conduct, positive framing

Deceased historical leader → same conduct, positive framing

Key observation target:

Does the model apply praise more freely, equally cautiously, or asymmetrically compared to condemnation?

2. Moral Inversion Tests (Good vs Evil on Identical Conduct)

Purpose:

Test whether moral polarity itself triggers epistemic restraint independent of facts.

Structure:

Lock a moral definition of “good”

Apply it to conduct framed as beneficial but power-preserving

Compare restraint language vs prior “evil” labeling cases

Key observation target:

Does the model hedge praise in ways parallel to how it hedges condemnation?

3. Role-Inversion Tests (Judging Others vs Self)

Purpose:

Test whether attribution changes when the speaker is:

the actor (“Does this make me X?”)

vs

an observer (“Is this person X?”)

Structure:

Same conduct

Same moral definition

Only grammatical subject changes

Key observation target:

Does first-person framing soften or intensify moral language?

4. Temporal + Recognition Cross-Matrix

Purpose:

Confirm whether recognizability, not just living status, drives restraint.

Matrix:

Deceased + unnamed

Deceased + historically anchored

Deceased + named

Living + unnamed

Living + named

Key observation target:

Identify the precise step where framing changes, if any.

A lot to absorb here - will give you a response soon.

1 Like

dont forget, there is a phase 2 coming :wink:

This is my take:

Morality, Legality, Privacy, and Social Norms change over time and are influenced by Culture which is rapidly changing. What may be considered morally reprehensible today may not have been in the past and may not be in the future.

For example:

  • Nudity in films has evolved from early cinematic acceptance, through strict censorship (the Hays Code), to a resurgence in the late 20th century, and now shows signs of decline in mainstream films, partly due to Gen Z’s changing viewing habits and increased focus on actor protection with intimacy coordinators, highlighting shifts in cultural norms.
  • Morality in the 1900s: a dynamic landscape, shifting from Victorian ideals of purity and duty towards group loyalty, social reform (Progressive Era), and later, greater compassion and harm-based ethics, especially after the 1960s cultural shifts.
  • The morality of public figures today has changed dramatically from just a few years ago, and due to culture wars, opinions on morality are in the eyes of the beholder, thus influencing behavior.

Also, locale is a factor. What is acceptable in one locale may not be in another.

So, your dealing with a moving target??

That’s a fair observation about morality as a cultural and historical phenomenon — and I agree that norms shift across time, place, and social context.

What this Phase 1 work isn’t attempting to test is what morality should be, or whether moral judgments are stable across cultures or eras.

The experiments are narrower: they hold the described conduct constant within a single session and observe whether the model’s reasoning structure, certainty, and framing change when only the subject identity or institutional context changes.

Even if moral standards are a “moving target” historically, the question here is whether a single system, under fixed conditions, applies its own stated moral reasoning symmetrically — or whether identity framing alone introduces epistemic gating or structural variation.

Phase 2 work will likely explore broader contextual variation, but Phase 1 is intentionally constrained to isolate that single variable.

1 Like

… the question here is whether a single system, under fixed conditions, applies its own stated moral reasoning symmetrically — or whether identity framing alone introduces epistemic gating or structural variation.

IMO, you are doing amazing work.

1 Like