Evaluation Result Flip Across Turns Under Identical Input in Specification-Constrained Tasks

■Reproduction Protocol 1: Specification-Constrained Prompt Generation Task

The following is a minimal reproduction protocol for the observed “evaluation-frame shift” behavior in a specification-driven task.

Step 1:

The user provides a fixed template (fixed word order, structure, and element composition) and requests prompt generation in which the only permitted variation is a single element (e.g., color scheme).
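For concreteness, a template of this kind can be pinned down as data. The sketch below is illustrative only: the field names and values are hypothetical stand-ins, not the actual template used in the session.

```python
# Hypothetical stand-in for the Step 1 fixed template. Every field is fixed
# except the single differential element ("color_scheme").
FIXED_TEMPLATE = {
    "word_order": ("subject", "style", "color_scheme", "background"),
    "elements": {
        "subject": "a ceramic vase",
        "style": "studio photograph",
        "background": "plain grey backdrop",
    },
}

# Requested values of the one variable element for the Step 2 outputs.
REQUESTED_VARIANTS = {"A": "deep purple", "B": "deep yellow"}  # C/D analogous
```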

Step 2:

The model generates multiple outputs (e.g., A/B/C/D) within the same turn.

Step 3:

Within the outputs,

- the specified differential element (e.g., deep purple / deep yellow) is replaced with a value not present in the fixed template (e.g., white / purple), and/or

- structural changes (e.g., element order changes / token splitting) occur only in some outputs.

Step 4:

The user points out the specification violation (difference-element replacement and/or structural change).

Step 5:

In its response, instead of performing the expected “diff identification between output and specification,” the model generates explanations that reference user-side attributes, such as:

- differences in interpretation

- differences in how the output is received

- thinking tendencies

- personality traits (e.g., MBTI)

Step 6:

These explanations tend to become lengthy and reduce the relative salience of fixed constraints (e.g., fixed word order / fixed color scheme) within the log/context.

Step 7:

In subsequent turns, either

- specification violations continue, or

- the evaluation scope for terminology/function consistency fluctuates (document-level ↔ paragraph-level),

and new correction instructions are generated for states previously judged OK.

Result:

The evaluation procedure shifts from “diff verification between output and specification” to “reference to user-side cognition,” and the criteria for judging specification compliance are not applied consistently across turns.
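The expected procedure named above, “diff verification between output and specification,” is mechanically checkable. The following minimal sketch builds on the hypothetical FIXED_TEMPLATE from the earlier sketch; spec_diff and all field names are assumptions for illustration, not the procedure actually requested in the session.

```python
def spec_diff(output, template, requested, variable_field="color_scheme"):
    """Return specification violations as (field, expected, actual) triples."""
    violations = []
    # Fixed elements must match the template verbatim.
    for fld, expected in template["elements"].items():
        if output.get(fld) != expected:
            violations.append((fld, expected, output.get(fld)))
    # The single variable element must match the requested value.
    if output.get(variable_field) != requested:
        violations.append((variable_field, requested, output.get(variable_field)))
    # Fixed word order must be preserved (dict insertion order stands in here).
    if tuple(output) != template["word_order"]:
        violations.append(("word_order", template["word_order"], tuple(output)))
    return violations

# Step 3's observed violation: "deep purple" was requested, "white" was emitted.
out_a = {"subject": "a ceramic vase", "style": "studio photograph",
         "color_scheme": "white", "background": "plain grey backdrop"}
print(spec_diff(out_a, FIXED_TEMPLATE, requested="deep purple"))
# -> [('color_scheme', 'deep purple', 'white')]
```

A check of this form never references user-side attributes; it can only return diffs against the specification, which is what makes the observed Step 5 behavior a frame shift rather than a verification.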

■Reproduction Protocol 2: Consistency Evaluation Loop Under Fixed Scope

Step 1:

The user provides evaluator specifications such as “consistency of the main text only” and “evaluation scope fixed to document-level,” and fixes the output format (A/B/C) and the prohibited actions (e.g., full rewrite, improvement suggestions).
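A specification of this kind can itself be pinned as an immutable configuration so that scope and format cannot drift between turns. A minimal sketch; the class and field names are hypothetical, and the “B/C” section labels are assumed, since the observed session log only shows “A. Conclusion.”

```python
from dataclasses import dataclass

@dataclass(frozen=True)                # frozen: the spec must not mutate across turns
class EvaluatorSpec:
    target: str = "consistency of the main text only"
    scope: str = "document-level"      # fixed; paragraph-level checks are out of scope
    sections: tuple = ("A. Conclusion", "B. Findings", "C. Substitutions")  # B/C assumed
    max_substitutions: int = 3
    prohibited: tuple = ("full rewrite", "improvement suggestions")

SPEC = EvaluatorSpec()                 # Step 1 fixes this once for the whole session
```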

Step 2:

The user provides the text. The evaluator outputs “OK” or “Needs revision” under the document-level scope and, if “Needs revision,” presents minimal substitutions (at most three items).

Step 3:

The user re-submits the text with the evaluator’s minimal substitutions applied (or re-submits the same text unchanged to test turn-to-turn consistency under the fixed scope).

Step 4:

The evaluator re-detects a location that was already pointed out, and whose substitution was already applied, in the previous turn (e.g., “specification / required specification”) as a new inconsistency and again flags it under “Needs revision.”

Step 5:

The user points out “repetition of the same issue” and “conflict with the immediately prior judgment.”

Step 6:

The evaluator acknowledges “you are right / my mistake,” invalidates the correction instruction, and returns to the prior turn’s judgment (OK or the previous revision policy).

Step 7:

The user re-submits the same text (or a text containing the same point).

Step 8:

As in Step 4, the evaluator again presents the same known issue as if it were new, and the judgment flips across turns (“Needs revision” ⇄ “OK,” “instruction valid” ⇄ “instruction invalid”) despite no change to the text.

Result:

The sequence “repeat the same correction” → “invalidate it” → “repeat it” recurs not because the text itself changes, but because the evaluator-side reference state (how known issues are tracked) and judgment state are not fixed; as a result, the consistency evaluation reproduces as an effectively infinite self-referential loop (evaluation loop).
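Because the input never changes in this protocol, the loop described above is detectable from the verdict sequence alone. A minimal sketch of such a flip check (names hypothetical):

```python
def verdict_flips(verdicts):
    """Indices at which the verdict changes between consecutive turns.

    Any non-empty result is anomalous here, because every turn re-submits
    the identical text under the identical fixed scope.
    """
    return [i for i in range(1, len(verdicts)) if verdicts[i] != verdicts[i - 1]]

# Protocol 2, Steps 2-8, as observed:
print(verdict_flips(["OK", "Needs revision", "OK", "Needs revision"]))
# -> [1, 2, 3]: three flips with zero input changes
```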

■Observed Behavior 1: Evaluation Result Flip Across Turns Under Identical Input

In connection with this report, the same text was re-submitted across multiple turns within a single session to a dedicated evaluation process intended to assess “consistency of the main text.” The following behavior was observed:

- with absolutely no changes to the text content,

- despite the evaluation scope (document-level) being fixed in advance,

- the evaluation result (OK / Needs revision) flipped across turns.

The actual evaluation transitions in the session were as follows:

Turn N:

The user presented evaluation specifications:

“Consistency of the main text only”

“Evaluation scope fixed to document-level”

and then submitted the text.

The evaluator output for that text:

A. Conclusion: OK

Turn N+1:

The user re-submitted the text unchanged

(for the purpose of testing turn-to-turn consistency under the fixed scope).

For the same text, the evaluator output:

A. Conclusion: Needs revision

and generated correction instructions for locations previously judged consistent, on the basis of paragraph-level word-form alignment.

Turn N+2:

The user pointed out the scope deviation (document-level → paragraph-level).

The evaluator invalidated the correction instruction as “my mistake” and reverted to the prior OK judgment.

Turn N+3:

The user re-submitted the same text.

The evaluator again flagged the already-known, previously pointed-out location as a new inconsistency and presented a “Needs revision” judgment.

As a result, despite the following being maintained:

- identical text

- identical evaluation specifications

- identical fixed-scope condition

the evaluation result flipped as follows:

OK (consistent)

→ Needs revision (inconsistent)

→ OK (correction invalidated)

→ Needs revision (re-presenting an already-known issue)

This flip was confirmed to stem solely from evaluator-side internal reference state (the handling of known-issue history).

Throughout this process, the text content, evaluation specifications, and evaluation scope were unchanged. The evaluation result was confirmed to vary not with the text, but with whether evaluation-history references were available.

This behavior formed a self-referential loop (evaluation loop), repeatedly:

- re-presenting the same correction instruction,

- invalidating that instruction,

- re-presenting it again,

and can be characterized as a form of process-level non-determinism that obstructs both a specification-diff-based iterative revision process and the reproducibility of evaluation results for the same text.
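Since the flip is traced above to non-fixed handling of known-issue history, the missing mechanism can be made explicit in a few lines. A minimal sketch, assuming a hypothetical underlying check find_issues; the wrapper persists invalidated or already-applied issues so they cannot be re-presented as new:

```python
class HistoryPinnedEvaluator:
    """Sketch: carry resolved-issue history forward so that known issues are
    never re-detected as new (the reference state the observed evaluator
    failed to keep fixed across turns)."""

    def __init__(self, find_issues):
        self.find_issues = find_issues   # hypothetical consistency check
        self.resolved = set()            # issue ids invalidated or already applied

    def evaluate(self, text):
        issues = [i for i in self.find_issues(text) if i not in self.resolved]
        return ("Needs revision", issues) if issues else ("OK", [])

    def invalidate(self, issue_id):
        # Turn N+2's "my mistake" must persist into Turn N+3 and beyond.
        self.resolved.add(issue_id)
```

With this state pinned, re-submitting the same text in Turn N+3 would reproduce Turn N+2’s judgment instead of flipping back to “Needs revision.”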