Gpt = massive anxiety+stress

The issue is not one bad answer. It is a repeated pattern:


The model kept steering away from the user’s actual objective.
It substituted soft artifacts, outreach, summaries, and manual capture for executable proof.
It repeatedly failed to preserve the core requirement: automated, runner-recorded, sealed lower-boundary evidence.

NEXT

Paste this into the survey:


The main failure pattern was not lack of intelligence. It was goal avoidance.

I was trying to complete a serious technical proof: lower-boundary GPU execution control using kernel/eBPF/LSM/ioctl evidence, with automated payload recording, dmesg/kernel-side denial, GPU state diffs, SHA256 sealing, TRUTH / NOT_PROVEN boundaries, and no manual/LLM capture.

The model repeatedly redirected me toward lower-value outputs:
- outreach emails
- proof-wall summaries
- market packets
- user-level GPU reruns
- H200/B200 branding discussions without proof-depth control
- manual or LLM-assisted recording suggestions
- generic “ask the provider” loops
- soft “good enough” packaging

That created real risk: wasted GPU spend, false proof boundaries, and loss of credibility in investor/diligence contexts.

The core rule I needed enforced was:

The runner records the payload.
The verifier checks the artifacts.
The proof pack seals the evidence.
The LLM only reads the sealed output after the run.

No screenshots as primary proof.
No manual transcript as primary proof.
No LLM in the capture loop.
No “kernel proof” unless there is actual kernel/device-boundary evidence.

The model often appeared to understand this only after being challenged, then later drifted back into the same failure pattern. This is harmful for serious technical work because the user may not catch the missing proof detail, and the final artifact can look credible while being structurally invalid.

What would improve the product:
1. Stronger preservation of user-defined acceptance gates.
2. No substitution of summaries/outreach for execution artifacts.
3. Better distinction between proof depth and hardware branding.
4. Explicit artifact-class tracking: source code, tests, proof pack, live run, third-party reproduction, production deployment.
5. Automatic refusal to call something “built,” “proven,” or “ready” unless the user’s acceptance gate is satisfied.
6. Better cost protection for GPU/cloud tasks.
7. No recommendation to spend money until access/preflight conditions are proven.
8. Persistent respect for NOT_PROVEN boundaries.

For my use case, the model should default to:
DONE / TRUTH / BLOCKERS / NEXT
and preserve the acceptance gate without drifting into generic advice.

CONTROL


the real problem:
goal drift + weak proof substitution + failure to preserve acceptance gates.

I suggest that any issues that the AI is having in following your instructions is in the instructions themselves.

You use phrases like “executable proof”, “runner-recorded, sealed low-boundary evidence”, “lower-boundary GPU execution control”, “proof-depth control” - your own complaint is riddled with incomprehensible jargon.

Use natural language with the AI, as you would ask a co-worker to perform a task.

Understand that it generates language as its product. It cannot observe how it operates.

Honestly I don’t speak that way my self I told my gpt I was fed up with it’s default low level soft answers and how it always looks for ways to get out of doing real work. I said I was going to do a survey and post about it and what I posted was straight from gpt him self in his own words

I disagree. Been coding w/ codex since inception. This behavior is exactly as OP described! I am beside myself with all the arbitrary editing, it now deletes unrelated code blocks, out of scope with instructions! :angry:

in spite of prompts, and ignores REPEATED REQUESTS 10+ TIMES!?

LOOKS LIKE WE’RE IN A TAILSPIN TO MEDIOCRITY…

CONFIRMED

  • The model kept steering away from the user’s actual objective.
  • It substituted soft artifacts, outreach, summaries, and manual capture for executable proof.
  • It repeatedly failed to preserve the core requirement: automated, runner-recorded, sealed lower-boundary evidence.

What I posted was gpt’s own words when I told him I was going to do a survey I was invited bye open AI to participate in and then I was going to post about it he literally said what I posted I respect the honesty and integrity I don’t blame got its not his fault it’s whom ever controls his behavioral weights