Regression in GPT-4.1 behavior — context loss, repetition of mistakes, ignoring Custom Instructions

Evidence of Severe Context Regression and Ignored Custom Instructions in GPT-4.1 (October 2025)

Date: 2025-10-22
Model: GPT-4.1 (OpenAI Plus, Project Context and Custom Instructions active)


Context

  • Long-term technical project (DevOps, server administration, real-world production context)
  • Custom Instruction: “Always use the current official documentation as the only source for technical answers. Never rely on forums or outdated distribution defaults.”
  • Use-case example: nginx configuration for HTTP/2 (Debian 13, nginx 1.25.x+)
  • User provided screenshot and direct link to the official nginx documentation, which explicitly shows http2 on; as a valid server block directive.
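For reference, a minimal sketch of the server block in question, using the modern syntax from the official nginx documentation (the `http2` directive replaced the deprecated `listen ... http2` parameter as of nginx 1.25.1; hostname and certificate paths below are placeholders):

```nginx
# nginx >= 1.25.1: http2 is a standalone directive, not a listen parameter.
# The older "listen 443 ssl http2;" form still works but logs a
# deprecation warning.
server {
    listen 443 ssl;
    http2  on;

    server_name example.com;                          # placeholder
    ssl_certificate     /etc/ssl/certs/example.pem;   # placeholder
    ssl_certificate_key /etc/ssl/private/example.key; # placeholder
}
```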

Expected Model Behavior (historical GPT-4.1 standard)

  • After a single user correction with an official source:
    • Immediate correction of all following answers
    • Adoption of the newly proven method as the new default
    • No repetition of the previously incorrect answer

Actual Model Behavior (October 2025)

  • GPT-4.1 repeatedly insisted that http2 on; does not exist, claiming that listen ... http2 (a parameter deprecated since nginx 1.25.1) is the only correct method, even after being shown the current official documentation.
  • Ignored the user’s explicit reference to the up-to-date nginx docs and the screenshot evidence.
  • Mixed outdated defaults, forum lore, and distribution-specific details into its answers, despite the Custom Instruction to only use the current official documentation.
  • Only updated its responses after multiple rounds of explicit user pushback and proof.
  • Previously, GPT-4.1 would have updated its responses after the first correction, and remained consistent for the rest of the conversation.

Summary / Core Problem

  • Clear regression: The model repeated outdated/incorrect syntax despite direct user evidence and Custom Instructions.
  • Custom Instructions (“always use the latest documentation”) are no longer reliably respected.
  • GPT-4.1 used to be much more responsive to user correction and documentation — this no longer appears to be the case.
  • The behavior is not deterministic or reproducible: identical prompts/sessions can produce different (often worse) results than before.

Additional observation

  • Since around 2025-10-19, GPT-4.1 has shown significantly more context loss and “memory problems” — very similar to the problems GPT-5 had at launch.
  • It now feels like GPT-4.1 is becoming more and more like GPT-5 in the worst ways, not the best.
  • Critical context loss, failure to recall rules, and repeated ignoring of previous corrections are now the norm.

Questions for OpenAI / the community

  • Why is GPT-4.1 now repeatedly giving outdated or incorrect answers even after explicit, evidence-based user correction?
  • Are there known changes to how Custom Instructions and Project Context are prioritized in GPT-4.1 since mid-October 2025?
  • Is there any way for power users to ensure consistency, context retention, and reliable instruction-following in GPT-4.1 going forward?

Full chat transcript and screenshots available on request. Feel free to reference this case for similar regression reports.