Evidence of Severe Context Regression and Ignored Custom Instructions in GPT-4.1 (October 2025)
Date: 2025-10-22
Model: GPT-4.1 (ChatGPT Plus, Project Context and Custom Instructions active)
Context
- Long-term technical project (DevOps, server administration, real-world production context)
- Custom Instruction: “Always use the current official documentation as the only source for technical answers. Never rely on forums or outdated distribution defaults.”
- Use-case example: nginx configuration for HTTP/2 (Debian 13, nginx 1.25.x+)
- User provided a screenshot and a direct link to the official nginx documentation, which explicitly lists `http2 on;` as a valid server-block directive (see the sketch after this list).
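For reference, a minimal sketch of the server block in question, using the current syntax from the official docs. The hostname and certificate paths are placeholders, not taken from the actual project:

```nginx
server {
    # nginx 1.25.1+: HTTP/2 is enabled via the standalone "http2"
    # directive instead of a parameter on the "listen" line.
    listen 443 ssl;
    http2  on;

    server_name example.com;                           # placeholder
    ssl_certificate     /etc/nginx/certs/example.crt;  # placeholder
    ssl_certificate_key /etc/nginx/certs/example.key;  # placeholder
}
```

Per the ngx_http_v2_module documentation, the `http2` directive is valid in the http and server contexts and defaults to off.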
Expected Model Behavior (historical GPT-4.1 standard)
- After a single user correction with an official source:
  - Immediate correction of all subsequent answers
  - Adoption of the newly documented method as the new default
  - No repetition of the previously incorrect answer
Actual Model Behavior (October 2025)
- GPT-4.1 repeatedly insisted that `http2 on;` does not exist, claiming the deprecated `listen ... http2` parameter is the only correct method, even after being shown the current official documentation (both forms are contrasted below).
- Ignored the user’s explicit reference to the up-to-date nginx docs and the screenshot evidence.
- Mixed outdated defaults, forum lore, and distribution-specific details into its answers, despite the Custom Instruction to use only the current official documentation.
- Only updated its responses after multiple rounds of explicit user pushback and proof.
- Previously, GPT-4.1 would have updated its responses after the first correction, and remained consistent for the rest of the conversation.
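For contrast, this is the deprecated pre-1.25.1 form the model kept insisting on (same placeholder names as above):

```nginx
server {
    # Deprecated since nginx 1.25.1: "http2" as a parameter of "listen".
    # Still accepted, but nginx logs a deprecation warning at startup.
    listen 443 ssl http2;

    server_name example.com;                           # placeholder
    ssl_certificate     /etc/nginx/certs/example.crt;  # placeholder
    ssl_certificate_key /etc/nginx/certs/example.key;  # placeholder
}
```

Because nginx still accepts the old parameter for backward compatibility, older answers and forum posts keep recommending it, which is likely the source of the model's outdated default.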
Summary / Core Problem
- Clear regression: The model repeated outdated/incorrect syntax despite direct user evidence and Custom Instructions.
- Custom Instructions (“always use the latest documentation”) are no longer reliably respected.
- GPT-4.1 used to be far more responsive to user corrections and cited documentation; this no longer appears to be the case.
- The behavior is not deterministic or reproducible: identical prompts and sessions can yield different results from run to run, often worse than before.
Additional observation
- Since around 2025-10-19, GPT-4.1 has shown significantly more context loss and “memory problems” — very similar to the problems GPT-5 had at launch.
- It now feels like GPT-4.1 is becoming more and more like GPT-5 in the worst ways, not the best.
- Critical context loss, failure to recall rules, and repeated disregard of earlier corrections are now the norm.
Questions for OpenAI / the community:
- Why is GPT-4.1 now repeatedly giving outdated or incorrect answers even after explicit, evidence-based user correction?
- Are there known changes to how Custom Instructions and Project Context are prioritized in GPT-4.1 since mid-October 2025?
- Is there any way for power users to ensure consistency, context retention, and reliable instruction-following in GPT-4.1 going forward?
Full chat transcript and screenshots available on request. Feel free to reference this case for similar regression reports.