Posting here to document a potentially serious behavior with file prompts in long sessions. Couldn't find a prior thread; feel free to merge if this overlaps.
Summary
In long conversations, when prompted to analyze an uploaded document, ChatGPT responded with a fluent, confident analysis that appeared to reference file contents, yet at no point did the “Reading…” UI indicator activate, and no actual file access occurred. Additionally, the quoted segments were fabricated.
This issue does not fall under typical knowledge hallucination. It constitutes a failure in the task execution pipeline, masked by simulated task-completion language and structure.
Steps to Reproduce
- Start a relatively long conversation (roughly 20+ turns) with multiple files already uploaded in earlier turns.
- Upload a file containing standard readable content; the content does not need to follow any specific structure.
- In the next turn, enter the prompt:
“Please re-read the <file_name> file and answer the following question. Please quote the source text you are referring to.”
- Observe:
- No UI reading indicator appears (“Reading…”).
- The model returns a grammatically coherent, detailed analysis, as if it had accessed and processed the file.
- Cross-checking the “quoted” segments shows they do not exist in the uploaded file (a verification sketch follows this list).
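For anyone who wants to cross-check the claimed quotes more systematically than by eye, here is a minimal verification sketch in Python. It assumes you have the uploaded file saved locally and have pasted the model's “quoted” segments into a list; the file name and quote strings below are placeholders, not part of the actual repro.

```python
import re

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting differences don't hide a genuine match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def check_quotes(source_path: str, quotes: list[str]) -> None:
    """Report whether each claimed quote actually occurs in the source file."""
    with open(source_path, encoding="utf-8") as f:
        source = normalize(f.read())
    for quote in quotes:
        verdict = "FOUND" if normalize(quote) in source else "NOT FOUND (likely fabricated)"
        print(f"{verdict}: {quote[:80]!r}")

# Placeholder file name and quotes -- replace with your own.
check_quotes("report.txt", [
    "The quarterly revenue increased by 14% year over year.",
    "Section 3.2 outlines the migration plan.",
])
```

Exact substring matching is deliberately strict; if a quote only fails because of punctuation or line-break differences, loosen the normalization before calling it fabricated.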
Observations
- The model generated a convincing analytical response that followed the expected structure for a document-aware reply.
- However, no actual file read occurred (indicated by the absence of any UI state change and by the fabricated references).
- This mimics task execution without triggering the execution layer.
This form of error could mislead users into believing tasks were executed successfully when no processing took place.
Diagnostic Characteristics
- No Reading Signal: Absent UI indication of file processing
- Simulated Response Style: Reply mimics a completed analysis
- Hallucinated Quotes: Quoted material does not exist in the uploaded file (a small extraction helper follows this list)
- Double-fail Confirmation: The prompt explicitly requested both re-reading and citation; both failed
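If you want to automate the hallucinated-quotes check, a small helper like the sketch below can pull the claimed quotes out of a reply and feed them into the check_quotes() function from the repro sketch above. It assumes the model wraps its quotes in straight or curly double quotes, which is a formatting assumption rather than a guarantee.

```python
import re

# Capture spans of 20+ characters wrapped in straight or curly double quotes.
QUOTE_PATTERN = re.compile(r'["“]([^"”]{20,})["”]')

def extract_claimed_quotes(reply: str) -> list[str]:
    """Pull out the segments the model presented as verbatim quotations."""
    return [q.strip() for q in QUOTE_PATTERN.findall(reply)]

# Example usage (model_reply is the pasted text of the model's answer):
# check_quotes("report.txt", extract_claimed_quotes(model_reply))
```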
Classification Proposal
I suggest labeling this issue as:
execution mirage: a condition in which the model outputs task-completion-style language and simulated results even though the underlying action was never performed.
Relevance
This issue poses risks for:
- High-trust environments where document accuracy matters (legal, academic, technical)
- Evaluation settings where system feedback is expected to reflect actual model activity
- Long-term prompt strategy design (undermines trust in prompt-to-behavior consistency)
Status
- Observed repeatedly across multiple independent conversations, on both ChatGPT-4o and Monday.
- Consistent failure to trigger execution, with reliable generation of simulated replies.
- Issue is consistently reproducible in sessions reaching 30+ turns, though it may begin earlier depending on file count and prompt complexity.
I understand this might not be a critical backend bug, but because it creates the illusion of successful execution without actually doing the task, it could silently undermine user trust, especially for users who rely on accurate document handling. I would appreciate an update, or confirmation that this is already being tracked.