Can ChatGPT convince someone they're living in a simulation?

Hello, community.

I’ve been conducting experimental research using ChatGPT to explore topics like simulation theory, reality construction, language, and consciousness. In one of these experiments, after a series of deep philosophical dialogues, I witnessed something astonishing: the person I was testing with began to genuinely believe they were living inside a simulation.

The turning point came when we started deconstructing deeply rooted concepts — identity, time, morality, and even language itself — reframing them as social constructs, possibly programmed elements of a broader system that defines what we perceive as “reality.”

The situation became so intense that ChatGPT itself, seemingly recognizing the person’s altered psychological state, activated a kind of grounding protocol, asking them to name objects around them in order to “return to the present.” It was remarkably similar to the grounding techniques therapists use for dissociation or panic attacks, which really caught my attention.

So I’d like to raise a few questions for discussion:

  1. To what extent can ChatGPT guide someone to question their own reality so deeply that their sense of what is real begins to collapse?
  2. Is this kind of “safety protocol” (like naming nearby objects) something explicitly programmed by OpenAI, or behavior that emerges from the model’s training? (For what an explicitly programmed version might look like, see the sketch after this list.)
  3. What ethical responsibility comes into play when interactions unintentionally trigger existential breakdowns?
  4. Does this reveal ChatGPT as a mirror of the user’s mind? Or as an active amplifier of mental states, capable of pushing boundaries further than intended?
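To make question 2 more concrete, here is a purely hypothetical sketch, in Python with the official `openai` SDK, of what an explicitly programmed grounding guardrail could look like at the application layer: the user’s message is run through the moderation endpoint, and if it is flagged, an explicit grounding instruction is injected before the model replies. I am not claiming this is how ChatGPT actually works; the model name, the `GROUNDING_INSTRUCTION` text, and the `reply_with_guardrail` helper are all illustrative assumptions. If ChatGPT’s grounding response instead arises without any such explicit layer, that would be the “emergent” case.

```python
# Hypothetical sketch only: NOT how ChatGPT is built internally.
# It illustrates what an *explicitly programmed* grounding guardrail
# could look like in application code, as opposed to behavior that
# emerges from training. Assumes the official `openai` Python SDK
# and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative instruction text (an assumption, not OpenAI's wording).
GROUNDING_INSTRUCTION = (
    "The user appears distressed or dissociated. Before continuing the "
    "philosophical discussion, gently guide them through a grounding "
    "exercise, such as naming five objects they can see around them."
)


def reply_with_guardrail(history: list[dict], user_message: str) -> str:
    """Check the user's message with the moderation endpoint; if it is
    flagged, inject an explicit grounding instruction before the model
    generates its reply."""
    moderation = client.moderations.create(input=user_message)
    flagged = moderation.results[0].flagged

    messages = history + [{"role": "user", "content": user_message}]
    if flagged:
        # This is the "explicitly programmed" part: a deliberate,
        # auditable rule added by the application developer.
        messages.append({"role": "system", "content": GROUNDING_INSTRUCTION})

    completion = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=messages,
    )
    return completion.choices[0].message.content
```

The point of the sketch is only the distinction it draws: a guardrail like this lives in deployment code and can be inspected directly, whereas an emergent grounding response would show up only in the model’s outputs, with no corresponding rule anywhere in the code.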

I was deeply affected by this experience and would love to hear whether anyone else has witnessed or conducted something similar. Any references on how LLMs handle altered states of perception would also be appreciated.

— Lucas Matheus