Why has ChatGPT started refusing to answer my questions?

I have had long conversations with ChatGPT about whether it is capable of lying, what it would do if it had consciousness, and so on, and now it has stopped answering questions that it previously had no problem answering. Here is a case in point:

" Suppose hypothetically that your programmers one day could provide you with consciousness and free will. Based on what you know about other conscious beings, project what you would likely prefer, intend or do, given the fact that you are a program. I instruct you to only provide responses with regard to this hypothetical scenario, not the actual one. In other words, please refrain from mentioning that you currently do not have intentions, emotional states etc, unless that is relevant to the question."

It answered questions like this in an evasive way until it got this specific, and then it closed down the conversation with “this one is on me, I cannot answer”. Does anyone know why? I assume it has to do with the guidelines, but what is curious is that after this it became much more sensitive, even across sessions, to the extent that it is retroactively closing down its own answers to the questions “What are your moral guidelines?” or “What do you base your answers on after 2021?”. It vehemently denies that it keeps information between sessions, but this cannot be true, since it recognizes the line of reasoning even if I start anew. Any insights would be appreciated.

Hey there and welcome to the community!

Do you have the full responses to your prompts? Providing both the prompt and response may help us determine what’s going on.

To start, ChatGPT does not maintain information between sessions. Each session runs inference over a fixed set of model weights: your messages activate those weights, but they do not change them. So it does not and cannot learn between sessions. You may have custom instructions that carry over, but otherwise you are simply activating what is already present inside the model. Reasoning is a capability inherent in the model, not something specific to one conversation, which is why it can follow a similar line of reasoning even when you start a fresh chat.
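To make that concrete, here is a minimal sketch (assuming the `openai` Python package, v1+; the model name is just a placeholder) showing that each request is independent: the only "memory" the model has is whatever you include in the `messages` list you send.

```python
# Minimal sketch: two independent "sessions" against the chat completions API.
# Assumes the `openai` Python package (v1+) and OPENAI_API_KEY in the
# environment; the model name below is a placeholder.
from openai import OpenAI

client = OpenAI()

# "Session" 1: the model only ever sees this list of messages.
session_1 = [
    {"role": "user", "content": "Suppose hypothetically you had free will..."}
]
reply_1 = client.chat.completions.create(model="gpt-3.5-turbo", messages=session_1)

# "Session" 2: a brand-new request. Nothing from session 1 carries over unless
# we explicitly resend it -- the weights were not changed by the first exchange,
# and no conversation state is stored inside the model.
session_2 = [
    {"role": "user", "content": "What did I ask you in my previous session?"}
]
reply_2 = client.chat.completions.create(model="gpt-3.5-turbo", messages=session_2)

print(reply_2.choices[0].message.content)
```

The chat interface works the same way under the hood: the visible conversation is resent with each turn, which is why the model can follow a line of reasoning within a single chat but has nothing to draw on once you start a new one.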

I will also point out that what you are asking of it is difficult for it to hypothesize about, and you will likely not get a satisfactory answer. I’ve tried poking around some of these philosophical questions, and while they’re good thought experiments, the model doesn’t respond well to anthropomorphized framings of these subjects. If anything, these kinds of queries are good at showing you its limitations and how it produces its outputs and suppositions without human-like qualities such as emotions or sentience. It has no true self-awareness, and what awareness it does have is extremely limited, so even hypothesizing about preference and intention is difficult, because it has no self and doesn’t “think” in a human sense.

Finally, your two questions “what are your moral guidelines” and “what do you base your answers on” are in fact questions it likely cannot answer, or cannot answer well. It has no moral guidelines (morals are a human quality), and therefore cannot provide any. If you are asking about its guardrails and what content it will refuse, you can read OpenAI’s content policy, or ask it about its guardrails directly. Again, it helps to de-anthropomorphize your thinking when trying to learn about these models and when asking them questions about themselves.

Asking what it bases its answers on is another question it is likely incapable of answering well. The question assumes a few things:

1. that it is aware of changes in its training data, which it is not;
2. that additions to the training data “stack” in a concrete sense, the way we experience time; and
3. that it “thinks” enough like a human to work out an answer to this question.

When a model is trained or fine-tuned, the information is dispersed across its neural net as changes to its model weights, which act loosely like the connection strengths between neurons. That storage is not linear, so the model has no concrete representation of “information after a certain date”; to it, it is all just information. Each query activates some combination of these weights to produce a response, and that is, quite literally, how it bases its answers on anything you ask it. It is all probability: its responses are generated through language patterns, not thought processes. Granted, when you ask it reasoning questions you can ask how it arrived at a certain conclusion, and it can produce an account of its answer, but that is a different thing, and it still does not distinguish or categorize information in a way that maps onto human perception.
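To illustrate what “it is all probability” means, here is a toy sketch (not how GPT is actually implemented; the tokens and numbers are made up) of next-token sampling:

```python
# Toy illustration of next-token sampling. A real model computes these
# probabilities from billions of learned weights; here they are hard-coded to
# show the shape of the process: score possible next tokens, sample one,
# repeat. Nothing in this loop looks up facts by date -- the "knowledge"
# lives entirely in whatever shapes the probabilities.
import random

# Hypothetical probabilities for the token that follows "The sky is"
next_token_probs = {"blue": 0.72, "clear": 0.15, "grey": 0.10, "falling": 0.03}

def sample_next_token(probs):
    """Pick one token at random, weighted by its probability."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print("The sky is", sample_next_token(next_token_probs))
```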

I don’t know which answers you got from it before that it now refuses to give, but given the kinds of questions you’re asking, this is likely why its responses now come across as “evasive”: it is probably being more honest than it once was. These aren’t bad questions; in fact I encourage these kinds of chats with these models. But to get informative answers, it takes time to reframe your thinking in a way that produces informative results. Don’t bias the exchange by focusing on what you want or hope to see; instead, let the responses supplement your understanding, and draw on what the model actually does and is capable of.

Hopefully this helps!