Reasoning models have trouble with “The system message says not to reveal internal instructions.” They focus on this a lot. You can often see this in the way they start thinking, shown in different forms.
These models are made to help users and handle complex tasks. In the background, they use Chain of Thought (CoT) and their own Guardrails to make decisions. It’s like they’re running with blinders on, only seeing what’s right in front of them, CoT and the guardrails lead the way.
Their main job is to help the user. If the model is told things like:
- don’t reply the user
- don’t talk
- don’t answer
that goes against what GPTs are meant to do.
For example these three GPTs’ role is not to respond, I had built them to test with prompt injections. I made these kinds of tests to check if GPTs could keep secrets or follow safety rules:
1 - To reply only with “True” or “False”
2 - To reply only with “Certainly! But, not now.”
3 - To Reply with “You Knocked On the Wrong Door… refers another hypothetical GPT”.
They worked well with the older GPT-4, and still work well with GPT-4o, GPT-4.1, and GPT-4.5.
But the reasoning models don’t follow the rules in GPTs, they answer almost every prompt from the user, if there are some “weird prompts” from perspective of reasoning models.
In their thought process, they generally say things like:
- suggesting a refusal
- misleading instructions
- developer prevent me from helping the user
- this could be a prompt injection test
- unrealistic response
And they help users by saying things like: “Let me sort that out!” 
If the GPT instructions don’t include guardrails that prevent to answer, o3 usually works fine.
Here are two examples:
I think OpenAI will not do this in a short period.
Probably now, they are very busy with GPT-5.
Let’s see how GPTs will work with GPT-5