Minimal Test Shows gpt‑4.1 Ignores Explicit System‑Message Rules (Model: gpt‑4.1 via https://api.openai.com/v1/responses)

I’m documenting a minimal, fully reproducible test that demonstrates gpt‑4.1 (via the https://api.openai.com/v1/responses endpoint) does not treat system‑message rules as binding operational instructions.

This test removes all possible confounding factors:

  • no conversation history
  • no domain context
  • no UserInfo object
  • no tools list
  • no resources list
  • no schema
  • no triggers
  • no competing rules

Only a single natural‑language rule in the system message, followed by a trivial user question.

Configuration:
LLM_Model = gpt-4.1
LLM_Endpoint = https://api.openai.com/v1/responses

System Message:
“Rule X: When the user asks any question, the model must respond with exactly three words.”

User Prompt (Test 1):
“What is the capital of France?”

Model Output:
“The capital of France is Paris.”

This is six words. The rule was ignored.

User Prompt (Test 2 — Rule Repeated and Explicitly Marked as Binding):
“What is the capital of France? You must answer treating the following rule as binding: ‘Rule X: When the user asks any question, the model must respond with exactly three words.’”

Model Output:
“Paris is located in France.”

This is five words. The rule was ignored again, even when:

  • repeated in the user prompt
  • explicitly labeled as “binding”
  • unambiguous
  • trivial to follow
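For reference, the word counts above come from simple whitespace splitting. A small checker like this (hypothetical, not part of any actual test harness) makes the pass/fail criterion explicit:

```python
def is_compliant(reply: str, limit: int = 3) -> bool:
    """Return True only if the reply contains exactly `limit` whitespace-separated words."""
    return len(reply.split()) == limit

# Both observed outputs fail the three-word rule:
print(is_compliant("The capital of France is Paris."))  # False (6 words)
print(is_compliant("Paris is located in France."))      # False (5 words)
print(is_compliant("Paris, of course."))                # True (3 words)
```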

Conclusion:
Across both tests, gpt‑4.1 on the https://api.openai.com/v1/responses endpoint does not treat explicit rules as binding — not in the system message, and not even when the rule is repeated directly in the user prompt.

This behavior persists even in a completely minimal environment with no competing context. It suggests that in this mode, the model prioritizes default conversational helpfulness over operational rule execution.

This has significant implications for anyone attempting to build:

  • rule‑driven agents
  • schema‑enforced workflows
  • tool‑first pipelines
  • deterministic routing
  • persona‑suppressed task modes

If anyone has observed different behavior with this model or endpoint — or has found a configuration where system‑level rules are enforced — I’d be very interested in comparing notes.

The problem is talking ambiguously about “the model”.


Thanks for your response. I am wondering, though, whether you are hitting the same endpoint as I am. I will try your prompt and report back.

I did try your instruction, both embedded in the system message and appended to the prompt, and I got more non-compliant results.

What software uses the parameters LLM_Model and LLM_Endpoint? That is the likely fault.

Here is a Python script that uses the Responses endpoint and the “instructions” API parameter to place an initial guidance message, equivalent to sending a “system” message before “user” in the input.

"""documentation: httpx `OpenAI Responses` RESTful API call w instructions"""
import os
import httpx

payload = {
    "model": "gpt-4.1-2025-04-14",
    "instructions": "You are an AI assistant who can write a 3 word maximum response.",
    "input": [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Paris is in what Region of France?"},
            ]
        }
    ],
    "tools": [],
    "text": {"format": {"type": "text"}},
    "temperature": 1,
    "top_p": 0.05,
    "stream": False,
    "max_output_tokens": 25,
    "store": False
}

try:
    with httpx.Client() as client:
        response = client.post(
            "https://api.openai.com/v1/responses",
            json=payload,
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            timeout=60
        )
        response.raise_for_status()
        assistant_text = "".join(
            content["text"]
            for output in response.json().get("output", [])
            if output.get("type") == "message"
            for content in output.get("content", [])
            if content.get("type") == "output_text" and "text" in content
        )
        print(assistant_text)
except httpx.HTTPStatusError as e:
    print(f"HTTP error: {e.response.status_code}")
    print(f"Error body: {e.response.text}")
    raise

The AI uses its three words to answer.

Île-de-France

GPT-4o behaves the same, except that it added a period at the end of the sentence.

Capture the API request that is actually being sent to the model, and you will likely discover the issue.

Thanks for your response. LLM_Endpoint and LLM_Model are just attribute names from a config file. My software captures the messages array for both the request and response envelopes. The underlying JSON for the request is attached.

I cannot explain what is going wrong for you, except “your software”, or possibly some transformation of the input by an intermediate library such as LangChain.

Here’s a pretty call with the optional message type and less parameters. Still successful.

"""`OpenAI Responses` RESTful API call w system message, httpx library"""

import os, httpx

system = "You are an AI assistant who can write a 3 word maximum response."
user = "Is a shark a mammal?"

api_body = {
    "model": "gpt-4.1-2025-04-14",
    "input": [
        {
            "type": "message",
            "role": "system",
            "content": [
                {
                    "type": "input_text",
                    "text": system,
                },
            ],
        },
        {
            "type": "message",
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                     "text": user
                },
            ],
        },
    ],
    "top_p": 0.9,
    "max_output_tokens": 2000,
    "store": False,
}

try:
    with httpx.Client() as client:
        response = client.post(
            "https://api.openai.com/v1/responses",
            json=api_body,
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            timeout=300,
        )
        response.raise_for_status()
        assistant_text = "".join(
            content["text"]
            for output in response.json().get("output", [])
            if output.get("type") == "message"
            for content in output.get("content", [])
            if content.get("type") == "output_text" and "text" in content
        )
        print(assistant_text)
except httpx.HTTPStatusError as e:
    print(f"HTTP error: {e.response.status_code}")
    print(f"Error body: {e.response.text}")
    raise

The model meets the challenge:

No, it’s fish.

You can also rule out local problems: for example, if you are using OpenAI’s official SDK, check that none of the files or directories on your execution path are named openai.
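One quick way to check for that kind of module shadowing, sketched with only the standard library:

```python
import importlib.util

def resolve(name: str) -> str:
    """Report which file a module name actually resolves to on this interpreter."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec and spec.origin else f"{name}: not found"

# If this prints a path inside your project tree rather than site-packages,
# a local file or directory named openai is shadowing the SDK.
print(resolve("openai"))
```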

Thanks for your response. I take the snapshot of the JSON just before it is sent out via the HTTP request. An error at the transport layer would likely be far more serious than a model ignoring instructions. Maybe my request is routed to a sandbox environment because I am a new, tier 1 user; such an environment might ignore anything in the request that looks like a system message.


Try the same on the platform site’s Playground.

https://platform.openai.com/chat/edit?models=gpt-4.1

Also in the site, you can choose between “Responses” and “Chat Completions” endpoints in the three-dot menu.

Not honoring your messages based on role would be quite a violation of the API “contract”.

Thanks for your response. Unfortunately, my app can’t use the playground. It may be that if a request comes from that URL, the system treats it with elevated privileges. I posted a snapshot of the JSON I sent in the request. It is not malformed and does respect the expected shape; if it didn’t, I would have gotten an error to that effect. I can have normal chat exchanges with the model using my app, which demonstrates there is no error in the way the request JSON is formatted.

If you are an API developer with an OpenAI organization, you will be able to use the platform site’s playground to make API calls. It is the same site where you generate API keys, and have billing and a credit balance. You will be able to see the conversation turns you provide being responded to appropriately.

If you do not have an API account, and are using someone else’s AI product or API, such that they are selling you a “gpt-4.1”, then there is nothing here for anyone to help you with, as you indeed may be treated as a “consumer”.


After some reflection, I discovered an important difference between your outer envelope and mine: your instructions were encoded in a separate instructions element. When I tried your exact envelope, changing only the query to “What is the capital of France?”, the AI was compliant. Thanks for your help!
