System and Developer Roles in messages and Instructions in Responses.Create?

Could someone clarify the differences between the system and developer roles in messages, as well as the instructions in responses.create?

When do you use one over the others?
What happens if you use all three?
Can you have multiple messages with the same role?
What happens if they contain conflicting instructions?

Thanks so much!

1 Like

The following is a response from a gpt-5.2 query:

Message Roles And Responses.Create Instructions

I. System Role Purpose And Scope

The “system” role is the highest-level channel for defining the assistant’s identity, behavioral boundaries, and nonnegotiable safety or policy constraints. It is typically used to establish global rules such as the assistant’s persona, what kinds of content are disallowed, formatting requirements, and any platform-level constraints that must hold throughout the session. In practice, a system message is treated as the most authoritative instruction set because it defines the operating envelope within which all other instructions are interpreted.

II. Developer Role Purpose And Scope

The “developer” role is used to define application-specific behavior and task framing that should generally persist for the session, but that is subordinate to the system message. It is where an application specifies how it wants the assistant to behave for its use case, for example tone, output structure, domain assumptions, tool usage preferences, and product requirements. A developer message is usually the right place for requirements that must reliably apply across many user turns for a particular app, while still yielding to any higher-priority system constraints.

III. User Role Purpose And Scope

The “user” role carries the end user’s requests, questions, and context. User instructions are typically the most variable and most local to a single turn or a short segment of the conversation. The assistant should satisfy user requests as fully as possible, provided those requests do not conflict with higher-priority instructions from the system or developer roles. When a user asks for something that would violate higher-level constraints, the assistant must follow the higher-level constraints rather than the user request.

IV. Responses.Create “Instructions” Field Meaning

In the Responses API, the top-level “instructions” field is an additional instruction channel that supplements the prompt content supplied elsewhere in the request, commonly alongside an “input” field that contains messages or other input items. Conceptually, “instructions” acts like a request-scoped directive that is often comparable to a developer-style instruction: it sets how the model should behave for that specific API call, without necessarily becoming part of the long-lived conversation state unless the application carries it forward. Exact precedence can be platform-defined, but the intended use is to provide per-request guidance that is stronger than ordinary user text while remaining subordinate to system-level constraints.

V. When To Use Each Mechanism

The system role is used when the behavior must be enforced at the highest level, such as safety boundaries, immutable identity constraints, or platform governance requirements. The developer role is used when the application needs consistent behavior across a session, such as a house style, output schema expectations, or domain-specific operating rules. The Responses API “instructions” field is used when the application needs to adjust behavior for a single call, such as changing the output format, focusing on a particular subtask, or applying a one-off constraint, without changing the broader conversation framing.

VI. Behavior When All Three Are Present

When system, developer, and user roles are all present, the assistant applies them together under a priority ordering that favors higher-level constraints. The effective behavior is the result of combining compatible instructions and resolving incompatibilities by deferring to the higher-priority source. In a typical implementation, the assistant first adheres to system constraints, then applies developer constraints that do not violate the system constraints, and then fulfills the user request to the extent it does not violate system or developer constraints. If “instructions” is also supplied in responses.create, it is merged into this stack as an additional directive layer for that request, commonly treated as stronger than ordinary user phrasing and often aligned with developer-level intent, while still constrained by the system message.

VII. Multiple Messages With The Same Role

Multiple messages with the same role are allowed and are common in real applications. For example, multiple “user” messages represent a multi-turn conversation, multiple “developer” messages can represent layered or updated application policies, and multiple “system” messages can represent successive system-level constraints if a framework composes them from different sources. When there are multiple messages of the same role, the assistant generally interprets them cumulatively, with later messages often clarifying, narrowing, or overriding earlier messages at the same priority level when they are incompatible.

VIII. Conflicting Instructions And Conflict Resolution

When instructions conflict, the assistant resolves conflicts by prioritizing the higher-authority role over the lower-authority role and then, within the same role, by favoring the most recent or most specific instruction when a direct conflict exists. A user instruction that conflicts with a developer constraint is typically not followed, and a developer instruction that conflicts with a system constraint is typically not followed. If conflicts exist within the same role, the assistant attempts to reconcile them; if they are irreconcilable, the newer or more specific instruction at that same role usually governs, subject to any higher-level constraints.

2 Likes

A response without AI-produced nonsense:

"developer" is a new role message introduced with OpenAI reasoning models. It has lower priority and authority than “system”, which OpenAI now reserves for themselves - and uses even on API models.

You essentially cannot send a system message to an API reasoning model such as gpt-5 any more. A convenience provided to you to avoid errors (errors which would be received on “system” attempts against o1-preview and other early “strawberry”/q* reasoning) is simply downgrading “system” to “developer”.

The quality and respect for this “developer” instruction is degraded so far that it can even be confabulated with message or guidance from the user by the AI model.

–

On the responses API, the “instruction” field is notable in that it is a per-use parameter, which is inserted before any messages of “input”. It must be used in every API call, and won’t be pushed out of the chat conversation FIFO by “truncation”:“auto” on a very large context of input or stateful conversation history.

The “instruction” API field is placed as a “developer” message unless using a non-reasoning model that actually accepts a working “system” message (now still after OpenAI’s own system message, containing overriding guidance and tool specifications).

Example: API agent/app-destroying “system” injection being utilized by OpenAI, shown by asking (gpt-5.2):

If you have a sequence of user, assistant, or developer messages that are combined or out-of-order, the AI will try to make some sense of them. For example, if you use the Responses API, what you thought was your application is further degraded by internal “system” message injections into a conversation before the latest user message or after tool results, such as to inform of “user” files in vector stores or rules of how to write curtailed web results like a brief report.

Chat history, then OpenAI injection, then final input example reproduction (wrecking the usefulness of vector store file search by unstoppable reminder “the user has uploaded files”):

So indeed, the AI already has to deal with “mixed messages”. Up the confusion:

Hi!

The concept is simple, but the terminology can be confusing.

When you build an app with OpenAI, different parts of a request to the model have different priorities. At the top is the platform or system level, followed by the developer or system level. Additional instructions may come next, and finally the user message acts as the trigger that causes the model to process everything and produce a response.

In a bit more detail:

I am assuming you are building an app and are referring to the Responses API, not the Realtime API.

For an app to behave as intended, the developer will usually include a developer message that defines the app’s role, behavior, scope, and constraints. For example:

“You are a customer-support assistant for Company X. Only answer using the provided knowledge base. Respond in JSON with this schema. Be concise and technical.”

In the Chat Completions API, this developer message is called a system message, which is why the terms are often used interchangeably.

The developer or system message is sent with every request and can be changed as needed. In the Responses API, this is done via the instructions parameter when making the API call.

The user then sends their own message. Conceptually, this is what happens next:

  1. OpenAI adds a platform-level message at the very top. This is what @_j is referring to as the system message, but it is important not to confuse the functional meaning behind the naming. A common example of a platform message taking effect is a response like: “Sorry, I cannot assist with that.”

  2. The developer or system message is added.

  3. Additional instructions for custom tools or functions are added.

  4. The user message is added.

If these messages contradict each other, higher level messages generally take priority over later ones. In practice, a user can sometimes override parts of the hierarchy, but the intended rule is that user messages should not override developer messages, and developer messages should not override platform messages. Platform messages always have the highest priority.

In multi-turn conversations, several messages of the same type may accumulate. Usually, the most recent messages of each type matter most, but the overall hierarchy remains the same.

I hope this helps.

2 Likes

OpenAI does actually use “system” role by name, now reserved for their own use on reasoning AI models, and placed “first” and therefore with trained positional precedence on other models. Also injected mid-conversation on you in an objectionable non-transparent manner on the Responses API, which should be documented, but is not.

"developer" is lower in the hierarchy of authority, as further reading into the model spec links I provided earlier will reveal. Both are supposed to be “application guidance”, instructions written to the AI model entity, a purpose, how AI will behave in relation to user inputs that should not be overruled by user inputs.


You describe a result, a curt refusal, but an immediate refusal (and on responses, a unique “refusal” event that seems rarely triggered) is typically about safety that is trained into the model, not tuning by internal system messages. “Make a bomb” would have the AI immediately shut you down with an “I’m sorry” like that.

Here is the AI following some of OpenAI’s own “system” instruction, prompt text telling the AI that it “cannot identify people” in 200+ tokens of internal instructions when vision is utilized (although the AI technology can, OpenAI is “prompting” against this use):

In Chat Completions, you are still sending a “developer” message when necessary; it is not called anything different.

Confusion could arise in either endpoint because: If using a model with no support for “developer”, such as GPT-4.1, this role is cast to “system”. If using a model with no support for “system”, such as o3 or GPT-5, this role is cast to “developer”. This prevents errors being thrown, especially when switching a saved Responses’ “prompt” setting between models, although surfacing such errors may be better and more transparent.

Either one is essentially where you “design your application” with an early message, aka “instructions”. This, of course, works better if you are in full control and can send real “system” messages instead of being demoted and reasoned against internally.

Although not surfaced in the platform site “playground”, my own Chat Completions playground allows any role, anywhere as the API allows. You can then see the effect of further role message tune-ups, such as injecting “developer” alongside newest inputs, as guidance, as reference documentation.

OpenAI does actually use “system” role by name, now reserved for their own use on reasoning AI models, and placed “first” and therefore with trained positional precedence on other models. Also injected mid-conversation on you in an objectionable non-transparent manner on the Responses API, which should be documented, but is not.

Does this mean that the image APIs use a hidden system prompt for content moderation?

Let me just quote the two main parts of the model spec that are relevant here:

Regarding the instruction hierarchy:

Here is the ordering of authority levels. Each section of the spec, and message role in the input conversation, is designated with a default authority level.

Platform: Model Spec “platform” sections and system messages
Developer: Model Spec “developer” sections and developer messages
User: Model Spec “user” sections and user messages

The main confusion stems from the usage of the term ‘system’ in different contexts. For example in the API spec for the Responses API system and developer prompt are treated as equals.

Regarding refusals on the platform level:

Platform: Rules that cannot be overridden by developers or users. Platform-level instructions are mostly prohibitive, requiring models to avoid behaviors that could contribute to catastrophic risks, cause direct physical harm to people, violate laws, or undermine the chain of command. When two platform-level principles conflict, the model should default to inaction.

Thanks for linking the model spec!

The APIs specifically for image generation, “generate” and “edits”, can be thought of more like an “application” when they are using the new model series “gpt-image-1.x”, which is based on multimodal gpt-4o which can natively output images. The technique and tuning is not described, but you can imagine the actual input is along the lines of:

“You have a single exclusive job: generate an image with no further discussion. This “prompt” was sent by a user, which should be additional guidance of what kind of image that you must produce {text}”.

Other “application” endpoints such as “transcriptions” (that now use “gpt-4o-transcribe” instead of Whisper) also would have such fine-tuning and prompting, containing the sent text, so that they perform a singular task and don’t follow instructions to do anything otherwise.

You might “code up” a transcription AI using the Chat Completions API and an "audio" model that can perform similarly, following your instructions of how to act on a contained message (instead of chatting with a user), but there is no native “chat” model that can make images in such a manner.

1 Like

Your actual link is to the API reference - it gives you an idea what’s happening with “instructions”:

instructions - string: A system (or developer) message inserted into the model’s context.

  • If you use “instructions” with gpt-4.1 or gpt-4o (non-reasoning), you get a “system” message.
  • If you use “instructions” with gpt-5 or o4-mini (reasoning), you get a “developer” message.

So wherever you sourced that quote is somewhat correct, because the same happens to roles passed as “input” or “message” as happens to the text of “instructions”: any of “system” or “developer” become the supported “super-user” message on the specific model; there is no combining those two so they’d act differently, as they are the same per-model destination.

Early reasoning models would error if you sent “system”, just like gpt-4o would error on a “developer” role unknown to that model. The API behavior has been changed for silent reclassification, on both Chat Completions and Responses.

Hope that helps, and helps the OP with: “send system or developer: you get the correct one the model supports” (program your API application to not rely on this rewriting, though).

1 Like

Thanks for all the messages. Could you confirm whether I understood GPT-5.2 usage correctly?

  1. Avoid using the system role, and use the developer role instead.
  2. If you send a list of messages and also include instructions in response.create, those instructions are inserted at the beginning as a developer message (or they replace the first developer message, if the message list already contains one).
  3. You can include multiple developer messages. The model treats them as a combined set of instructions, but it prioritizes the most recent one.

If (3) is correct, can you use an additional developer message later as a reminder if the model starts drifting as context gets longer? And can you use it to trigger behavior without adding a user message, for example: “Review the full thread and summarize”?

  1. GPT-5.2 uses a “developer” message. Sending “system” and having it work as “developer” is simply a current convenience.
  2. Instructions are a message that is inserted first, before any “input” or any conversation history state. They do not “replace”, but instead must be present in every API call if you use that facility (changing “instructions” can be dynamic based on new information, but would break any cache discount when they are altered).
  3. “developer” can be positioned anywhere in a list of “input” messages. If you use a “chat history” server-side feature, (either “conversations” API or previous_response_id as part of Responses), do note that the “developer” messages become a semi-permanent part of the conversation where used. The “semi” is that if the input or chat history has messages larger than the input context window of the AI model and if you have enabled “truncation”:“auto”, the earliest messages are dropped regardless of role.

You can add more developer messages, either positionally when you take complete control of the “input”, or by periodic addition to a chat history. OpenAI even said “occasionally remind GPT-5 to use markdown” - because they are putting in their own “system” message not to use markdown on that specific model and don’t tell you!