System? Developer?
Each model supports a “system” role; on later reasoning models this has been demoted to a “developer” role, but it still serves the same purpose. You can send either role name and get equivalent behavior: the API rewrites the role name to match either model type, so you get success instead of an error such as “system” role not supported.
The system role sits at the top of the hierarchy of “respect” the AI model gives to messages, and of its obedience in following their instructions. It is designed for the application developer to place, as the first message the AI receives, behaviors and rules that user input should not be able to override or countermand.
Persistence
In a way, you do “send it all again”: the AI model that runs on the API has no memory or persistent state, so it must ingest all the prior turns you want understood as context. Every API run is disconnected.
Thus, on the Responses API or Chat Completions API, this system message is the first message you pass, since the first position is also the strongest for guiding behavior. The turns of previous conversation exchanges are appended next, and finally the latest user input that needs answering: a complete list of everything the AI model must understand to be your application and its “chat”.
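That assembly order can be sketched as a small helper. This is illustrative only (the function name and sample content are made up, not part of any SDK); it just builds the list shape the API expects:

```python
# Hypothetical sketch: assembling the full context for one API call.
# System message first, then prior turns, then the newest user input.

def build_messages(system_text, history, latest_user_text):
    """Return the complete message list the model must see every call."""
    messages = [{"role": "system", "content": system_text}]
    messages.extend(history)  # prior (user, assistant) exchanges, in order
    messages.append({"role": "user", "content": latest_user_text})
    return messages

history = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]
msgs = build_messages(
    "You are a terse geography tutor.", history, "And of Spain?"
)
# msgs now holds four messages: system, user, assistant, user
```

Your application re-runs this assembly on every turn, since the model remembers nothing between calls.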
Responses & Instructions
On the Responses API, an additional parameter was added, called “instructions”. This field injects its text as that first system message, before any other input you send. It has to be supplied on every API call; otherwise, the AI runs without instructions.
You can still ignore this API parameter field and simply place “input” messages in order, such as [system, (user, assistant), user] for a follow-up to a previous question. The Responses API has also added an optional truncation:auto parameter, though, where the oldest messages of “input” can be dropped if there are more messages than the AI model’s input itself allows. That would include your important first message too, so “instructions” is important if you are not managing your own conversation input length and are letting the API’s parameter do it (not great).
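The two equivalent placements look like this as request bodies. These are plain dicts with no network call; the field names follow the Responses API, but the model name and message text are placeholders:

```python
# Two equivalent Responses API request bodies (dicts only, no SDK call).

system_text = "Answer in one sentence."
user_text = "Why is the sky blue?"

# Variant 1: the "instructions" parameter injects the system message.
request_a = {
    "model": "gpt-4.1",  # placeholder model name
    "instructions": system_text,
    "input": [{"role": "user", "content": user_text}],
}

# Variant 2: the same text sent as an explicit first message in "input".
request_b = {
    "model": "gpt-4.1",
    "input": [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ],
}
```

The practical difference: variant 1’s text can never be truncated away, because it is re-injected on every call, while variant 2’s system message is just the oldest item in “input”.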
Just earlier today, I provided two simple scripts as replies: one showing an API call with “instructions”, and later in that topic, a script with a system message of instructions. Both operate identically.
Conversations
You can see that a system message can be part of a conversation, perfectly acceptable…if you let the model fail when the input grows too long. This also applies to the newer “conversations API”, a server-side chat storage mechanism. Create a conversation ID, and you can pre-populate it with a system message, sent only once. Then the ID you continue to supply when you use the Responses API will grow a conversation starting with that message.
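The grow-by-ID idea can be illustrated with a toy in-memory store. To be clear, this is NOT the OpenAI SDK or the conversations API itself; it just mimics the behavior of pre-populating once and then appending per turn:

```python
# Toy model of server-side conversation storage (illustrative only).

import uuid

class ConversationStore:
    """Mimics create-once, append-per-turn conversation storage."""

    def __init__(self):
        self._store = {}

    def create(self, initial_messages=None):
        conv_id = f"conv_{uuid.uuid4().hex[:8]}"
        self._store[conv_id] = list(initial_messages or [])
        return conv_id

    def append(self, conv_id, message):
        self._store[conv_id].append(message)

    def get(self, conv_id):
        return list(self._store[conv_id])

store = ConversationStore()
# Pre-populate once with the system message...
cid = store.create([{"role": "system", "content": "Be concise."}])
# ...then each later turn sends only the new message plus the ID.
store.append(cid, {"role": "user", "content": "Hello"})
```

The real service works the same way from your code’s point of view: the full history lives server-side, keyed by the ID you keep supplying.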
Still, it is better to send “instructions” every time you make an API call, to stay error-free. You will often find that applications need dynamic instructions depending on the state of a conversation or of the application, and sending them per call lets you change that first message each turn. The conversation ID mainly saves you some bandwidth.
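Dynamic instructions might be composed per call like this. Everything here is a made-up example (the app name, the flag, the wording), just to show why re-sending the field each turn is useful:

```python
# Illustrative only: composing "instructions" fresh on every API call
# from application state, since the field is re-sent each request anyway.

def make_instructions(user_name, expert_mode):
    base = "You are a support assistant for ExampleCo."  # hypothetical app
    parts = [base, f"The customer's name is {user_name}."]
    if expert_mode:
        parts.append("Use precise technical language; skip basics.")
    else:
        parts.append("Explain in plain language for a non-technical reader.")
    return " ".join(parts)

instructions = make_instructions("Ada", expert_mode=True)
```

A server-stored system message sent once cannot adapt like this; per-call instructions can react to anything your application knows.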
Prompts
A prompt is a preset ID carrying instructions along with some settings. You must create it in the platform site UI. Then, a prompt ID in your call, instead of model and instructions, fulfills the setup portion of an API call. This is the most extreme form of “not sending everything every time”.
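A request using a stored prompt might look like the following. Again, a dict-only sketch: the “prompt” object shape follows the Responses API, but the ID and variables here are invented placeholders:

```python
# Sketch of a Responses API request body using a stored prompt preset
# in place of "model" + "instructions" (dict only, no network call).

request = {
    "prompt": {
        "id": "pmpt_example123",          # placeholder preset ID
        "variables": {"city": "Lisbon"},  # optional template variables
    },
    "input": [{"role": "user", "content": "Plan a day trip."}],
}
```

Note that the model and instructions now live in the preset on OpenAI’s side; the call itself carries only the ID and the turn’s input.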
Conclusion
So: all these mechanisms are simply different ways of loading the context of messages that the AI model must have every time. They are offered services, helpers, and parameters for when you don’t want to instead manage all of your customers’ chats yourself, have complete ownership of user data with nothing reliant on or stored on OpenAI’s servers, and budget exactly what runs every turn (which is absolutely the best way).