Hi everyone,
I’m building a multi-turn RAG-based chatbot using the OpenAI chat.completions API, and I’m trying to determine the best strategy for handling the system prompt in a long-running conversation.
I use a detailed system prompt (roughly 400 tokens) that defines the assistant’s tone, formatting requirements (e.g., citation style, markdown usage), and response rules. For each user turn, I also inject:
- A retrieved context from a vector store ,
- The user’s actual question.
My core question is:
When structuring the messages array, should I:
Option A:
Send the system prompt once at the start, and then continue appending user/assistant messages?
python
CopyEdit
messages = [
{"role": "system", "content": " You are a helpful AI, a legal search assistant with citation, markdown, tone, and style constraints ... [full system prompt]"},
{"role": "user", "content":"Context:\n[Relevant law summaries]\n\nQuery:\nWhat are the constitutional limits on searches?"},
{"role": "model", "content": "...[response with citations][1][2].."},
{"role": "user", "content": "Context:\n[New retrieved excerpts]\n\nQuery:\nHow does the exclusionary rule apply in cases involving good faith exceptions under the Fourth Amendment?"},
{"role": "model", "content": "...[response][3]."}
]
Or Option B:
Resend the system prompt before each user query, like this:
messages = [
{"role": "system", "content": " You are a helpful AI, a legal search assistant with citation, markdown, tone, and style constraints ... [full system prompt]"},
{"role": "user", "content":"Context:\n[Relevant law summaries]\n\nQuery:\nWhat are the constitutional limits on searches?"},
{"role": "model", "content": "...[response with citations][1][2].."},
{"role": "system", "content": " You are a helpful AI, a legal search assistant with citation, markdown, tone, and style constraints ... [full system prompt]"},
{"role": "user", "content": "Context:\n[New retrieved excerpts]\n\nQuery:\nHow does the exclusionary rule apply in cases involving good faith exceptions under the Fourth Amendment?"},
{"role": "model", "content": "...[response][3]."}
]
I’m particularly concerned with:
- Best practices around context window usage and efficiency,
- Whether repeating the long system prompt is necessary for maintaining behavior consistency,
- And how the model handles system role memory across turns.
Any insights, links to cookbook or recommendations would be greatly appreciated! Thanks in advance.