Solving agent system prompt drift in long sessions — a 300-token fix

The problem

If you’ve run any LLM agent for 30+ minutes, you’ve seen this: the agent follows its system prompt perfectly at the start, then gradually drifts. An hour in — it acts like the prompt never existed.

This happens with every model, every framework, every agent. It’s not a bug — it’s how attention works in transformers. The system prompt is a block of tokens at the beginning of the context, and as the context grows that block’s share shrinks: 1,000 prompt tokens out of 2,000 total is 50% of the context; 1,000 out of 80,000 is roughly 1%.
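As a back-of-envelope illustration of that framing (treating the prompt’s share of total context as a stand-in for its weight; a simplification, since real attention is learned and non-uniform):

```python
# Rough illustration only: the system prompt's share of the context shrinks
# as the conversation grows. Real attention is learned and non-uniform, so
# this is the framing above, not the actual mechanism.
prompt_tokens = 1_000
for total_tokens in (2_000, 10_000, 80_000):
    share = prompt_tokens / total_tokens
    print(f"{prompt_tokens:>6} / {total_tokens:>6} tokens = {share:.1%} of context")
```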

What doesn’t work well

  • Repeating the prompt every N messages — eats context window (2,000+ tokens each time), and passive re-reading is weaker than active generation
  • Restarting the session — kills accumulated context, unacceptable for agents mid-task
  • Summarization / memory layers — help with information recall, but don’t restore attention to instructions and rules

What works: SCAN

Make the model generate tokens semantically linked to its instructions. Not re-read them — generate new ones by answering questions about them. Generation creates ~20 tokens that actively link instructions
to the current task. Prompt repetition inserts 2,000+ tokens the model passively skims.

How it works

  1. Markers — questions at the end of each section in the system prompt:
Section: data handling rules

…your rules here…
@@SCAN_1: What data will this task affect? What if state is stale?

Section: error handling

…your rules here…
@@SCAN_2: What’s the most likely failure mode for this task?

Markers at the end — to answer the question, the model must read the section first.

  2. Trigger — before a task, the agent answers its markers:

SCAN_1: Task affects session state. If stale — double charge.
SCAN_2: Timeout on external API without retry logic.

1-2 sentences per marker. ~300 tokens total vs 2,000+ for prompt repetition.

  3. Post-task check:

CHECK: session reset ✓, error codes ✓
MISSED: didn’t verify concurrent requests — acceptable, single-threaded task

  4. Levels — FULL (~300 tokens, all markers) for critical tasks. MINI (~120 tokens, key markers) for medium. ANCHOR (~20 tokens, one line) between subtasks. SKIP for trivial ops.

Key constraint: SCAN answers must be in the model’s output, not in internal thinking/reasoning. Token generation in output is what restores attention.
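As a concrete sketch of the loop (assuming a generic chat-completions-style client; the marker text, level descriptions, and function names here are illustrative, not from any specific framework):

```python
# Illustrative sketch of the SCAN loop. `call_model` stands in for whatever
# chat-completion client you use; markers, levels, and wording are examples.
SYSTEM_PROMPT = """\
Section: data handling rules
...your rules here...
@@SCAN_1: What data will this task affect? What if state is stale?

Section: error handling
...your rules here...
@@SCAN_2: What's the most likely failure mode for this task?
"""

# FULL / MINI / ANCHOR / SKIP map to how much marker-answering is forced.
LEVELS = {
    "FULL": "Answer every @@SCAN marker above in 1-2 sentences.",
    "MINI": "Answer only the @@SCAN markers relevant to this task, one sentence each.",
    "ANCHOR": "One line: which single rule above matters most for this step?",
    "SKIP": None,
}

def run_task(task: str, level: str, call_model) -> str:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    # Trigger: make the model generate marker answers in its visible output
    # before it starts the task (not in hidden reasoning).
    trigger = LEVELS[level]
    prefix = f"Before starting: {trigger}\n" if trigger else ""
    messages.append({"role": "user", "content": f"{prefix}Task: {task}"})
    result = call_model(messages)

    # Post-task check: CHECK what was verified, MISSED what was skipped and why.
    messages.append({"role": "assistant", "content": result})
    messages.append({"role": "user", "content":
        "Post-task: list CHECK (rules you verified) and MISSED (rules you skipped, and why)."})
    return call_model(messages)
```

The only point the sketch is making is that the marker answers and the CHECK/MISSED report land in the output stream, interleaved with the task itself.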

Multi-agent systems

Each agent in a pipeline runs SCAN independently and returns CHECK/MISSED to the orchestrator. Without this, a sub-agent loses all instruction context by the time it finishes. The orchestrator sees what was
verified across the entire chain.
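A minimal sketch of the orchestrator side, with assumed (not prescribed) field names, just to show the shape of the audit trail:

```python
# Sketch: the orchestrator collects each sub-agent's CHECK/MISSED report so
# the chain's verification status is visible in one place. Field names are
# illustrative assumptions, not a fixed schema.
from dataclasses import dataclass, field

@dataclass
class ScanReport:
    agent: str
    checked: list[str] = field(default_factory=list)
    missed: list[str] = field(default_factory=list)

@dataclass
class Orchestrator:
    reports: list[ScanReport] = field(default_factory=list)

    def record(self, report: ScanReport) -> None:
        self.reports.append(report)

    def audit(self) -> str:
        return "\n".join(
            f"{r.agent}: CHECK {r.checked} | MISSED {r.missed}" for r in self.reports
        )

# Usage: each sub-agent runs its own SCAN, then hands back a report.
orch = Orchestrator()
orch.record(ScanReport("billing-agent",
                       checked=["session reset", "error codes"],
                       missed=["concurrent requests (single-threaded task)"]))
print(orch.audit())
```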

What this addresses beyond drift

  • Prompt injection defense — safety instructions that keep their attention weight are much harder for attacker tokens to outweigh
  • Tool calling accuracy — API schemas decay like everything else; a marker keeps them alive
  • Multi-agent coordination — CHECK/MISSED creates visibility into what each agent actually verified

My experience

I use this daily with 11 agents, 100K+ context, 7 markers. Cost is under 0.5% of total tokens. Without SCAN, agents reliably lose critical rules by mid-session. With SCAN, they stay stable across the entire session.

I’m not selling anything — the method is open, adapt it however you want. If you try it, I’d love to hear what works and what doesn’t.

Full writeup with detailed technical explanation, multi-agent propagation protocol, and complete prompt templates:

https://gist.github.com/sigalovskinick/c6c88f235dc85be9ae40c4737538e8c6

This is a thoughtful approach to a real problem. A few reactions:

The core insight — that generating tokens is more effective than passively reading them — is well-grounded. Active recall outperforms passive review, and the token-weight framing of why prompts decay is a reasonable mental model, even if the underlying mechanics are more complex than pure attention dilution.

A few things worth considering as you develop this further:

The “generation restores attention” claim needs more rigorous testing. The mechanism you’re describing (output tokens creating semantic links back to instructions) isn’t a well-established property of transformer inference — it’s an interesting hypothesis. It’s plausible that forced generation simply ensures the model has processed relevant instruction sections before acting, which is a more mundane but still valuable explanation.

Marker placement matters a lot. You note markers go at the end of sections so the model must read the section to answer — but in practice, models often answer questions about earlier content without faithfully re-processing it. Testing whether your agents actually maintain instruction fidelity vs. just appearing to (producing plausible-sounding SCAN answers while still drifting behaviorally) would strengthen the methodology.

The 0.5% cost overhead figure is compelling if it holds. That’s the kind of number that makes adoption easy to justify. Would be curious whether that holds at higher marker counts or with models that are more verbose in their SCAN responses.

For the multi-agent case, the CHECK/MISSED propagation to an orchestrator is the most practically useful part of this to me — it creates an auditable trace of what each agent believed it was constrained by, which is valuable beyond just drift mitigation.

Have you tested this against models with explicit instruction-following tuning (like instruction-tuned vs. base models), or across different context window sizes to see if the benefit scales as expected?


You’re raising the right questions.

On the mechanism — honestly, it doesn’t matter what happens inside. I don’t have access to model internals and can’t prove whether it shifts attention weights or just forces re-reading. But if the model re-reads its instructions on trigger and produces better output, that’s exactly what I need; the function is fulfilled either way. Whether it’s attention restoration or forced re-reading is an interesting theoretical question, but from a practitioner’s standpoint the result is the same. Maybe even better if it’s just re-reading — simpler explanation, same effect.

On faking SCAN answers — the specificity of the question prevents this. A vague “what should you be careful about?” gets a generic answer, sure. But “which 3 rules from the list above are easiest to violate on THIS task?” produces exact rules 1:1 from the prompt. You can verify immediately whether the model actually read the section.
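For what it’s worth, a crude automated spot-check is possible if your rules carry numbered identifiers in the prompt; the regex heuristic below is an assumption about formatting, not part of the method:

```python
import re

def cites_real_rules(scan_answer: str, section_text: str) -> bool:
    """Heuristic: does the SCAN answer reference rule numbers that actually
    appear in the prompt section? A generic answer with no rule references,
    or with invented rule numbers, fails the check."""
    rules_in_prompt = set(re.findall(r"rule\s*(\d+)", section_text, re.IGNORECASE))
    rules_in_answer = set(re.findall(r"rule\s*(\d+)", scan_answer, re.IGNORECASE))
    return bool(rules_in_answer) and rules_in_answer <= rules_in_prompt
```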

On cost scaling — 7 markers at ~300 tokens is the sweet spot. More markers just add noise. A well-structured prompt shouldn’t need more than 5-7 sections anyway.

On CHECK/MISSED — agree, that’s where the practical value really shows in multi-agent setups. Even without the attention theory, having each agent explicitly state what it checked and what it skipped is useful on its own.

Tested with Claude and Kimi; long context (100K+) is exactly where this shines — that’s what it was built for.