I’ve just implemented the “dream prompt” in my own platform (Magic Cloud / Hyperlambda), and I figured I’d share the logic with y’all, since it can reduce costs and token consumption significantly.
The basic idea is that I count the messages in my context window after every message is transmitted. Once the count rises above some threshold (which can be configured), I invoke GPT-4.1-mini with the whole context and tell it to summarise the conversation. Then I rip out all messages except the final turn (which is important to keep so the model can continue where it left off) and replace them with my “dream context”, i.e. the summary of the previous conversation.
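To make the flow concrete, here’s a minimal Python sketch of the same idea against an OpenAI-style chat API. The threshold, the summarisation prompt, and injecting the summary as a user message are all my own placeholders for illustration, not the actual Hyperlambda implementation:

```python
from openai import OpenAI

client = OpenAI()

MESSAGE_THRESHOLD = 20          # configurable: compact once we exceed this
SUMMARY_MODEL = "gpt-4.1-mini"  # cheap model used only for summarising

def maybe_compact(messages: list[dict]) -> list[dict]:
    """Replace the context with a summary once it grows past the threshold."""
    if len(messages) <= MESSAGE_THRESHOLD:
        return messages  # context is still small enough, leave it alone

    # Keep the system message and the final turn; summarise everything else.
    system = [m for m in messages if m["role"] == "system"]
    last_turn = messages[-1]
    to_summarise = [m for m in messages if m["role"] != "system"][:-1]

    summary = client.chat.completions.create(
        model=SUMMARY_MODEL,
        messages=[
            {"role": "system",
             "content": "Summarise the following conversation, preserving "
                        "facts, decisions, and open questions."},
            {"role": "user",
             "content": "\n\n".join(f"{m['role']}: {m['content']}"
                                    for m in to_summarise)},
        ],
    ).choices[0].message.content

    # The "dream context" replaces everything that was summarised.
    return system + [
        {"role": "user",
         "content": f"Summary of the conversation so far:\n{summary}"},
        last_turn,
    ]
```

Whether the summary goes in as a user message or a system message is a design choice; either way the model sees the compacted history plus the untouched final turn.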
I’ve been able to reduce my context from 50K+ tokens down to a little over 1,000 using this technique, which I assume matters a lot these days, considering how much people are complaining about token costs.
For those interested, you can check out my dream prompt implementation here.
You can find my system message below: