I’ve just implemented the “dream prompt” in my own platform (Magic Cloud / Hyperlambda), and I figured I’d share the logic with y’all, since it can reduce costs and token consumption significantly.
The basic idea is that I count the messages in my context window after every message is transmitted. Once the count rises above some threshold (which can be configured), I invoke GPT-4.1-mini with the whole context and tell it to summarise the conversation. Then I rip out all messages except the final turn (which is important to keep so the model can continue where it left off) and replace them with my “dream context”, i.e. the summary of the previous conversation.
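To make the flow concrete, here’s a minimal Python sketch of the same idea against an OpenAI-style chat API. The threshold, the summarisation prompt, and injecting the summary as a user message are all my own placeholders for illustration, not the actual Hyperlambda implementation:

```python
from openai import OpenAI

client = OpenAI()

MESSAGE_THRESHOLD = 20          # configurable: compact once we exceed this
SUMMARY_MODEL = "gpt-4.1-mini"  # cheap model used only for summarising

def maybe_compact(messages: list[dict]) -> list[dict]:
    """Replace the context with a summary once it grows past the threshold."""
    if len(messages) <= MESSAGE_THRESHOLD:
        return messages  # context is still small enough, leave it alone

    # Keep the system message and the final turn; summarise everything else.
    system = [m for m in messages if m["role"] == "system"]
    last_turn = messages[-1]
    to_summarise = [m for m in messages if m["role"] != "system"][:-1]

    summary = client.chat.completions.create(
        model=SUMMARY_MODEL,
        messages=[
            {"role": "system",
             "content": "Summarise the following conversation, preserving "
                        "facts, decisions, and open questions."},
            {"role": "user",
             "content": "\n\n".join(f"{m['role']}: {m['content']}"
                                    for m in to_summarise)},
        ],
    ).choices[0].message.content

    # The "dream context" replaces everything that was summarised.
    return system + [
        {"role": "user",
         "content": f"Summary of the conversation so far:\n{summary}"},
        last_turn,
    ]
```

Whether the summary goes in as a user message or a system message is a design choice; either way the model sees the compacted history plus the untouched final turn.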
I’ve been able to reduce my context from 50K+ tokens down to a little over 1,000 using this technique, which I assume matters a lot these days, considering how much people are complaining about token costs.
For those interested, you can check out my dream prompt implementation here.
You can find my system message below: