I’m building a small multi-agent system where one agent acts as a Knowledge Agent: it should read PDFs, markdown files, or web links and then remember what it learned. Another “Main Agent” uses that understanding later for reasoning or onboarding questions.
I’m not looking for RAG or vector DB setups (no embeddings, no retrieval queries), and I also don’t want to fine-tune models each time knowledge changes.
Basically, I want the Knowledge Agent to behave like a human who’s already read the docs, using that info naturally when reasoning rather than by searching.
I also considered just loading everything into the system prompt, or summarizing all the documents into one knowledge.md file and feeding that as context, but this doesn’t seem like a scalable or efficient approach. It might work for a few PDFs (4–10), but not for large or growing knowledge bases.
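For reference, the knowledge.md variant I tried is roughly this shape; this is a minimal sketch where the concatenation stands in for a real summarization step, and all file/function names are hypothetical:

```python
from pathlib import Path

def build_knowledge_file(doc_dir: str, out_path: str = "knowledge.md") -> str:
    """Compile every markdown doc in doc_dir into one context file.

    A real pipeline would summarize each doc first; here they are just
    joined under per-document headings to show the shape of the approach.
    """
    sections = []
    for doc in sorted(Path(doc_dir).glob("*.md")):
        sections.append(f"## {doc.stem}\n\n{doc.read_text(encoding='utf-8')}")
    knowledge = "\n\n".join(sections)
    Path(out_path).write_text(knowledge, encoding="utf-8")
    return knowledge
```

The whole file then gets prepended to the system prompt, which is exactly why it stops scaling: the file grows linearly with the corpus.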
So I’m wondering: has anyone seen a framework or project that supports this kind of proactive, memory-based knowledge behavior?
GPT: generative pre-trained transformer. Tens or hundreds of millions of dollars of computation, self-supervised training on terabytes of corpus.
Post-training: millions more tasks and rounds of training on how to perform as a chat entity, follow instructions, and understand roles.
That delivers you an AI model to use.
The only thing you have not disallowed is the context window of a completion AI model. This is the auto-regressive sequence that the AI continues, predicting the next token that would appear after the text, guided by self-attention.
Therefore the only avenue you’ve left yourself is that input, where “learning” just means adding to the chat in an uninformed manner, whether in “chat history” form or as persisted documents meant to be ingested whole, without search and without trimming the context to only what is relevant.
The largest-input-context AI model from OpenAI is gpt-4.1, at 1M tokens. Eventually you must discard, and you have allowed no mechanism of task relevance to decide what to place in context or what to expire.
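To make “eventually you must discard” concrete, here is a minimal sketch of what expiry looks like when the context window is your only mechanism; the 4-characters-per-token estimate and the budget figure are rough assumptions, and the class is hypothetical:

```python
from collections import deque

def rough_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

class NaiveContext:
    """FIFO context buffer: with no relevance signal, the only possible
    policy is 'drop oldest', regardless of how important that content was."""

    def __init__(self, budget: int = 1_000_000):  # ~gpt-4.1's input limit
        self.budget = budget
        self.chunks: deque[str] = deque()

    def add(self, text: str) -> None:
        self.chunks.append(text)
        # Expire oldest chunks until the buffer fits the budget again.
        while sum(rough_tokens(c) for c in self.chunks) > self.budget:
            self.chunks.popleft()

    def render(self) -> str:
        return "\n\n".join(self.chunks)
```

Whatever falls off the left end is gone from the model’s “knowledge,” which is the cost of refusing any relevance mechanism.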
Maybe I’ll try a smaller, specialized agent whose only job is to use RAG to find relevant info and then summarize it into a concise briefing for the Main Agent. The briefing gets passed as context. This moves it closer to “internalized knowledge” for the task at hand, without the latency of searching on every token. If this doesn’t work, then the only option I’m left with is to start utilizing the context window.
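That two-stage shape (retrieve → brief → reason) could be sketched like this; `score` is a toy keyword-overlap stand-in for real RAG retrieval, `summarize` is a stub where the LLM call would go, and every name here is a hypothetical placeholder:

```python
def score(query: str, doc: str) -> int:
    # Toy relevance: how many query words appear in the doc.
    # A real Knowledge Agent would use embeddings/RAG here.
    words = set(query.lower().split())
    return sum(1 for w in words if w in doc.lower())

def summarize(docs: list[str], query: str) -> str:
    # Stub: a real implementation would call an LLM to write the briefing.
    return f"Briefing for '{query}':\n" + "\n".join(f"- {d[:80]}" for d in docs)

def build_briefing(query: str, corpus: list[str], top_k: int = 2) -> str:
    """Knowledge Agent: rank docs by relevance to the task, condense the top
    few into a briefing, and hand that to the Main Agent as plain context."""
    ranked = sorted(corpus, key=lambda d: score(query, d), reverse=True)
    return summarize(ranked[:top_k], query)
```

The Main Agent then only ever sees the briefing, so its context stays small no matter how large the corpus grows.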