I’m building a small multi-agent system where one agent acts as a Knowledge Agent: it should read PDFs, markdown files, or web links and then remember what it learned. Another “Main Agent” uses that understanding later for reasoning or onboarding questions.
I’m not looking for RAG or vector DB setups (no embeddings, no retrieval queries), and I also don’t want to fine-tune models each time knowledge changes.
Basically, I want the Knowledge Agent to behave like a human who’s already read the docs, using that info naturally when reasoning rather than by searching.
I also considered just loading everything into the system prompt, or summarizing all the documents into one knowledge.md file and feeding that as context, but this doesn’t seem like a scalable or efficient approach. It might work for a few PDFs (4–10), but not for large or growing knowledge bases.
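For reference, the knowledge.md variant I tried is roughly this shape; this is a minimal sketch where the concatenation stands in for a real summarization step, and all file/function names are hypothetical:

```python
from pathlib import Path

def build_knowledge_file(doc_dir: str, out_path: str = "knowledge.md") -> str:
    """Compile every markdown doc in doc_dir into one context file.

    A real pipeline would summarize each doc first; here they are just
    joined under per-document headings to show the shape of the approach.
    """
    sections = []
    for doc in sorted(Path(doc_dir).glob("*.md")):
        sections.append(f"## {doc.stem}\n\n{doc.read_text(encoding='utf-8')}")
    knowledge = "\n\n".join(sections)
    Path(out_path).write_text(knowledge, encoding="utf-8")
    return knowledge
```

The whole file then gets prepended to the system prompt, which is exactly why it stops scaling: the file grows linearly with the corpus.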
So I’m wondering: has anyone seen a framework or project that supports this kind of proactive, memory-based knowledge behavior?
GPT: generative pre-trained transformer. Tens or hundreds of millions of dollars of computation, self-supervised training on terabytes of corpus.
Post-training: millions more tasks and rounds of training on how to perform as a chat entity, follow instructions, and understand roles.
That delivers you an AI model to use.
The only thing you have not disallowed is the context window of a completion AI model. This is the auto-regressive sequence that the AI continues, predicting the next token that would appear after the text, guided by self-attention.
Therefore the only avenue you’ve left yourself is that input, where “learning” just means adding to the chat in an uninformed manner, whether in “chat history” form or as persisted documents meant to be ingested whole, without search and without trimming the context to only what is relevant.
The largest-input-context AI model from OpenAI is gpt-4.1, at 1M tokens. Eventually you must discard, and you have allowed no mechanism of task relevance to decide what to place in context or what to expire.
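To make “eventually you must discard” concrete, here is a minimal sketch of what expiry looks like when the context window is your only mechanism; the 4-characters-per-token estimate and the budget figure are rough assumptions, and the class is hypothetical:

```python
from collections import deque

def rough_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

class NaiveContext:
    """FIFO context buffer: with no relevance signal, the only possible
    policy is 'drop oldest', regardless of how important that content was."""

    def __init__(self, budget: int = 1_000_000):  # ~gpt-4.1's input limit
        self.budget = budget
        self.chunks: deque[str] = deque()

    def add(self, text: str) -> None:
        self.chunks.append(text)
        # Expire oldest chunks until the buffer fits the budget again.
        while sum(rough_tokens(c) for c in self.chunks) > self.budget:
            self.chunks.popleft()

    def render(self) -> str:
        return "\n\n".join(self.chunks)
```

Whatever falls off the left end is gone from the model’s “knowledge,” which is the cost of refusing any relevance mechanism.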
Maybe I’ll try a smaller, specialized agent whose only job is to use RAG to find relevant info and then summarize it into a concise briefing for the Main Agent. The briefing gets passed as context. This moves it closer to “internalized knowledge” for the task at hand, without the latency of searching on every token. If this doesn’t work, then the only option I’m left with is to start utilizing the context window.
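That two-stage shape (retrieve → brief → reason) could be sketched like this; `score` is a toy keyword-overlap stand-in for real RAG retrieval, `summarize` is a stub where the LLM call would go, and every name here is a hypothetical placeholder:

```python
def score(query: str, doc: str) -> int:
    # Toy relevance: how many query words appear in the doc.
    # A real Knowledge Agent would use embeddings/RAG here.
    words = set(query.lower().split())
    return sum(1 for w in words if w in doc.lower())

def summarize(docs: list[str], query: str) -> str:
    # Stub: a real implementation would call an LLM to write the briefing.
    return f"Briefing for '{query}':\n" + "\n".join(f"- {d[:80]}" for d in docs)

def build_briefing(query: str, corpus: list[str], top_k: int = 2) -> str:
    """Knowledge Agent: rank docs by relevance to the task, condense the top
    few into a briefing, and hand that to the Main Agent as plain context."""
    ranked = sorted(corpus, key=lambda d: score(query, d), reverse=True)
    return summarize(ranked[:top_k], query)
```

The Main Agent then only ever sees the briefing, so its context stays small no matter how large the corpus grows.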