I am using RAG to deliver an AI-powered Q&A system. I have an embedded document store, and I run my end-user queries against it using cosine similarity.
I also maintain a log of all questions and answers.
Now, I’m trying to figure out how to take advantage of this knowledge base of real user questions and real user answers. My first inclination was to use them for fine-tuning, but if my goal is to make the model smarter, that is, to increase its capability to answer questions effectively and efficiently, that doesn’t seem like the way to do it. I need to be able to add this information to its knowledge store. But how?
If OpenAI announces this week that the models can “remember” conversations, that would be an immediate resolution.
But, today, since the model only knows what you give it in the current prompt, how can we leverage both the stored documents AND accumulated knowledge to make it smarter at answering questions?
One thought was to simply embed the accumulated knowledge along with the static documents, and let the vector search retrieve the best answers.
Another thought is to create a separate vector store and send the query to the accumulated knowledge base first; if it can’t be answered there, then search it against the static documents.
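Roughly what I have in mind for that second option, as a minimal sketch, assuming both stores are just matrices of L2-normalized embeddings and using a made-up similarity cutoff that would need tuning:

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=3):
    # doc_matrix: one L2-normalized embedding per row, so a dot product is cosine similarity
    scores = doc_matrix @ query_vec
    top = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in top]

def retrieve(query_vec, qa_store, static_store, threshold=0.85, k=3):
    # Try the accumulated Q&A store first; fall back to the static documents
    # if the best Q&A match is below the (arbitrary) threshold.
    qa_hits = cosine_top_k(query_vec, qa_store, k)
    if qa_hits and qa_hits[0][1] >= threshold:
        return "qa", qa_hits
    return "static", cosine_top_k(query_vec, static_store, k)
```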
I don’t know. I’m curious to know if anyone else out there has tried something similar?
One thing you could (and probably should) do is switch to hybrid search.
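To make “hybrid search” concrete, here’s a minimal sketch that blends a lexical score with cosine similarity; a toy keyword-overlap scorer stands in for BM25, and the 0.5 weight is arbitrary:

```python
import numpy as np

def keyword_score(query, doc):
    # Toy lexical scorer standing in for BM25: fraction of query terms present in the doc.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def hybrid_search(query, query_vec, docs, doc_vecs, alpha=0.5, k=5):
    # doc_vecs: L2-normalized embeddings, one row per entry in `docs`.
    dense = doc_vecs @ query_vec                      # cosine similarity
    sparse = np.array([keyword_score(query, d) for d in docs])
    combined = alpha * dense + (1 - alpha) * sparse   # simple weighted blend
    top = np.argsort(combined)[::-1][:k]
    return [(docs[i], float(combined[i])) for i in top]
```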
A next step might be to fine-tune a model for keyword generation.
Then, when a user asks a question you’d send it to your keyword generator and use the generated keywords to filter your data store so you’re only checking similarity against relevant embeddings.
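A rough sketch of that filtering step, with a hypothetical generate_keywords() standing in for your fine-tuned keyword model and tags assumed to be stored alongside each document:

```python
import numpy as np

def generate_keywords(question):
    # Placeholder for a fine-tuned keyword-generation model; here it just
    # treats the question's longer words as keywords.
    return {w.lower() for w in question.split() if len(w) > 4}

def filtered_search(question, query_vec, docs, doc_vecs, doc_tags, k=5):
    # doc_tags[i] is a set of keywords/metadata attached to docs[i].
    keywords = generate_keywords(question)
    candidates = [i for i, tags in enumerate(doc_tags) if tags & keywords]
    if not candidates:                      # no tag overlap: fall back to the full store
        candidates = list(range(len(docs)))
    sub = doc_vecs[candidates] @ query_vec  # cosine similarity on the filtered subset only
    order = np.argsort(sub)[::-1][:k]
    return [(docs[candidates[i]], float(sub[i])) for i in order]
```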
Beyond that, you might make the leap to HyDE.
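In case HyDE is new to anyone: the idea is to have the model write a hypothetical answer first and retrieve with *its* embedding, since an answer-shaped passage usually lands closer to the real documents than the raw question does. A sketch, with generate_hypothetical_answer() and embed() as stand-ins for your own LLM and embedding calls:

```python
import numpy as np

def generate_hypothetical_answer(question):
    # Stand-in for an LLM call along the lines of:
    # "Write a short passage that answers: {question}"
    return f"A plausible passage answering: {question}"

def embed(text):
    # Stand-in for your embedding model; returns an L2-normalized vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def hyde_search(question, docs, doc_vecs, k=5):
    # Retrieve with the embedding of the hypothetical answer, not the question.
    hypo_vec = embed(generate_hypothetical_answer(question))
    scores = doc_vecs @ hypo_vec
    top = np.argsort(scores)[::-1][:k]
    return [(docs[i], float(scores[i])) for i in top]
```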
Beyond, beyond that, you might take any exchanges which required more than one query for the user to get their answer and fine-tune a model that takes a user query and “upscales” it to a better form of their query. Basically, “If this is their initial question, and this is the answer, what should they have asked?”
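One way you might turn those multi-query exchanges from your logs into training examples for that “upscaler” (a sketch; the chat-style JSONL shown follows OpenAI’s fine-tuning format, and the exchange fields are just assumed names for what you’d pull from your logs):

```python
import json

def build_upscaler_examples(exchanges, path="upscaler_train.jsonl"):
    # Each exchange is assumed to look like:
    # {"first_question": ..., "final_question": ...}
    # where final_question is the query that actually got the user their answer.
    with open(path, "w") as f:
        for ex in exchanges:
            record = {
                "messages": [
                    {"role": "system",
                     "content": "Rewrite the user's question into the form most likely to retrieve the right answer."},
                    {"role": "user", "content": ex["first_question"]},
                    {"role": "assistant", "content": ex["final_question"]},
                ]
            }
            f.write(json.dumps(record) + "\n")
```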
If you store user questions and user answers, isn’t that just new static documents that you can add to your static document vector store to help with the augmentation?
Why would you keep a separate document store for general information?
If you have, like, billions of documents, vector index optimization might take a while, so you might want to make this a daily or weekly job (and then, if you need lower latency, put a smaller look-aside store on the side), but that doesn’t sound like your situation.
We do something much like that (with a human in the loop) and it works pretty well.
Additionally, we do use a separate document store per-customer, for customer-private documents, but that’s because we’re a multi-tenant service where customers expect their documents to stay theirs, and not leak to others. We score a query against both stores, and interleave matches based on respective cosine similarity. Because we use an exact index, this works well, and gives the same answers as a single unified store would.
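To make the interleaving concrete, it’s roughly this (a sketch, assuming each store exposes a search() returning (doc, cosine_similarity) pairs from an exact index):

```python
def interleave(query_vec, shared_store, tenant_store, k=5):
    # Score the query against both stores and merge by cosine similarity.
    # With exact indexes, the merged ranking matches what a single unified
    # store would have returned.
    hits = shared_store.search(query_vec, k) + tenant_store.search(query_vec, k)
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:k]
```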
There’s this method that can store information in hypertext and then using a transfer protocol, retrieve the documents of interest on-demand for an intelligence that wished to obtain the hierarchical knowledge within…
(In actuality, Gopher with Jugtail search is better suited to AI exploration of local documents, but the AI probably knows less than you about navigating it)