Hey, sorry, I’m too marinated in my own context, so I missed that my cryptic responses are not always clear.
If chunking is done right and the elements you import into your RAG contain the chunk IDs (needed to operate on them effectively during retrieval), your query to the vector DB will return objects with IDs (and other info you will use to build your prompts). Here are some examples of my approach: Examples | SIMANTIKS. If you look closer at the storable objects, you’ll notice that beside the document UUID there is a path which acts as the address inside the document. The unique chunk ID can be built from the doc UUID and the chunk address/path. Sure, your app may need a different approach, but after a couple of years in the field I settled on what’s there because it’s the most flexible minimal composition of context objects I’ve found so far.
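To make it concrete, here is a minimal sketch of such a context object and its chunk ID (the class and field names are just my illustration, not a fixed schema):

```python
from dataclasses import dataclass

@dataclass
class ContextChunk:
    doc_uuid: str   # UUID of the source document
    path: str       # address of the chunk inside the document, e.g. "2/3/1"
    text: str       # the chunk content itself

    @property
    def chunk_id(self) -> str:
        # The unique chunk ID is simply the document UUID plus the in-document path.
        return f"{self.doc_uuid}:{self.path}"
```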
So once you run the query against the vector DB, you get a list of those context objects sorted by relevance. Instead of using all objects from the list, I run them through a separate model to select only the items I really need (preselection).
The prompt looks similar to:
Having the question and instructions from the user, evaluate whether the given excerpt from the document contains the exact answer to the user's question, related information, or other context somehow necessary to answer the question. Answer either 1 for yes or 0 for no.
Question: %question%
Instructions: %instructions%
Found excerpt:
%excerpt%
Your answer (single digit only):
The answer is a single character, so you can easily map it in your code and verify the log probs for certainty.
Run it in parallel on all found items and accept only those that were selected.
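A rough sketch of this preselection pass, continuing the ContextChunk sketch above and assuming an OpenAI-style chat completions client (any endpoint that returns logprobs will do; the model name and helper names are just placeholders):

```python
import math
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()

PRESELECT_PROMPT = """Having the question and instructions from the user, evaluate whether the given excerpt from the document contains the exact answer to the user's question, related information, or other context somehow necessary to answer the question. Answer either 1 for yes or 0 for no.
Question: {question}
Instructions: {instructions}
Found excerpt:
{excerpt}
Your answer (single digit only):"""

def preselect(question: str, instructions: str, chunk: ContextChunk) -> tuple[bool, float]:
    """Ask a small auxiliary model for a 1/0 verdict on one excerpt, plus its certainty."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any cheap model that returns logprobs will do
        messages=[{"role": "user", "content": PRESELECT_PROMPT.format(
            question=question, instructions=instructions, excerpt=chunk.text)}],
        max_tokens=1,
        logprobs=True,
    )
    verdict = resp.choices[0].message.content.strip()
    # exp(logprob) turns the single answer token's log probability into a 0..1 certainty.
    certainty = math.exp(resp.choices[0].logprobs.content[0].logprob)
    return verdict == "1", certainty

def preselect_all(question: str, instructions: str, chunks: list[ContextChunk]) -> list[ContextChunk]:
    # Run the per-chunk checks in parallel and keep only the accepted items.
    with ThreadPoolExecutor(max_workers=8) as pool:
        verdicts = list(pool.map(lambda c: preselect(question, instructions, c), chunks))
    return [c for c, (keep, _) in zip(chunks, verdicts) if keep]
```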
Build the prompt for the answering model.
Get your answer.
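Continuing the sketch, the answering step could look roughly like this (the answering prompt wording here is only an illustration, adapt it to your app; tagging each excerpt with its chunk ID also helps the attribution step later):

```python
def answer(question: str, instructions: str, selected: list[ContextChunk]) -> str:
    # Example answering prompt; each excerpt is labelled with its chunk ID.
    context = "\n\n".join(f"[{c.chunk_id}]\n{c.text}" for c in selected)
    prompt = (
        "Using only the context excerpts below, answer the user's question.\n"
        f"Instructions: {instructions}\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption: your primary answering model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```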
Then validate the answer using a similar approach, but this time include all the selected items at once. The prompt should be similar to this:
As an expert in the subject, please confirm the correctness of the answer below, which was based on the provided context. Answer with either 1 for yes or 0 for no.
User query: %question%
Context:
%context%
Answer: %answer%
Do you confirm the correctness of the answer? (reply with a single digit only):
Again, easy to parse and check log probs.
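A sketch of this validation step in the same style (again, the model name is a placeholder):

```python
def validate(question: str, context: str, answer_text: str) -> tuple[bool, float]:
    """One 1/0 confirmation over the whole selected context, again with a logprob-based certainty."""
    prompt = (
        "As an expert in the subject, please confirm the correctness of the answer below, "
        "which was based on the provided context. Answer with either 1 for yes or 0 for no.\n"
        f"User query: {question}\n"
        f"Context:\n{context}\n"
        f"Answer: {answer_text}\n"
        "Do you confirm the correctness of the answer? (reply with a single digit only):"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: a small auxiliary model
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        logprobs=True,
    )
    verdict = resp.choices[0].message.content.strip()
    return verdict == "1", math.exp(resp.choices[0].logprobs.content[0].logprob)
```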
If all is good, you continue your app logic:
Take the preselected items, the query, and the answer from the primary answering model, and run them through a different model with a prompt similar to:
Having the context items and the answer to the user's question, please select the IDs of the context items that contain the answer to the user query.
User query: %question%
Context items:
%item1%
%item2%
...etc.
Answer: %answer%
ID(s) (comma-separated list of IDs if multiple items formed the answer):
Item format is:
%field1%: %value1%
…etc.
ID: %id%.
This will give you one or more items that justify the context the model used to form the answer (you can use those IDs in your code/display logic).
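A sketch of this attribution step, reusing the same pieces (field names and model name are placeholders):

```python
def attribute(question: str, answer_text: str, selected: list[ContextChunk]) -> list[str]:
    """Ask a separate model which context item IDs the answer was actually formed from."""
    items = "\n\n".join(f"excerpt: {c.text}\nID: {c.chunk_id}" for c in selected)
    prompt = (
        "Having the context items and the answer to the user's question, please select "
        "the IDs of the context items that contain the answer to the user query.\n"
        f"User query: {question}\n"
        f"Context items:\n{items}\n"
        f"Answer: {answer_text}\n"
        "ID(s) (comma-separated list of IDs if multiple items formed the answer):"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: a small auxiliary model
        messages=[{"role": "user", "content": prompt}],
    )
    return [i.strip() for i in resp.choices[0].message.content.split(",") if i.strip()]
```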
As you can see, the auxiliary models have simple tasks that are agnostic to the data they operate on, so they are ready and easy to fine-tune for better performance without retraining on a specific domain (unless the domain is very specific and general knowledge about it is lacking).
And as a bonus, you get log probs on single-token answers for certainty estimation.
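For completeness, assuming `question`, `instructions`, and the raw `retrieved_chunks` from the vector DB query are already in hand, the sketches above compose roughly like this (the 0.9 threshold is arbitrary, tune it for your app):

```python
accepted = preselect_all(question, instructions, retrieved_chunks)
answer_text = answer(question, instructions, accepted)
ok, certainty = validate(question, "\n\n".join(c.text for c in accepted), answer_text)
if ok and certainty >= 0.9:
    # IDs of the chunks that actually formed the answer, for your code/display logic.
    source_ids = attribute(question, answer_text, accepted)
```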