I was recently learning about advanced RAG techniques:
Pre-retrieval
Retrieval
Post-retrieval
I started learning about vector search algorithms and indexing.
I couldn't understand what retrievers (for example, ColBERT) are, or how they are different from a vector search algorithm or a simple vector search.
How are techniques like HyDE, hierarchical document indexing, and other retrieval techniques different from the underlying vector search or these dense retrievers?
I can't understand what parameters can be tuned.
I know that:
the embedding model can be tuned
the VSA (Vector Search Algorithm) can be tuned
chunk size can be tuned
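To make the question concrete, here is roughly how I picture those knobs (the names are just illustrative, not from any specific library):

```python
from dataclasses import dataclass

# Illustrative knobs only - hypothetical names, not tied to a specific framework.
@dataclass
class RAGConfig:
    # Pre-retrieval / indexing
    embedding_model: str = "some-embedding-model"  # which encoder produces the vectors
    chunk_size: int = 512                          # tokens per chunk before embedding
    chunk_overlap: int = 64                        # overlap between adjacent chunks

    # Vector search algorithm (ANN index), e.g. HNSW-style parameters
    index_type: str = "hnsw"
    ef_search: int = 128                           # search breadth at query time
    m: int = 16                                    # graph connectivity at build time

    # Retrieval / post-retrieval
    top_k: int = 5                                 # how many chunks go to the LLM
    rerank: bool = False                           # optional cross-encoder reranking step
```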
Can somebody help out?
Is there a blog or site you would recommend to help clear my confusion?
By the way, this is also a great way to create an autocoder.
You create multiple graph representations of your code base, e.g. extracting methods and classes and representing their connections, e.g. that they share the same context because they fulfill the same or a similar purpose, etc…
Then you present the autocoder with a task and it can find functionality similar to what you are trying to achieve. You can then map that functionality by traversing the graph and extracting all the code parts used for the task, give that to a model and ask it to write an interface for it, then create a factory from boilerplate code and let the AI adjust it, then create a service that implements the generated interface… and so on, running static code analysis on the result…
It is a cool way to keep the context small while still giving the model everything it needs…
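A minimal sketch of the idea (not my actual setup - the connections here are plain call edges rather than the richer "same purpose" edges described above):

```python
import ast
from collections import defaultdict

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the names of the functions it calls."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            for call in ast.walk(node):
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                    graph[node.name].add(call.func.id)
    return graph

def related_code(graph: dict[str, set[str]], seed: str) -> set[str]:
    """Walk the graph from a seed function and collect everything reachable."""
    seen, stack = set(), [seed]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(graph.get(name, ()))
    return seen

source = """
def load_user(uid): return db_get(uid)
def db_get(key): ...
def render_profile(uid): return template(load_user(uid))
def template(data): ...
"""
graph = build_call_graph(source)
print(related_code(graph, "render_profile"))
# -> {'render_profile', 'load_user', 'db_get', 'template'} (set order may vary)
```

From that reachable set you can pull the actual source of each function and hand only that slice of the code base to the model.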
This should also work for long-term memory if you also add scores, and it could potentially fix the stupid loops ChatGPT always gets into, like:
User: I want to do this
Assistant: Try A
User: does not work - got this error message
Assistant: Try B
User: also does not work
Assistant: Try A
User:
by scoring the generated results worse and worse over time until they automatically land in the prompt as something like “I have already tried this: A, B, C - give me another solution”…
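Something like this is what I mean by scoring (the names, scores and thresholds are made up):

```python
class SuggestionMemory:
    """Track how well past suggestions worked and surface the failed ones in the prompt."""

    def __init__(self, failure_penalty: float = 1.0, tried_threshold: float = 0.0):
        self.scores: dict[str, float] = {}
        self.failure_penalty = failure_penalty
        self.tried_threshold = tried_threshold

    def record_suggestion(self, suggestion: str) -> None:
        self.scores.setdefault(suggestion, 1.0)  # start every suggestion at full score

    def record_failure(self, suggestion: str) -> None:
        # the user reports it did not work -> the score drops
        self.scores[suggestion] = self.scores.get(suggestion, 1.0) - self.failure_penalty

    def prompt_prefix(self) -> str:
        tried = [s for s, score in self.scores.items() if score <= self.tried_threshold]
        if not tried:
            return ""
        return "I have already tried this: " + ", ".join(tried) + " - give me another solution."

memory = SuggestionMemory()
memory.record_suggestion("A"); memory.record_failure("A")
memory.record_suggestion("B"); memory.record_failure("B")
print(memory.prompt_prefix())  # I have already tried this: A, B - give me another solution.
```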
Let me take a shot at HyDE, since I have used it myself.
When you are doing retrieval, you typically take the user query (question, bunch of keywords, whatever), embed it (create a dense vector representation of it), and then use the embedding/vector to find another embedding/vector that is geometrically closest to it (those other vectors are pre-embedded, and come from various text chunks in your knowledgebase, database, or documents).
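In code, that plain flow is roughly this (a sketch - `embed` stands in for whatever embedding model you use, and `chunk_vecs` are the pre-embedded, unit-normalized chunk vectors):

```python
import numpy as np

def retrieve(query: str, chunks: list[str], chunk_vecs: np.ndarray, embed, top_k: int = 5) -> list[str]:
    qvec = embed(query)                      # embed the raw user query
    scores = chunk_vecs @ qvec               # cosine similarity, since everything is unit-normalized
    top = np.argsort(scores)[::-1][:top_k]   # indices of the geometrically closest chunks
    return [chunks[i] for i in top]
```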
The issue, however, is that most of the time your query lacks context - something we call an “asymmetric” problem, in the sense that a query might be a simple question like “what is hyde”, whereas the text chunks in your knowledgebase are much larger and more contextualized.
So HyDE first generates a hypothetical answer, e.g. “hyde is a retrieval method used to improve the recall”. This now has a lot more context, because it includes things like “retrieval method”, “recall”, and “improve”. So when you embed that hypothetical answer and use it as your query, you should, on average, get better results.
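The only thing HyDE changes in the sketch above is where the query vector comes from (`generate` here is an assumed stand-in for your LLM call):

```python
import numpy as np

def hyde_retrieve(query: str, chunks: list[str], chunk_vecs: np.ndarray,
                  embed, generate, top_k: int = 5) -> list[str]:
    # 1. Ask the LLM for a hypothetical passage that answers the query.
    hypothetical = generate(f"Write a short passage answering: {query}")
    # 2. Embed the hypothetical answer rather than the raw query.
    qvec = embed(hypothetical)
    # 3. Same nearest-neighbour search against the pre-embedded chunks as before.
    scores = chunk_vecs @ qvec
    top = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in top]
```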
Self-reflection, along with building a mental model of the user and of the context in which they want the AI's output, is what you can have the AI produce to actually improve quality.
(gpt-4o performs worse at prompted reasoning production, with the reasoning length affecting the response length it wants to produce, but the last paragraph shown answers your question.)
We can bring this back on topic with the case where the AI must reason: determine whether the question has been directly answered within the RAG results placed back into context, what it might do if it were to call tools offering more knowledge, or whether a preliminary answer it can produce reflects any actual pretrained knowledge on the topic.
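One illustrative way to prompt that reflection step (the wording is invented for the example, not taken from any particular source):

```python
# Hypothetical prompt template for the three-way check described above.
REFLECTION_PROMPT = """You are answering with the retrieved context below.

Context:
{context}

Question: {question}

Before answering, reason briefly:
1. Is the question directly answered by the context above?
2. If not, would calling a search tool likely supply the missing knowledge?
3. If not, does your own pretrained knowledge cover this topic reliably?
Then give your final answer and state which of the three cases applied."""

def build_reflection_prompt(context: str, question: str) -> str:
    return REFLECTION_PROMPT.format(context=context, question=question)
```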