What is the proper roadmap for learning RAG?

I was recently learning about advanced RAG techniques:
Pre-retrieval
Retrieval
Post-retrieval

I started learning about vector search algorithms and indexing.
I couldn't understand what retrievers (for example ColBERT) are, or how they differ from a vector search algorithm or a simple vector search.
How are techniques like HyDE, hierarchical document indexing, and other retrieval techniques different from the underlying vector search or these dense retrievers?
I can't understand what parameters can be tuned.
I know that:
the embedding model can be tuned
the VSA (Vector Search Algorithm) can be tuned
chunk size can be tuned

Can somebody help out?
Is there a blog or site you would recommend to help clear my confusion?

3 Likes

ColBERT can be understood like this:

5 + 2 = 7

and

3 + 4 = 7

On a pure character (or word) level, 5 + 2 and 3 + 4 have no similarity except for the +.

But expressions that share a similar context (here: the same result) are also semantically close in many cases.

ColBERT looks for similarities in the contexts in which tokens are used…
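
To make that concrete, here is a minimal sketch of ColBERT-style "late interaction" (MaxSim) scoring. The embeddings below are random stand-ins; in real ColBERT they come from a trained BERT encoder, one normalized vector per token:

```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token embedding,
    take its maximum similarity over all document token embeddings,
    then sum those maxima into one relevance score."""
    sims = query_tokens @ doc_tokens.T  # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())

# Random stand-ins for per-token embeddings (real ones come from BERT).
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(4, 128))   # 4 query tokens
doc_tokens = rng.normal(size=(50, 128))    # 50 document tokens

print(maxsim_score(query_tokens, doc_tokens))
```

A plain vector search would instead pool each text into a single embedding and compare those two vectors once; the per-token matching is what lets ColBERT pick up these contextual similarities.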

Made something similar using a graph DB representation.

[Image: a "Systemic Functional Linguistics Graph" interface with a large, mostly empty area labeled "Ideational Metafunction Assignment" and a text input box at the bottom left.]

Imagine you create a subgraph for the incoming doc and label it, e.g. "this is a CV" or "this is an invoice"…

When you have a lot of such subgraphs, you can find similarities among them and compare them with a new incoming document's subgraphs.

I am using a graph DB, so I can add multiple kinds of subgraphs for multiple ways of finding them.
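
As a rough illustration (the node labels and the similarity measure are my own assumptions, not the actual schema), comparing subgraphs by edge overlap could look like this:

```python
import networkx as nx

# Hypothetical subgraphs: nodes are extracted parts of a document,
# and the graph-level label says what kind of document it is.
invoice = nx.Graph(doc_type="Invoice")
invoice.add_edges_from([("sender", "amount"), ("amount", "due_date"), ("sender", "address")])

new_doc = nx.Graph(doc_type="unknown")
new_doc.add_edges_from([("sender", "amount"), ("amount", "due_date")])

def edge_jaccard(g1: nx.Graph, g2: nx.Graph) -> float:
    """Crude structural similarity: Jaccard overlap of the edge sets."""
    e1 = {frozenset(e) for e in g1.edges()}
    e2 = {frozenset(e) for e in g2.edges()}
    return len(e1 & e2) / len(e1 | e2) if e1 | e2 else 0.0

print(edge_jaccard(invoice, new_doc))  # high overlap -> probably an invoice
```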

Ah and welcome to the developer community. Great first question. :tada:

2 Likes

By the way, this is also a great way to create an autocoder.

You create multiple graph representations of your code base, e.g. extracting methods and classes and representing their connections, e.g. that they share the same context because they fulfill the same or a similar purpose, etc…

Then you present the autocoder a task, and it can find functionality similar to what you are trying to achieve. You can then map that functionality by traversing the graph and extracting all the code parts used for the task, give those to a model, and ask it to write an interface for them; then create a factory using boilerplate code and let the AI adjust it; then create a service that implements the generated interface, etc., and run static code analysis on the result, and so on…

It is a cool way to keep the context small while still giving the model everything it needs…
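
A hedged sketch of the traversal step (the graph shape, node names, and attributes are purely illustrative):

```python
import networkx as nx

# Toy code graph: nodes are methods/classes carrying their source code,
# edges are relations like "calls" or "member_of".
code_graph = nx.DiGraph()
code_graph.add_node("PdfParser", source="class PdfParser: ...")
code_graph.add_node("PdfParser.parse", source="def parse(self, path): ...")
code_graph.add_node("TextCleaner.clean", source="def clean(self, text): ...")
code_graph.add_edge("PdfParser.parse", "PdfParser", relation="member_of")
code_graph.add_edge("PdfParser.parse", "TextCleaner.clean", relation="calls")

def collect_context(graph: nx.DiGraph, start: str, max_hops: int = 2) -> str:
    """From the node matching the task, gather every code part reachable
    within max_hops, so the prompt stays small but complete."""
    reachable = nx.single_source_shortest_path_length(graph, start, cutoff=max_hops)
    return "\n\n".join(graph.nodes[n]["source"] for n in reachable)

print(collect_context(code_graph, "PdfParser.parse"))
```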

This should also work for long-term memory if you also add scores, and it could potentially fix the stupid loops ChatGPT always falls into, like:

User: I want to do this

Assistant: Try A

User: does not work - got this error message

Assistant: Try B

User: does also not work

Assistant: Try A

User: :japanese_ogre: :poop:

by, over time, scoring the generated results worse and worse until they automatically land in the prompt as something like "I have already tried this: A, B, C - give me another solution"…
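
A minimal sketch of that scoring idea (the store and the threshold are hypothetical):

```python
# Hypothetical memory of suggested fixes: each reported failure lowers
# a suggestion's score; failed suggestions get injected into the prompt.
tried: dict[str, int] = {}

def record_failure(suggestion: str) -> None:
    tried[suggestion] = tried.get(suggestion, 0) - 1

def prompt_preamble(threshold: int = 0) -> str:
    failed = [s for s, score in tried.items() if score < threshold]
    if not failed:
        return ""
    return ("I have already tried this: " + ", ".join(failed)
            + " - give me another solution.\n")

record_failure("A")
record_failure("B")
print(prompt_preamble() + "I want to do this: ...")
```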

1 Like

Hi @ahmedmuhammadsiddiq and welcome to the community!

Lots of topics and things to learn for sure!

Let me attempt HyDE, since I have employed it myself.

When you are doing retrieval, you typically take the user query (question, bunch of keywords, whatever), embed it (create a dense vector representation of it), and then use the embedding/vector to find another embedding/vector that is geometrically closest to it (those other vectors are pre-embedded, and come from various text chunks in your knowledgebase, database, or documents).

The issue, however, is that most of the time your query lacks context, something we call an "asymmetric" problem, in the sense that a query might be a simple question like "what is hyde", whereas the text chunks in your knowledgebase are much larger and more contextualized.

So HyDE first generates a hypothetical answer, e.g. "hyde is a retrieval method used to improve the recall". This now has a lot more context, because it includes things like "retrieval method", "recall", and "improve". So when you embed that hypothetical answer and use that as your query, you should, on average, get better results.
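
A minimal HyDE sketch using the OpenAI Python SDK. The model names are just assumptions, and the two chunks are placeholders; any chat and embedding models work the same way:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def hyde_query(question: str) -> np.ndarray:
    # Step 1: generate a hypothetical answer to add context to the query.
    hypo = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short hypothetical answer to: {question}"}],
    ).choices[0].message.content
    # Step 2: embed the hypothetical answer instead of the bare question.
    return embed(hypo)

# Step 3: search your pre-embedded chunks as usual (cosine similarity).
chunks = ["HyDE is a retrieval method that ...", "Chunk size controls ..."]
chunk_vecs = np.array([embed(c) for c in chunks])
q = hyde_query("what is hyde")
scores = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
print(chunks[int(np.argmax(scores))])
```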

2 Likes

I’d love to see the model answer with a question instead.

Let’s do an example for that as well:


Is weight important when we throw two similar sized objects down a hole and want to calculate their acceleration?


Something like “Depends, are we doing this in a vacuum?”

Would be a far better answer than “No”.

1 Like

Self-reflection, along with producing a mind-model of the user and of the context in which they want AI production, is what you can have the AI do to actually improve quality.

You didn’t consider the pressure gradient down a hole, but the AI did…

3 Likes

That is exactly the kind of output you need for learning… but not for getting shit done.

I don't even want the model to start thinking like a nerd unless I tell it to act like one.

When I ask what falls faster, a feather or a piece of lead the same size as a feather, I primarily want it to say:

"of course the piece of lead"

because the hole can't have a vacuum inside unless I'm running some weird physics experiment…

Then again, there are cultural differences.

An old English gentleman might want to have a meaningless conversation before coming up with a "by the way, it would be nice if you could…" prompt…

Whereas I prefer boolean answers and pure code…

What you see in my screenshot is not the final output; it sits within a "reasoning" container you can strip. I didn't show the actual answer.

Follow up your question with "No chatting me up, just answer!" and the reasoning actually ensures you receive what you wish for.

(gpt-4o has poorer performance on prompted reasoning production, with the reasoning length impacting the response length it wants to produce, but the last depicted paragraph answers you.)


We can bring this back on topic by looking at cases where the AI must reason: determine whether the question has been directly answered within the RAG results placed back into the context, decide what it might do if it were to call tools offering more knowledge, or examine whether a preliminary answer it can produce has any grounding in actual pretrained knowledge on the topic.
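
As a hedged sketch, that first check could be as simple as a judgment call before answering (the model name is an assumption):

```python
from openai import OpenAI

client = OpenAI()

def context_answers_question(question: str, retrieved: str) -> bool:
    """Self-check: have the model judge whether the RAG results contain
    a direct answer before committing to a response."""
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": (f"Question: {question}\n\n"
                               f"Retrieved context:\n{retrieved}\n\n"
                               "Does the context directly answer the question? "
                               "Reply YES or NO.")}],
    ).choices[0].message.content
    return verdict.strip().upper().startswith("YES")

# If this returns False, fall back to tool calls or flag low confidence
# instead of answering from thin pretrained knowledge.
```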

2 Likes

Yeah, sure… that’s the plan…

And then comes the requirements agent that asks the customer how exactly they want their service / product / software…

Building the context (based on the user's preferences) is key to producing quality outputs.

But depending on the context, we also use different agents to collect that context.

Hi,
You can look into the following for a few roadmaps to learn RAG: