There are many options for relating the question to your data. The main intuition is that you want to maximize the overlap between the semantic meaning of the question and the semantic meaning sitting in your database.
So a popular and straightforward thing to try first is to embed the question and correlate that vector with the vectors in your database. You feed the best matches to the LLM as the “Context” and ask the LLM to answer the question using only this “Context”, or else respond “I don’t know”.
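A minimal sketch of that first step, with toy two-dimensional vectors standing in for real embeddings (in practice you'd call your embedding model to produce the vectors):

```python
import math

def cosine(a, b):
    # cosine similarity = dot product over the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, db, k=2):
    # db: list of (chunk_text, vector) pairs; return the k closest chunks
    ranked = sorted(db, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# toy vectors stand in for real embeddings
db = [("chunk about cats", [1.0, 0.1]),
      ("chunk about dogs", [0.9, 0.3]),
      ("chunk about tax law", [0.0, 1.0])]
context = top_k([1.0, 0.0], db, k=2)
```

The chunks in `context` are what you paste into the prompt as the “Context”.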
But sometimes the question doesn’t line up with your data, and this can happen for various reasons.
There are a few more tricks to get the two to line up.
One is to use HyDE, which basically asks the LLM to answer the question directly. That answer is just made up, so you don’t want to send it back to the user. What you do instead is take this hypothetical answer, embed it, and correlate it with your vectors. In theory this transforms the question into something more closely matched to what’s in your database, but it only works for “open domain” knowledge the LLM was likely trained on, so not super secret stuff the LLM has never seen.
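The HyDE flow is short enough to sketch. Here `llm()` and `embed()` are hypothetical stand-ins for your actual model calls; swap in your own client:

```python
def hyde_query_vector(question, llm, embed):
    # Ask the model for a made-up answer -- this is never shown to the user.
    hypothetical_answer = llm("Answer this question in one paragraph: " + question)
    # Embed the hypothetical answer instead of the raw question,
    # then correlate this vector with your database as usual.
    return embed(hypothetical_answer)

# toy stand-ins so the sketch runs end to end
fake_llm = lambda prompt: "made-up answer text"
fake_embed = lambda text: [float(len(text))]
vec = hyde_query_vector("What is X?", fake_llm, fake_embed)
```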
Another approach that I like is combining keywords and embeddings. This is more involved since you need to create your own “rarity index” of the words in your data set. You then split the incoming text into its component words with their frequencies, and overlap those with the words and frequencies in your data. This is similar to the BM25 algorithm, but more scalable IMO.
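One simple way to build such a rarity index is an inverse-document-frequency weight (this is my reading of the idea, not a full BM25 implementation):

```python
import math
from collections import Counter

def build_rarity_index(docs):
    # rarity ~ inverse document frequency: words appearing in few docs score high
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return {word: math.log(n / count) for word, count in df.items()}

def keyword_score(query, doc, rarity):
    # overlap the query's words with the doc's, weighted by rarity and frequency
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum(rarity.get(w, 0.0) * q[w] * d[w] for w in q if w in d)

docs = ["the cat sat", "the dog ran", "the cat and the dog"]
rarity = build_rarity_index(docs)
scores = [keyword_score("the cat sat", doc, rarity) for doc in docs]
```

Note that a word like “the” appears in every doc, so its rarity is log(1) = 0 and it contributes nothing to the score, which is exactly the behavior you want.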
Anyway, once you have the keyword stream ranking and the embedding stream ranking, you fuse the two into a single ranking by summing reciprocals of the ranks. This is called Reciprocal Rank Fusion. Now you pull the text chunks matching the highest fused ranking between keywords and embeddings.
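Reciprocal Rank Fusion fits in a few lines. Each item's fused score is the sum of 1/(k + rank) across the rankings it appears in, with k = 60 being the constant used in the original RRF paper:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: ranked lists of chunk ids, best first; k dampens the top ranks
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["a", "b", "c"]
embedding_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([keyword_ranking, embedding_ranking])
```

Here “b” wins because it places high in both streams, even though it tops only one of them.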
All this can be done locally or in the cloud without fancy services, but you have to be an algorithm person if you want to do it yourself.
But going back to your original question. Think about the incoming queries and what transformations (if any) are needed to best match your data. Do you need HyDE? Do you maybe need to translate your data to be more similar to the expected questions? Do you need to add keyword functionality as well?
Then you correlate (in memory, for speed) and get your top-K embedding matches (plus top-K keyword matches if you need them), and feed those into the next stage: a prompt asking the AI to carefully answer the question from that context, or refuse to answer if it thinks the question is out of scope.
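That last stage is just prompt assembly. A hypothetical template (tune the wording for your own model):

```python
def build_prompt(question, chunks):
    # Join the retrieved top-K chunks and instruct the model to stay grounded.
    context = "\n\n".join(chunks)
    return (
        "Answer the question using ONLY the context below. "
        'If the answer is not in the context, reply "I don\'t know."\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt("What color is the sky?", ["chunk one", "chunk two"])
```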