I'm working with question-answer pairs, and I need to generate recommendations for users based on the Q&As. I don't want to send all the pairs to the LLM, since there are so many questions that it would drive up input-token cost. Can I create embeddings for each question-answer pair and then store them in a vector DB? Then I can just ask in the prompt to give recommendations based on the embeddings.
Is this possible/feasible? Any other solutions?
The easy part here will be the vector database: you have clear units of questions and answers that carry semantic value, and they likely mirror the style and size of other questions. A search based on embeddings similarity has thus almost solved itself for you, barring the implementation.
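A minimal sketch of that first step, indexing each Q&A pair as one unit. The bag-of-words `embed()` helper, the vocabulary, and the sample pairs are all hypothetical stand-ins; in a real system you would call an embedding model or API and persist the same (question, answer, vector) rows in a vector database:

```python
import math
import re

# Hypothetical stand-in embedding: a tiny bag-of-words vector over a
# fixed vocabulary. A real system would call an embedding model instead.
VOCAB = ["refund", "shipping", "password", "reset", "order", "track"]

def embed(text: str) -> list[float]:
    words = re.findall(r"[a-z]+", text.lower())
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec))
    # Normalize so dot products double as cosine similarities.
    return [x / norm for x in vec] if norm else vec

# Embed each Q&A pair as a single unit so a match surfaces the whole pair.
qa_pairs = [
    ("How do I reset my password?", "Use the 'forgot password' link."),
    ("Where is my order?", "Track it from your account page."),
]
index = [(q, a, embed(q + " " + a)) for q, a in qa_pairs]
```

The design choice worth noting is embedding question and answer together: a user query can then match either the phrasing of the question or the content of the answer.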
Then the question is how you will employ this ability to retrieve one, or twenty, questions similar to something, and what that "something" is that they will be matched against.
Embeddings are not a "prompt"; they are the ability to score the similarity between two texts or other types of objects. An exhaustive search, obtaining cosine similarity scores of an input against the database, will let you discover the top-ranked entries.
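That exhaustive scoring can be sketched in a few lines. The `cosine` and `top_k` helpers and the hand-made 2-d vectors are illustrative assumptions; a vector database replaces the full scan with an approximate-nearest-neighbor index but returns the same kind of ranked list:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(a * b for a, b in zip(u, v))
    denom = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / denom if denom else 0.0

def top_k(query_vec, index, k=3):
    # index rows are (question, answer, vector); score every row
    # against the query and keep the k highest.
    scored = [(cosine(query_vec, vec), q, a) for q, a, vec in index]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:k]

# Toy usage with placeholder 2-d vectors instead of real embeddings:
index = [
    ("How do I get a refund?", "Open a support ticket.", [1.0, 0.0]),
    ("Where is my order?", "Check the tracking page.", [0.0, 1.0]),
]
best = top_k([0.9, 0.1], index, k=1)[0]  # closest pair to the query vector
```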
You could provide a "tool" internally for an AI to call, but you may want a more passive solution, or you may not even be asking about RAG but rather just a search for related items. Recommendations could take the form of "similar answers", and could even be based on inspecting the most recent user input and AI answer to see what else is "recommended" in a user interface.
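If you do feed the results back to an LLM, only the retrieved pairs need to enter the prompt, which is what keeps input-token cost down. A sketch under the assumption that `retrieved` already holds the top-k pairs from the similarity search; the wording of the instruction is an illustrative placeholder:

```python
# Assume `retrieved` is the top-k (question, answer) result of the
# similarity search; only these, not the whole corpus, go to the LLM.
retrieved = [
    ("How do I reset my password?", "Use the 'forgot password' link."),
    ("Where is my order?", "Track it from your account page."),
]
context = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in retrieved)
prompt = (
    "Based only on the following Q&A pairs, suggest related topics "
    "the user might find helpful:\n\n" + context
)
```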
So an embeddings-based vector database is indeed ideal here, perhaps even more easily optimized than a document search. You just need to discover the pattern that actually fits the application you wish to present.