We are implementing RAG in our organization. The basic part was easy, but the harder part seems to be user intent detection. For instance, the user might ask a question and then request that the answer be shortened. The latter request should not trigger a vector search.
Likewise, when multiple documents are involved, how do you select one over the other? What if the user wants to contrast things between two documents?
Would love to hear from folks who have implemented a more detailed RAG pipeline that handles these things, and what sort of prompts they use.
Additionally, is there a way to capture the prompts built or sent to the backend by the Assistants API?
Thanks!
1 Like
For the first question, you could keep the history messages and feed them to the API as conversation context. I don't have a clean way to skip the vector search in this case, but in practice a query like "shorten the above text" normally won't retrieve anything relevant through vector search, so the model can handle it from the history alone.
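A minimal sketch of what this could look like with the openai Node SDK, assuming a simple in-memory history array; the model name, system prompt, and placeholder context are all illustrative, not anything prescribed by the API:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Running history of the conversation, including the context retrieved for the first question.
const history: { role: "system" | "user" | "assistant"; content: string }[] = [
  { role: "system", content: "Answer using the provided context when it is relevant." },
  { role: "user", content: "Context:\n<retrieved chunks>\n\nQuestion: What is our refund policy?" },
  { role: "assistant", content: "Our refund policy allows returns within 30 days..." },
];

// A follow-up like "shorten the above answer" is sent along with the history and
// without any new retrieval; the model answers from the conversation alone.
async function followUp(userMessage: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [...history, { role: "user", content: userMessage }],
  });
  return completion.choices[0].message.content ?? "";
}

followUp("Please shorten the above answer.").then(console.log);
```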
Regarding the second question, you need to split the documents into chunks. When you run a vector search and sort the results by cosine similarity, you may get chunks from different documents. You can then combine them as context for the LLM to generate the final answer. That's it.
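A rough sketch of that idea, assuming chunks from several documents are already embedded and held in memory (the Chunk shape, the retrieveContext helper, and the use of text-embedding-ada-002 are my own choices for illustration):

```ts
import OpenAI from "openai";

const client = new OpenAI();

interface Chunk {
  docId: string;      // which source document the chunk came from
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function retrieveContext(query: string, chunks: Chunk[], topK = 5): Promise<string> {
  const res = await client.embeddings.create({
    model: "text-embedding-ada-002",
    input: query,
  });
  const queryEmbedding = res.data[0].embedding;

  // Score every chunk regardless of which document it belongs to,
  // sort by cosine similarity, and keep the best topK.
  const best = chunks
    .map((c) => ({ chunk: c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);

  // Combine the selected chunks (possibly from different documents) into one context block.
  return best.map((b) => `[${b.chunk.docId}] ${b.chunk.text}`).join("\n\n");
}
```

In a real setup the scoring and sorting would be done by the vector store (Pinecone, Qdrant, etc.) rather than in application code, but the flow is the same.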
Let me know if you have further questions. Good luck!
The issue with vector search is that it will always return some hits, although the similarity score may be low. So, it is very challenging to figure out the user intent solely from the user query.
We are trying a two-step process: first determine the user intent by making a call to the LLM, and then, depending on the response, proceed with the next step.
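A hedged sketch of that two-step flow, assuming a simple SEARCH / NO_SEARCH classification prompt (the prompt wording, function names, and model choice are illustrative, not the poster's exact implementation):

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Step 1: ask the model whether the latest message needs a fresh document search
// or can be answered from the conversation alone.
async function needsRetrieval(latestMessage: string, history: string): Promise<boolean> {
  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "Decide whether the user's latest message requires searching the document store, " +
          "or only refers to the existing conversation (e.g. 'shorten the above'). " +
          "Reply with exactly SEARCH or NO_SEARCH.",
      },
      { role: "user", content: `Conversation so far:\n${history}\n\nLatest message: ${latestMessage}` },
    ],
  });
  const verdict = (completion.choices[0].message.content ?? "").trim();
  return verdict === "SEARCH";
}

// Step 2: only run the vector search when step 1 says it is needed.
async function answer(latestMessage: string, history: string): Promise<void> {
  if (await needsRetrieval(latestMessage, history)) {
    // ...embed the query, search the vector store, and build the context...
  } else {
    // ...answer directly from the conversation history...
  }
}
```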
Thanks.
Well, I ran into the same issue. The simplest fix is to ignore chunks with low similarity scores; I normally use 0.82 as the threshold, which you can adjust for your data. Refining the user's query is also a good idea: I would suggest letting the LLM simplify the question or remove unnecessary words to increase the precision of the search. I did this and the results were quite positive.
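The thresholding part is just a filter over the search results; a small sketch, assuming the vector store returns a cosine similarity score with each hit (the SearchHit shape is made up for the example):

```ts
interface SearchHit {
  text: string;
  score: number; // cosine similarity returned by the vector store
}

const SIMILARITY_THRESHOLD = 0.82; // value suggested above; tune per corpus

// Drop low-confidence hits; if nothing survives, treat it as "no relevant context found".
function filterHits(hits: SearchHit[]): SearchHit[] {
  return hits.filter((h) => h.score >= SIMILARITY_THRESHOLD);
}
```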
Hello everyone, our team is gearing up to embark on a project involving RAG, using web-hosted documentation as a custom data source. We are currently in the initial phase of reviewing suggested implementation approaches, and we acknowledge that the reference POC implementations may not meet the requirements of a reliable production system. It appears that you are further along in this journey, and I would be grateful if you could share any suggestions and pointers. We are planning to incorporate TypeScript, LangChain, Pinecone VectorDB, and OpenAI's text-embedding-ada-002 and GPT models into our solution.
You may want to look into Langroid, the multi-agent LLM framework from ex-CMU and UW Madison researchers: GitHub - langroid/langroid: Harness LLMs with Multi-Agent Programming. We take a measured approach: we avoid unnecessary code bloat and abstractions and keep the code clean and stable (apps written 4 months ago still work).
We have a few companies using it in production (contact center agent productivity, resume ranking, policy compliance); they have found Langroid far more stable and useful than LangChain, and they especially like our RAG and multi-agent setup. We currently have integrations with Qdrant, ChromaDB, and LanceDB (we preferred these initially since they are open source and some have generous free hosted tiers).
1 Like
Do you have the prompts / flow logic you could share or point to a resource we could look at?
Thanks again for your ideas.
Yes, before conducting the similarity search, you can let the LLM analyze the request and remove unnecessary words to reduce the noise. My prompt looks like the one below:
You'll analyze user requests, understand their requirements, and recreate the query based on their functionality, specialties, and capabilities to make it more concise and focused on the user's actual needs. E.g. if the query is "The best route of traveling in Italy", you could reshape the query as "traveling in Italy".
In my experience, extra words or ambiguous information in the query introduce noise and blur the focus of the similarity search. Hopefully you get a better outcome after improving the query this way.
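One way to wire that prompt into a pre-search rewriting step; the model name, temperature, and condenseQuery helper are assumptions for the sketch, and the system prompt only paraphrases the one quoted above:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Rewrite the raw query into a shorter, search-friendly form before embedding it.
async function condenseQuery(rawQuery: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "Analyze the user's request and restate it as a concise search query, " +
          "removing filler words while keeping the actual need. " +
          'Example: "The best route of traveling in Italy" becomes "traveling in Italy".',
      },
      { role: "user", content: rawQuery },
    ],
  });
  return completion.choices[0].message.content ?? rawQuery;
}

// The condensed query is only used for the similarity search;
// the original query still goes into the final generation prompt.
```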
1 Like
Thanks for this, will definitely try this out.
1 Like
I encountered the same issue as well. Here is my solution.
In my application there are a couple of preliminary prompts to the LLM (a sketch of the flow is below):

- A validation prompt that checks the user's query to prevent misuse and irrelevant queries. Its instruction is something like: "If the query is invalid, you must add this delimiter %%!!%%!! in your response." Then, if that delimiter appears, the process stops and returns something like "invalid query" to the user.
- A vector search prompt that asks whether a new search is needed and, if so, returns the new query for the search. If a new search is required, the new query is used for the search (but the original query is still used in the final LLM prompt).
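A rough sketch of those two preliminary calls, assuming the openai Node SDK; the exact prompt wording, model, and helper names are illustrative rather than the poster's actual code, but the delimiter check and the "search or not" decision follow the description above:

```ts
import OpenAI from "openai";

const client = new OpenAI();

const INVALID_MARKER = "%%!!%%!!"; // delimiter described above

// Preliminary prompt 1: validate the query; the model marks invalid queries with the delimiter.
async function validateQuery(query: string): Promise<boolean> {
  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "Check whether the user's query is relevant to our document collection. " +
          `If the query is invalid or off-topic, you must add this delimiter ${INVALID_MARKER} in your response.`,
      },
      { role: "user", content: query },
    ],
  });
  return !(completion.choices[0].message.content ?? "").includes(INVALID_MARKER);
}

// Preliminary prompt 2: decide whether a new vector search is needed and, if so,
// produce the query to search with (the original query is still used for the final answer).
async function searchDecision(query: string, history: string): Promise<string | null> {
  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "Given the conversation and the latest message, answer NONE if no new document search is needed. " +
          "Otherwise reply with only the query to use for the search.",
      },
      { role: "user", content: `Conversation:\n${history}\n\nLatest message: ${query}` },
    ],
  });
  const out = (completion.choices[0].message.content ?? "").trim();
  return out === "NONE" ? null : out;
}
```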
2 Likes
The one bit of advice I can share after having done this for several months now is: there is no such thing as "one size fits all".
Solutions vary from document set to document set, and sometimes from document to document. I use the LLM to create vector store similarity search "concepts" as well as standalone questions for the prompt.
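For anyone unfamiliar with the standalone-question idea, here is a hypothetical sketch (not the poster's actual prompt) of condensing the chat history plus a follow-up into a single question that can be embedded for the similarity search:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Rewrite a follow-up so it makes sense without the conversation,
// then embed the rewritten question for the vector search.
async function standaloneQuestion(history: string, followUp: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "Rewrite the follow-up message as a standalone question that makes sense " +
          "without the conversation, preserving all names and specifics.",
      },
      { role: "user", content: `Conversation:\n${history}\n\nFollow-up: ${followUp}` },
    ],
  });
  return completion.choices[0].message.content ?? followUp;
}
```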
But I also give my users the ability to modify these approaches and speak directly to the LLM if necessary.
Expect the unexpected, and always be prepared to make the changes necessary to get the best responses possible.
2 Likes
Thank you for this detailed response! I will definitely try it out.