I’m developing an assistant with GPT-4 and I want it to respond only using the context I provide, which in my case is a single PDF file. I am using a Retrieval-Augmented Generation (RAG) approach. Users interact with the virtual assistant through API Management and the Agent UI. The question the user types in the UI is sent to the backend service via a REST API registered in API Management. The backend service receives the user’s question and performs the following actions:
1. Identifies the most relevant sections (chunks) of the documents through a search method known as “hybrid search”;
2. Enriches the user’s question with contextual information and the document sections identified in step 1;
3. Sends the enriched question to the LLM (GPT-4) provided through OpenAI;
4. Returns the model-generated response in streaming mode.
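The steps above can be sketched as a small pipeline, with the search and LLM calls injected as plain callables (the function names and the `[chunk N]` formatting are my assumptions, not the actual backend code):

```python
from typing import Callable, Iterable

def build_enriched_prompt(preprompt: str, chunks: list[str], question: str) -> str:
    """Step 2: enrich the user's question with the retrieved chunks."""
    context = "\n\n".join(f"[chunk {i}]\n{c}" for i, c in enumerate(chunks, 1))
    return f"{preprompt}\n\nCONTEXT:\n{context}\n\nQUESTION: {question}"

def answer(
    question: str,
    hybrid_search: Callable[[str], list[str]],   # step 1: hybrid search
    llm_stream: Callable[[str], Iterable[str]],  # steps 3-4: GPT-4, streamed tokens
    preprompt: str,
) -> Iterable[str]:
    """Run the retrieve-enrich-generate flow and stream the answer back."""
    chunks = hybrid_search(question)
    prompt = build_enriched_prompt(preprompt, chunks, question)
    yield from llm_stream(prompt)
```

In practice `hybrid_search` would wrap the search service and `llm_stream` the OpenAI client; keeping them as parameters makes the flow testable without network calls.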
Once the relevant chunks are selected (in step 1), they are prepared to be provided to the LLM along with the user’s question. Steps:
- Preprompt preparation: a preprompt is created to contextualize the chunks and the user’s question. It may include additional information such as the conversation context or details relevant to the query. In our case, it states the usage context of the model and the constraints that ensure it only answers questions related to my CONTEXT;
- Chunk Integration: The retrieved chunks are included in the prompt that will be provided to the LLM. This helps the model better understand the context and generate a more accurate response.
Finally, the model is given an example question and answer representing an ideal response. The query and prompt are then merged into a single final prompt and sent to the model to generate the completion, i.e., the answer to the user’s question.
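The final prompt with the one-shot example could be assembled along these lines (a sketch; the section labels and the example Q&A placement are illustrative, not the actual template):

```python
def build_final_prompt(preprompt: str, chunks: list[str],
                       example_q: str, example_a: str, question: str) -> str:
    """Merge preprompt, retrieved chunks, the one-shot example, and the query."""
    parts = [
        preprompt,                     # system instructions and constraints
        "CONTEXT:",
        *chunks,                       # chunks selected in step 1
        "EXAMPLE QUESTION: " + example_q,
        "EXAMPLE ANSWER: " + example_a,
        "QUESTION: " + question,       # the user's actual query comes last
    ]
    return "\n\n".join(parts)
```

Putting the user’s question last keeps it closest to the completion, which tends to help the model treat everything above it as context.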
To prevent the model from using information outside of the CONTEXT, I added an automatic check on the chunks, since I am also using the semantic ranker service. If the search returns no chunks, or none of them has a semantic reranker score of at least 2, an automatic fallback response is returned instead.
I’ve set the temperature to 0. Are there other parameters that can help me stay focused only on my CONTEXT?
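For reference, these are the request parameters I’m aware of besides temperature (names from the OpenAI chat-completions API; the values are a starting point I’d try, not a guarantee of grounding):

```python
generation_params = {
    "temperature": 0,        # already set: always pick the most likely tokens
    "top_p": 1,              # leave at default; tuning both at once is discouraged
    "max_tokens": 400,       # capping length discourages off-context elaboration
    "frequency_penalty": 0,  # penalties mostly affect style, not grounding
    "presence_penalty": 0,
    "seed": 42,              # reproducible sampling where the API supports it
}
```

Ultimately these only shape decoding; grounding still has to come from the prompt and the retrieval gate.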
I have some doubts about the preprompt: I gave a series of commands (such as: “Respond only and exclusively using the information contained in the provided chunks. If the chunks do not contain a relevant answer, respond with ‘I am unable to answer this question with the available information.’” ) but I wanted to know if, in your experience, it is better to give this series of commands as a bulleted or numbered list, for example:
- Command 1
- Command 2
- Command 3
- …
Or
1. Command 1
2. Command 2
3. Command 3
4. …
I would lean towards the numbered list; do you have any experience in this regard?
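For concreteness, the numbered variant I have in mind looks roughly like this (a sketch; the exact wording beyond the commands I quoted above is illustrative):

```python
SYSTEM_PROMPT = (
    "You are an assistant that answers ONLY from the provided CONTEXT.\n"
    "1. Respond only and exclusively using the information contained in the "
    "provided chunks.\n"
    "2. If the chunks do not contain a relevant answer, respond with: "
    "'I am unable to answer this question with the available information.'\n"
    "3. Do not use prior knowledge or any information outside the CONTEXT.\n"
    "4. Do not explain your answer.\n"
)
```

One practical upside of numbering is that later rules can refer back to earlier ones (e.g. “as per rule 2”) without ambiguity.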
I also inserted the phrase “Do not explain your answer” before the user’s question to keep the assistant’s responses concise, avoiding additional information (possibly outside the CONTEXT) and “forcing” the user to ask follow-up questions with more detail.
Unfortunately, I am still encountering issues with some questions where the assistant uses information outside the provided CONTEXT.
Do you have any suggestions for solving this problem?
Thanks for your help.