How to prevent ChatGPT-4 from answering questions that are outside our Context

I’m developing an assistant with ChatGPT-4 and I want it to respond only using the context I provide, which in my case is a single PDF file. I am using a Retrieval-Augmented Generation (RAG) methodology. Users interact with the virtual assistant through API Management and the Agent UI. The question typed by the user on the UI is sent to the backend service via a REST API registered in API Management. The backend service receives the user’s question and performs the following actions:

  1. Identifies the most appropriate sections (chunks) of the documents through a search method defined as “hybrid search”;
  2. Enriches the user’s question with contextual information and the sections of the documents identified in step 1;
  3. Sends the enriched question to the LLM (GPT-4) provided through OpenAI;
  4. Returns the model-generated response in streaming mode.
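
To make the flow concrete, here is a minimal sketch of these four steps in Python, assuming the OpenAI Python SDK; `hybrid_search` and `build_prompt` are placeholder names for the retrieval and prompt-enrichment steps, not the actual implementation:

```python
# Minimal sketch of the backend flow above. hybrid_search() and build_prompt()
# are placeholders for the retrieval and prompt-enrichment steps described here.
from openai import OpenAI

client = OpenAI()

def answer_question(user_question: str):
    chunks = hybrid_search(user_question)            # 1. hybrid search over the PDF index
    messages = build_prompt(user_question, chunks)   # 2. enrich the question with context
    stream = client.chat.completions.create(         # 3. send the enriched question to GPT-4
        model="gpt-4",
        messages=messages,
        temperature=0,
        stream=True,                                 # 4. return the response in streaming mode
    )
    for event in stream:
        delta = event.choices[0].delta.content
        if delta:
            yield delta
```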

Once the relevant chunks are selected (in step 1), they are prepared to be provided to the LLM along with the user’s question. The steps are:

  • Preprompt Preparation: A preprompt is created to contextualize the chunks and the user’s question. This may include additional information such as the context of the conversation or details relevant to the query. In our case, it describes how the LLM is being used and the restrictions that ensure it only answers questions related to my CONTEXT;
  • Chunk Integration: The retrieved chunks are included in the prompt that will be provided to the LLM. This helps the model better understand the context and generate a more accurate response.

Finally, an example containing a question and answer representing an ideal response prototype is provided to the model. The query and the prompt are then merged into a single final prompt and sent to the model to generate the completion, i.e., the answer to the user’s question.
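
As an illustration only (the wording below is a simplified sketch, not our production preprompt or example), the prompt assembly could look roughly like this:

```python
# Rough sketch of the prompt assembly: preprompt, retrieved chunks, one-shot
# example, and the user's question. All wording here is illustrative.
PREPROMPT = (
    "Respond only and exclusively using the information contained in the provided chunks. "
    "If the chunks do not contain a relevant answer, respond with "
    "'I am unable to answer this question with the available information.'"
)

def build_prompt(user_question: str, chunks: list[str]) -> list[dict]:
    context = "\n\n".join(f"[CHUNK {i + 1}]\n{chunk}" for i, chunk in enumerate(chunks))
    return [
        # Preprompt plus the retrieved chunks (the CONTEXT)
        {"role": "system", "content": f"{PREPROMPT}\n\nCONTEXT:\n{context}"},
        # One-shot example representing the ideal response prototype
        {"role": "user", "content": "Example question taken from the PDF"},
        {"role": "assistant", "content": "Example answer based only on the CONTEXT"},
        # The actual user question, preceded by the conciseness instruction
        {"role": "user", "content": f"Do not explain your answer.\n{user_question}"},
    ]
```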

To prevent the model from using information outside of the CONTEXT, I created an automatic check on the chunks, since I am also using the semantic ranker service: if the search returns no chunks, or if none of them has a semantic ranker score of at least 2, an automatic response is returned instead.
I’ve set the temperature to 0. Are there other parameters that can help me stay focused only on my CONTEXT?
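
For reference, a sketch of that automatic check might look like this (the `reranker_score` field name is an assumption about the search result format; adapt it to the actual response):

```python
# Gate on the retrieved chunks: answer only when at least one chunk has a
# semantic ranker score of 2 or more; otherwise return the canned response
# without calling the model. Field names are assumptions.
FALLBACK_ANSWER = "I am unable to answer this question with the available information."
RERANKER_THRESHOLD = 2.0

def select_relevant_chunks(search_results: list[dict]) -> list[str]:
    return [
        result["content"]
        for result in search_results
        if result.get("reranker_score", 0.0) >= RERANKER_THRESHOLD
    ]

search_results = hybrid_search(user_question)   # placeholder retrieval call
chunks = select_relevant_chunks(search_results)
if not chunks:
    answer = FALLBACK_ANSWER                    # automatic response, the model is skipped
```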

I have some doubts about the preprompt: I gave a series of commands (such as: “Respond only and exclusively using the information contained in the provided chunks. If the chunks do not contain a relevant answer, respond with ‘I am unable to answer this question with the available information.’”), but I wanted to know whether, in your experience, it is better to give this series of commands as a bulleted or a numbered list, for example:

  • Command 1
  • Command 2
  • Command 3

Or

  1. Command 1
  2. Command 2
  3. Command 3

I would lean towards the numbered list; do you have any experience in this regard?

I also inserted the phrase “Do not explain your answer” before the user’s question to make the assistant’s response more concise, so that it avoids adding extra information (perhaps external to the CONTEXT) and “forces” the user to ask follow-up questions with more detail.

Unfortunately, I am still encountering issues with some questions where the assistant uses information outside the provided CONTEXT.

Do you have any suggestions for solving this problem?

Thanks for your help.


Welcome to the Forum!

First of all, thanks for the detailed overview, which is helpful. It sounds like you are already implementing a lot of “best practices”.

Speaking from my own experience, when it is critical to focus the model only on the information provided, I tend to rely on a dual strategy in my prompt. That involves being specific in my prompt about the sources that are provided to the model for generating a response, and then reinforcing that by including in my instructions phrases like “Strictly only rely on the sources provided in generating your response. Never rely on external sources.” In general, I find that works pretty well.

Sometimes of course there are still edge cases. For example, if the retrieved chunks contain some but very little information relevant to the question, the models have a tendency to expand on that information, which can lead to hallucinations and/or the use of their own knowledge. To deal with this, I frequently include specific additional instructions for those edge cases: a short description of the case, followed by an instruction on how to respond in that case. You could even go as far as to include an ideal response example tailored to these edge cases.
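
For illustration, such an edge-case addition might read something like this (the wording is only a sketch):

“Edge case: the retrieved sources mention the topic of the question but contain only a brief or passing reference without enough substantive information to answer. In this case, do not expand on the topic using your own knowledge; respond with: ‘I am unable to answer this question with the available information.’”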

Thank you, @jr.2509, for your response.
As you rightly mention, there are cases where some words from the question are present in the selected chunks, but there isn’t enough information in them to formulate a response.
These cases are difficult to identify with rules. I tried using the semantic ranker with a threshold value of 2 (on a scale from 0 to 4), but these cases still occur. Additionally, if this value is set too high, there’s a risk of overly limiting the number of questions the assistant can answer.

I also encounter situations where a word (for example, “Retargeting”) is mentioned in both the question and the chunks, but the CONTEXT contains no information about its definition. If the assistant is asked to provide a definition of Retargeting, it does so, because the definition is present in its own knowledge base; but it shouldn’t, since the definition is not in the CONTEXT, and the chunk selection does not block this, because there are chunks that contain that word!

I haven’t yet figured out how to avoid these cases. Do you have any suggestions?

Of course, I’ve used phrases in the preprompt that say “stick strictly to the CONTEXT.”
Thanks for your help.

Hi @lopry81

You may try the following as your preprompt:

You are an information retrieval assistant. Your primary role is to answer user questions strictly and exclusively using the information provided in the given context chunks. Your instructions are as follows:

  1. Respond only and exclusively using the information contained in the provided chunks. Do not introduce any information that is not present in the chunks.
  2. If the provided chunks do not contain sufficient information to answer the question, or if the chunks do not directly address the user’s query, respond with: "I am unable to answer this question with the available information."
  3. Do not explain your answer or provide any additional commentary. Your responses should be concise and focused on addressing the user's query using only the provided information.
  4. Adhere to the context and limitations at all times. If any part of the question cannot be answered with the provided chunks, you must refrain from speculation or the use of external knowledge.
  5. If there are multiple chunks provided, integrate the information cohesively, but do not infer or create connections beyond what is explicitly stated in the chunks.
  6. If no chunks are provided or if they are insufficient, immediately default to the response outlined in instruction 2.

Final Reminder: Your responses must be anchored solely in the content of the provided chunks. Any deviation from this rule should result in the default response.