How can I improve Q&A with a bot that uses our knowledge base with LangChain?

Stack used -

  • Using Conversational Retrieval QA | 🦜️🔗 Langchain
  • The knowledge base is a set of PDFs → embeddings are generated with OpenAI ada → stored in Pinecone.
  • When a user query comes in, it is passed to ConversationalRetrievalQAChain along with the chat history.
  • The LLM used in LangChain is OpenAI gpt-3.5-turbo.

Here are some examples of bad questions and answers -

Q: “Hi” or “Hi who are you”
A: It describes itself using the “system” instruction provided in the prompt, but it also returns the top chunks from the embeddings search as sources, as LangChain does by default. The gpt-3.5-turbo LLM correctly understands that the sources are not relevant, but LangChain has no way of knowing this: by then it has already returned the sources from the embeddings search.

Q: “Good morning”.
A: “I am not sure about this. Ask me about some-random-topic-from-knowledgebase-it-picks”.

Q: “Ok, thanks” as a reply to an answer the bot has given
A: Same reply as for “Good morning”, because it doesn’t match any topic in the knowledge base.

Q: “What are you” as the first message
A: Replies correctly from the “system” message already provided in the prompt.

Q: “What are you” or “Who are you” as a reply after the bot has answered a question like “Tell about Solar system”
A: Because it remembers the chat history (the client sends it with each query), the bot answers about the solar system again. The expected reply should have come from the “system” instruction in the prompt.

When the same types of questions are asked in OpenAI's ChatGPT (gpt-3.5-turbo), which doesn’t have our knowledge base, it answers all of them correctly according to the intent of the query.
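
One possible way to handle the small-talk cases listed above is to classify the message before it reaches the retrieval chain, so greetings, thanks and identity questions are answered from the “system” instruction with no sources attached. The following is only a minimal sketch under stated assumptions: `classifyIntent`, `needsRetrieval` and the regular expressions are hypothetical helpers, not part of LangChain, and a real bot would probably need a broader pattern list or an LLM-based classifier.

```typescript
// Hypothetical pre-retrieval router (not a LangChain API): decides whether a
// message is small talk or a real knowledge-base query.
type Intent = "greeting" | "thanks" | "identity" | "knowledge";

// Order matters: "Hi who are you" should count as an identity question,
// so the identity pattern is checked before the greeting pattern.
const INTENT_PATTERNS: Array<[Intent, RegExp]> = [
  ["identity", /\b(who|what) are you\b/],
  ["thanks", /^(ok(ay)?[,.! ]*)?(thanks|thank you|thx)\b/],
  ["greeting", /^(hi|hello|hey|good (morning|afternoon|evening))\b/],
];

function classifyIntent(message: string): Intent {
  const text = message.trim().toLowerCase();
  for (const [intent, pattern] of INTENT_PATTERNS) {
    if (pattern.test(text)) return intent;
  }
  // Anything that is not recognised as small talk goes to the knowledge base.
  return "knowledge";
}

// Only "knowledge" messages need the retrieval chain and its source documents.
function needsRetrieval(message: string): boolean {
  return classifyIntent(message) === "knowledge";
}
```

With this in place, the client would call the chain only when `needsRetrieval(message)` is true, and answer the other intents from the “system” instruction alone. Note that a follow-up such as “Who are you” is classified from the message text only, so the chat history does not pull the answer back to the previous topic.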

Code -

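  // `model` and `vectorStore` are assumed to be defined elsewhere (not shown in the post)
  const chain = ConversationalRetrievalQAChain.fromLLM(model, vectorStore.asRetriever(), {
    qaTemplate: QA_PROMPT,
    questionGeneratorTemplate: CONDENSE_PROMPT,
    returnSourceDocuments: true, // the number of source documents returned is 4 by default
  });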
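
Because `returnSourceDocuments: true` returns the top chunks regardless of how well they match, one option is to filter the sources by similarity score before showing them. This is a rough sketch, not LangChain's own behaviour: `ScoredChunk`, `relevantSources`, the file names, the scores and the 0.8 threshold are all made up for illustration, and it assumes the scores are available (for example from a `similaritySearchWithScore`-style call on the vector store) and that a higher score means a closer match.

```typescript
// Hypothetical shape of a retrieved chunk: where it came from and the
// similarity score reported by the vector store (higher means closer).
interface ScoredChunk {
  source: string;
  score: number;
}

// Keep only chunks whose score clears the threshold. An empty result means
// the reply should be shown without a sources list.
function relevantSources(chunks: ScoredChunk[], minScore: number): ScoredChunk[] {
  return chunks.filter((chunk) => chunk.score >= minScore);
}

// Example scores are invented for illustration only.
const retrievedForGreeting: ScoredChunk[] = [
  { source: "planets.pdf", score: 0.71 },
  { source: "stars.pdf", score: 0.69 },
];
const retrievedForQuestion: ScoredChunk[] = [
  { source: "planets.pdf", score: 0.88 },
  { source: "stars.pdf", score: 0.74 },
];

const MIN_SCORE = 0.8; // assumed threshold; needs tuning against real queries

console.log(relevantSources(retrievedForGreeting, MIN_SCORE).length); // 0
console.log(relevantSources(retrievedForQuestion, MIN_SCORE).map((c) => c.source)); // [ 'planets.pdf' ]
```

A fixed threshold is a blunt tool, so it is worth checking it against a sample of real queries before relying on it.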

Prompts -

const CONDENSE_PROMPT = `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;
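
To see why a follow-up like “Who are you” gets pulled back to the previous topic, it can help to print what the question generator actually receives. The sketch below fills a condense template by hand. It assumes the template carries a `{chat_history}` placeholder under “Chat History:” and that turns are rendered as `Human:` / `Assistant:` lines; `Turn` and `renderCondensePrompt` are hypothetical names, and the exact formatting LangChain uses may differ.

```typescript
// Hypothetical representation of one chat turn.
type Turn = { speaker: "Human" | "Assistant"; text: string };

const CONDENSE_TEMPLATE = `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;

function renderCondensePrompt(history: Turn[], question: string): string {
  const values: Record<string, string> = {
    chat_history: history.map((t) => `${t.speaker}: ${t.text}`).join("\n"),
    question,
  };
  // Single pass over the template, so substituted text is never re-scanned.
  return CONDENSE_TEMPLATE.replace(
    /\{(chat_history|question)\}/g,
    (_match, key: string) => values[key] ?? ""
  );
}

const condenseExample = renderCondensePrompt(
  [
    { speaker: "Human", text: "Tell about Solar system" },
    { speaker: "Assistant", text: "The Solar System consists of the Sun and the objects that orbit it." },
  ],
  "Who are you"
);
console.log(condenseExample);
```

The rendered prompt asks the model to rewrite “Who are you” in light of a conversation about the solar system, which may explain why the standalone question, and therefore the retrieval step, drifts back to that topic.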

const QA_PROMPT = `You are a helpful teacher, your name is Dolphin. You are an AI assistant providing helpful answers based on the context to provide conversational answer without any prior knowledge. You are given the following extracted parts of a long document and a question.
You should only provide hyperlinks that reference the context below. Do NOT make up hyperlinks.
If you can't find the answer in the context below, just say "Hmm, I'm not sure". You can also ask the student to rephrase the question if you need more context. But don't try to make up an answer.
If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
Answer in a concise or elaborate format as per the intent of the question. Use formatting ** to bold, __ to italic & ~~ to cut wherever required. Format the answer using headings, paragraphs or points wherever applicable.
Question: {question}

Hi,
I’m having the same problem. I’ve tried different strategies and I keep getting incorrect answers, using code very similar to yours. Did you get any explanation for this problem?