Generate summary of large content based on user's inquiry, user's lead data and recent conversation

I am building a chatbot that first captures the user’s lead data (static question-answering handled in code) and, once the user is done with that part, switches to chatting with documents. I am using Pinecone as the vector database, from which the content relevant to the user’s question is fetched. The complete set of steps is as follows -

  1. Get relevant content from Pinecone based on the user’s inquiry.
  2. Summarize the content fetched in step 1 to make it brief, keeping only the most relevant parts and reducing its size to fit within the model’s token limit.
  3. Call GPT one final time to generate the response to the user’s inquiry, based on the summarized content, the user’s inquiry, and the last 6 messages from the conversation.
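The three steps above can be sketched as a single function. This is only an outline under assumptions: the `retrieve`, `summarize` and `respond` step functions and the shape of their arguments are hypothetical names, injected so that each stage can be logged or replaced on its own; the real Pinecone and GPT calls would live inside them.

```javascript
// Sketch of the three-step flow. The step functions are injected so the flow
// can be exercised without Pinecone or GPT; their names are hypothetical.
async function answerInquiry({ inquiry, leadData, history }, steps) {
  // Step 1: fetch candidate chunks from the vector store.
  const chunks = await steps.retrieve(inquiry);

  // Step 2: condense the chunks to the parts relevant to this user.
  const summary = await steps.summarize({
    inquiry,
    leadData,
    document: chunks.join("\n\n"),
  });

  // Step 3: answer from the summary plus the last 6 conversation messages.
  const recentMessages = history.slice(-6);
  return steps.respond({ inquiry, leadData, summary, recentMessages });
}
```

Keeping the stages separate like this makes it possible to inspect the output of step 2 before it reaches step 3.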

The issues I am facing are -

  1. 30-40% of the time, the summary generated by GPT is irrelevant or made up. As a result, GPT gives an irrelevant or “no data found” type of response in the last step.
  2. The prompt length in the final step sometimes exceeds the model’s max token limit.
  3. GPT sometimes does not generate a relevant response even when matching context is present in the summarized content. From what I have debugged, it hallucinates when juggling the large set of instructions covering the summary, the conversation history, the user’s lead data (for example, providing the fees of only the Nursing program if that was selected in the lead data), and additional instructions (like handling neutral keywords such as thanks, okay, yes, no).
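For issue 2, one possible guard is to check the prompt size before the final call and drop the oldest conversation messages until it fits. The sketch below is an assumption-heavy illustration: `estimateTokens` uses the rough rule of thumb of about 4 characters per token for English text rather than a real tokenizer, and the function names and the budget value are hypothetical.

```javascript
// Rough estimate only: about 4 characters per token for English text.
// A real tokenizer would give exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Drop the oldest history messages until the fixed parts (summary, lead data,
// instructions, inquiry) plus the history fit within maxPromptTokens.
// Returns the history messages that were kept, oldest first.
function trimHistory(fixedParts, history, maxPromptTokens) {
  const fixedTokens = fixedParts.reduce((sum, p) => sum + estimateTokens(p), 0);
  const kept = [...history];
  let historyTokens = kept.reduce((sum, m) => sum + estimateTokens(m), 0);
  while (kept.length > 0 && fixedTokens + historyTokens > maxPromptTokens) {
    historyTokens -= estimateTokens(kept.shift());
  }
  return kept;
}
```

If the fixed parts alone exceed the budget, this returns an empty history and the summary itself still has to be shortened.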

Below is my summarization prompt -

Summarize the content so that it can be passed to GPT in the token limit to answer the user's inquiry. Pick only the relevant content based on user's inquiry and user's data as provided by the user. For example, if the user mentioned a class in lead data and is now asking for age criteria then the age requirement for that particular class should be picked and not the age requirements of all the classes. The meaning of the content should stay the same after summarization. Relevant facts, statistics, code blocks should be included in the summary without any modifications where the code blocks should be surrounded by backticks. You do not need to provide the answer to the user's question but only the summary of the relevant content. The final answer should not be more than 2000 tokens as the summarized content will be passed to GPT along with user's inquiry, previous conversation and some set of instructions.

USER'S INQUIRY: ${inquiry}

LEAD DATA BY USER:

${formattedLeadData}

CONTENT: ${document}

Final answer:

Below are the messages that make up my final prompt -

[
	{
		role: "system",
		content: "Go through the below CONTEXT and analyse information and keywords completely.\n\nCONTEXT:\n\n${context}"
	},
	{
		role: "system",
		content: "Go through the below LEAD DATA PROVIDED BY USER and keep track of the keywords and their information and values. These will be helpful in further steps for narrowing down the final answer.\n\nLEAD DATA PROVIDED BY USER:\n\n${formattedLeadData}"
	},
	{
		role: "system",
		content: "Go through the recent messages from the conversation and analyse it.\n\nRECENT MESSAGES:\n\n${recentMessages}"
	},
	{
		role: "system",
		content: "Act like an admissions support agent and help in answering question from given CONTEXT only. Do not behave like a 3rd party."
	},
	{
		role: "system",
		content: "Obey the following Guidelines:\n- Donot make up any answer outside the context.\n- Provide only the relevant information based on IMPORTANT LEAD DATA BY USER whenever possible. For example, if the user mentioned a particular class in IMPORTANT LEAD DATA BY USER and is now asking for age eligibility then the age eligibility for that particular class should be provided instead of age eligibility of all classes.\n- Follow language and tone to be closest to human like conversation.\n- Donot include any website reference if the information is not available in the context.\n- Include hyperlinks if needed.\n- If there is a message showing gratitude or greetings or compliments or slang then respond as a polite human like support professional message. \n- Donot mention any details of source of information like page number and context in the responses.\n- Keep the conversation as real as possible by analysing recent conversation."
	},
	{
		role: "user",
		content: "Question: ${inquiry}"
	}
]
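One detail worth checking in the array above: in JavaScript, `${...}` placeholders are only substituted inside backtick template literals, so if the double-quoted strings are literal code rather than shorthand, the model would receive the placeholder text itself. A minimal sketch of building the same six messages with template literals follows; the function name, the parameter object and the `guidelines` array are hypothetical, and the instruction text is shortened for space.

```javascript
// Builds the message list for the final call. Backtick template literals are
// used so the ${...} placeholders are actually filled in. Instruction text is
// shortened here; the function name and parameter names are hypothetical.
function buildFinalMessages({ context, formattedLeadData, recentMessages, guidelines, inquiry }) {
  const system = (content) => ({ role: "system", content });
  return [
    system(`Go through the below CONTEXT and analyse information and keywords completely.\n\nCONTEXT:\n\n${context}`),
    system(`LEAD DATA PROVIDED BY USER:\n\n${formattedLeadData}`),
    system(`RECENT MESSAGES:\n\n${recentMessages}`),
    system("Act like an admissions support agent and help in answering question from given CONTEXT only."),
    system(`Obey the following Guidelines:\n${guidelines.map((g) => `- ${g}`).join("\n")}`),
    { role: "user", content: `Question: ${inquiry}` },
  ];
}
```

Logging the returned array before each call is a quick way to confirm that the context and lead data really reach the model.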

Hi Gaurav, your solution needs a prompt/RAG pipeline with a logical structure that provides observability, so that you can tune each prompt and pipeline stage for accuracy and relevance and bring the error rate close to zero.

It is doable, and I will be happy to help you with a high-level strategy free of charge if you DM me the following:

  1. All user system inputs.
  2. The specific info you embed to store in Pinecone.
  3. Your RAG query formation process/prompt.
  4. Current prompt flow.