Maintain document metadata when filtering with GPT?

raphael5 · November 16, 2023, 1:01pm

Hi everyone,

I am facing a little challenge when dealing with metadata.

Long story short, I first run a RAG that returns a bunch of documents. Each document is associated with metadata.

Then I build a prompt and combine it with all my documents (as context) before sending everything to GPT. The goal of the prompt is to select the most relevant pieces of text from the different documents in order to answer a specific question. So I might start with 10 documents and end up with text extracts from documents 1, 5 and 8, for instance.

def gpt4_query(prompt, input_documents):

    from langchain.chat_models import ChatOpenAI
    from langchain.chains.question_answering import load_qa_chain
    import time

    time.sleep(3)  # Pause to avoid overloading the OpenAI API

    # Initialize access to the LLM
    llm = ChatOpenAI(openai_api_key="xxxx", model="gpt-4-1106-preview")

    # Initialize the query chain. The "stuff" chain type is used because we only
    # submit a small piece of text that is prefiltered by the semantic search.
    chain = load_qa_chain(llm, chain_type="stuff", verbose=True)

    # Submit the analysis to GPT-4 for the final check / relevancy verification / semantic cleaning
    answer = chain.run(input_documents=input_documents, question=prompt, return_only_outputs=True)

    return answer

Is there a way to keep the metadata of the documents that GPT selected (1, 5 and 8 in my example)? LangChain's input_documents parameter does not seem to take the metadata into account, so I’m not sure how to keep it.
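
To make the question concrete, here is a minimal sketch of one possible workaround, not a confirmed LangChain feature: prefix each document's text with a marker such as [DOC 1], ask the model in the prompt to cite those markers next to every extract, then map the cited markers back to the original metadata. The Doc class is a hypothetical stand-in for a LangChain document (it only mimics the page_content and metadata attributes), and tag_documents, metadata_for_answer and the sample answer string are invented for illustration. It also assumes the model follows the citation instruction, which is not guaranteed.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def tag_documents(docs):
    """Return copies of the documents whose text starts with a [DOC n] marker (1-based)."""
    return [
        Doc(f"[DOC {i}] {d.page_content}", dict(d.metadata))
        for i, d in enumerate(docs, start=1)
    ]


def metadata_for_answer(answer, docs):
    """Map every [DOC n] marker cited in the answer back to the original metadata."""
    cited = sorted({int(n) for n in re.findall(r"\[DOC (\d+)\]", answer)})
    # Ignore markers that do not correspond to a submitted document.
    return {n: docs[n - 1].metadata for n in cited if 1 <= n <= len(docs)}


# Illustrative data only: three retrieved documents with made-up metadata.
docs = [
    Doc("Paris is the capital of France.", {"source": "a.pdf", "page": 2}),
    Doc("Bananas are rich in potassium.", {"source": "b.pdf", "page": 7}),
    Doc("France is in Western Europe.", {"source": "c.pdf", "page": 1}),
]
tagged = tag_documents(docs)  # these would be passed as input_documents

# Assumed shape of a model reply that cites the markers as instructed.
answer = "Paris is the capital of France [DOC 1]. France is in Western Europe [DOC 3]."
print(metadata_for_answer(answer, docs))
```

The tagged copies would go to the chain in place of the originals, and the prompt would need an instruction along the lines of "cite the [DOC n] marker after each extract you keep". The lookup uses the untagged list, so the original documents and their metadata are never modified.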

Thanks a lot for the help
