Long time to complete RetrievalQA invocation

I am very new to implementing embeddings. I wrote a simple application that reads text from a PDF, chunks it into pieces of 500 words each, and uses OpenAIEmbeddings with the Chroma.from_texts method to create a vector store on disk.
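Roughly, the ingestion step looks like this (a simplified sketch, not my exact code; the pypdf reader and the plain word-based splitter here are just illustrative):

from pypdf import PdfReader
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

def build_vector_store(pdf_path, save_path, chunk_words=500):
    # Pull the raw text out of the PDF
    reader = PdfReader(pdf_path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    # Split into chunks of roughly 500 words each
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    # Embed the chunks and persist the Chroma index to disk
    return Chroma.from_texts(chunks, OpenAIEmbeddings(), persist_directory=save_path)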

Once the Chroma index has been created in the background for a given set of documents, I create a RetrievalQA chain from the vector store for every question and then invoke the chain with that question.

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

vector_store = load_vector_store(save_path)  # load the persisted Chroma index
llm = ChatOpenAI(model="gpt-4")
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vector_store.as_retriever())
answer = qa_chain.invoke(question)

I tested this with both small and large datasets, and it works well functionally. But I observe that the qa_chain.invoke call takes almost 1.5 minutes to execute.
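To narrow down where the time goes, I could time the retrieval step and the full chain separately, something like this (sketch only; this instrumentation is not in my current code):

import time

start = time.perf_counter()
docs = vector_store.as_retriever().invoke(question)  # retrieval step only
print(f"retrieval: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
answer = qa_chain.invoke(question)  # full chain including the GPT-4 call
print(f"full chain: {time.perf_counter() - start:.2f}s")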

Can someone review this and advise on what could be wrong here?
