Summarize PPTX files and store in vector DB

axysharma · March 20, 2024, 2:44pm

Hi,

I have 300+ PPTX files and I want to create a chatbot which queries the PPTX file data. Since these PPTX files are large, I decided to use the following approach:

1. Read all PPTX files and generate a summary of each one.
2. Store each summary in a vector database along with the source document's metadata.
3. Query the vector database based on the user's query.
4. Pass the query and the returned documents to an LLM to get the final output.
5. Return the final output and the source documents to the user.
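
The last three steps above (retrieve, pass to the model, return the answer with its sources) can be sketched roughly as follows. This is only an illustration: `build_prompt`, `answer_query`, the prompt wording, and the shape of the `retrieved` records (dicts with `summary` and `source` keys) are all hypothetical, and `llm` stands for any callable that takes a prompt string and returns a string.

```python
def build_prompt(query, retrieved):
    """Combine the user query with the retrieved summaries into one prompt."""
    # Each retrieved item is assumed to look like {"summary": ..., "source": ...}.
    context = "\n\n".join(
        f"[{i}] ({doc['source']})\n{doc['summary']}"
        for i, doc in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def answer_query(query, retrieved, llm):
    """Call the model and return its output together with the source documents."""
    output = llm(build_prompt(query, retrieved))
    sources = [doc["source"] for doc in retrieved]
    return {"answer": output, "sources": sources}
```

Keeping the source path on every retrieved record is what makes it possible to hand the source documents back to the user alongside the answer.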

I am using UnstructuredPowerPointLoader to load the PPTX files and load_summarize_chain to create a summary of each file. The chain returns a string.

How can I store the output of load_summarize_chain in a vector database (Chroma)?

Also, please let me know whether this approach is sound. Any sample code would be really helpful.
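
For the storage question, one possible sketch uses the `chromadb` client directly: each summary string is added as a document, with the file path kept as metadata. The helper names `build_records` and `store_summaries`, the `chroma_db` directory, the `pptx_summaries` collection name, and the choice of the file path as the record ID are assumptions made for illustration, not part of the setup described above.

```python
def build_records(summaries):
    """Convert {pptx_path: summary_string} into parallel lists of ids, texts and metadata."""
    ids, texts, metadatas = [], [], []
    for path, summary in summaries.items():
        ids.append(str(path))                    # assumption: each file path is unique
        texts.append(summary.strip())            # the string returned by the summarize chain
        metadatas.append({"source": str(path)})  # kept so answers can cite the file
    return ids, texts, metadatas


def store_summaries(summaries, persist_dir="chroma_db", collection_name="pptx_summaries"):
    """Add the summaries to a persistent Chroma collection and return the collection."""
    import chromadb  # lazy import: build_records stays usable without chromadb installed

    client = chromadb.PersistentClient(path=persist_dir)
    collection = client.get_or_create_collection(name=collection_name)
    ids, texts, metadatas = build_records(summaries)
    # No embedding function is passed, so Chroma falls back to its default embedding model.
    collection.add(ids=ids, documents=texts, metadatas=metadatas)
    return collection
```

Querying would then be something like `collection.query(query_texts=["user question"], n_results=3)`, which returns the matching summaries together with their metadata. When staying inside LangChain instead, wrapping each summary in `Document(page_content=summary, metadata={"source": path})` and passing the list to `Chroma.from_documents` with an embedding model should achieve the same thing.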
