How to load all types of documents (.pdf, .txt, .docx, .csv, Excel) through a document loader, using Pinecone through the LangChain wrapper?

I am building an interactive chatbot that can take input from multiple data sources such as PDF, Word, text, and Excel files. I am using the Pinecone retriever with the LangChain wrapper on top of it. When I use DirectoryLoader with a glob pattern, I can only load PDF files and convert them to vector embeddings; other file types fail to load. I need a way to load the rest of the documents and process them further for embeddings.

Hi, have you received any answer, or have you found a solution?

Hi, welcome to the community. I think you will have a better chance of getting a response in the LangChain community. You can also ask the Mendable bot on their site; just press Ctrl + K to use it.

You mean PineconeHybridSearchRetriever?

Hi, we actually need to handle each file type differently, both while loading and while processing. I was stuck on this too, but then I found this solution and it worked.

@kpathak1 Can you please specify which solution worked for you?

You need to handle every document type in a different way in Python and then do the processing on it. Ready-made code for handling each document type in Python can be found on Google, but in my case I created one program for the .pdf extension and different programs for the .csv, .docx, and .txt extensions.
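
For anyone landing here later, the "one handler per extension" idea described above could look roughly like the sketch below. This is not the poster's actual code, just a minimal illustration. The function names (load_txt, load_csv, load_pdf, load_docx, load_directory) and the LOADERS table are made up for the example. The PDF and Word handlers assume the third-party pypdf and python-docx packages are installed; they are imported lazily, so the .txt and .csv paths work without them. Excel is not covered and would need its own handler.

```python
import csv
from pathlib import Path


def load_txt(path):
    # Plain text: read the whole file as one string.
    return Path(path).read_text(encoding="utf-8")


def load_csv(path):
    # CSV: join the cells of each row so every row becomes one line of text.
    with open(path, newline="", encoding="utf-8") as f:
        return "\n".join(", ".join(row) for row in csv.reader(f))


def load_pdf(path):
    # Assumption: the third-party pypdf package is installed.
    from pypdf import PdfReader
    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def load_docx(path):
    # Assumption: the third-party python-docx package is installed.
    from docx import Document
    return "\n".join(p.text for p in Document(str(path)).paragraphs)


# Hypothetical dispatch table: file extension -> handler function.
LOADERS = {
    ".txt": load_txt,
    ".csv": load_csv,
    ".pdf": load_pdf,
    ".docx": load_docx,
}


def load_directory(folder):
    """Return (path, text) pairs for every supported file under folder."""
    docs = []
    for path in sorted(Path(folder).rglob("*")):
        loader = LOADERS.get(path.suffix.lower())
        if path.is_file() and loader is not None:
            docs.append((str(path), loader(path)))
    return docs
```

Calling load_directory("my_docs") would give a list of (path, text) pairs; files with an unsupported extension are skipped silently. The extracted text can then be split into chunks and embedded before it is sent to Pinecone.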