When I use RAG, I want everything to be connected. Should I use LlamaIndex or LangChain?
I also ran into a problem: my data is not all in the same format, and I want to automatically chunk text from PDF, DOCX, and TXT files and store the chunks. What is the best way to handle this case?
Also, if anyone knows any keywords that would help me with RAG, please let me know.
Use SimpleDirectoryReader from LlamaIndex: keep all your files in one folder, load the documents by passing the folder path, and build your index over them.
Certain file types may require additional dependencies such as pypdf, docx2txt, or python-pptx, depending on which kinds of files are stored in the folder.
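A minimal sketch of that folder-based flow, assuming a recent LlamaIndex release where the imports live under llama_index.core. The function name build_index and the SUPPORTED_EXTS list are made up for illustration, and building a VectorStoreIndex needs an embedding model to be configured (by default this typically means an OpenAI API key).

```python
# Extensions from the question; adjust to whatever is in your folder.
SUPPORTED_EXTS = [".pdf", ".docx", ".txt"]


def build_index(folder_path):
    """Load every supported file under folder_path and build a vector index."""
    # Imported lazily so this module can be imported without llama-index installed.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # The reader picks a parser per file extension (PDF and DOCX parsing
    # may need extra packages such as pypdf and docx2txt).
    documents = SimpleDirectoryReader(
        input_dir=folder_path,
        required_exts=SUPPORTED_EXTS,
        recursive=True,
    ).load_data()

    # Chunks and embeds the documents using the configured defaults.
    return VectorStoreIndex.from_documents(documents)
```

Calling build_index("./data") with a hypothetical ./data folder would return an index that you can query through index.as_query_engine().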
I know about it; however, there seems to be no efficient way to split a PDF smartly with LlamaIndex without overlapping chunks.
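For what it's worth, LlamaIndex's SentenceSplitter accepts a chunk_overlap argument, so setting it to 0 may already be enough here. The underlying idea, packing whole sentences into chunks that never share text, can also be sketched in plain Python. The helper split_without_overlap and its max_chars parameter are hypothetical, not part of LlamaIndex, and the regex-based sentence detection is deliberately naive.

```python
import re


def split_without_overlap(text, max_chars=1000):
    """Greedily pack whole sentences into chunks of at most max_chars, with no overlap."""
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A sentence longer than max_chars becomes its own oversized chunk.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each sentence ends up in exactly one chunk, so nothing is duplicated across chunks. The trade-off is that context at chunk boundaries is lost, which is the reason overlap is usually on by default.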