- I used LlamaIndex for my RAG task and found that I can chunk my text by sentences, paragraphs, or nodes. However, I noticed that sentence-level chunks don't preserve enough meaning for retrieval, while paragraph-level chunks can end up very large. I am planning to try sentence chunks with overlap, but I am not sure this is the best approach. Is there a smarter way to chunk my PDF based on the meaning of the text?
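For anyone following along, here is roughly what "sentence chunks with overlap" means, as a minimal sketch in plain Python rather than LlamaIndex's own splitter. The function names, the regex-based sentence splitter, and the `size`/`overlap` parameters are all assumptions made for illustration; a real PDF would need a better sentence tokenizer.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(text, size=3, overlap=1):
    """Group sentences into windows of `size`, sharing `overlap` sentences
    between consecutive windows."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    sentences = split_sentences(text)
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + size]))
        # Stop once the current window already reaches the last sentence.
        if start + size >= len(sentences):
            break
    return chunks
```

With `size=3` and `overlap=1`, each chunk repeats the last sentence of the previous one, so a thought that spans a chunk boundary still appears whole in at least one chunk. The trade-off is more chunks to embed and some duplicated text in the results.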
This is the methodology I’ve used with some success: https://youtu.be/w_veb816Asg?si=bVUs297eLSkNXY6X
Hey, can you provide a link to the code so I can refer to the method? Thank you!
We had this very same conversation and came up with a more efficient and effective solution here: Using gpt-4 API to Semantically Chunk Documents - #95 by SomebodySysop
You could also use contextual retrieval, as Anthropic proposes here:
https://www.anthropic.com/news/contextual-retrieval
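As I understand the idea in that post, each chunk gets a short piece of context that situates it within the whole document, and that context is prepended to the chunk before it is embedded or indexed. Here is a minimal sketch of just the prepending step. The names `contextualize_chunks`, `describe`, and `placeholder_describe` are hypothetical, and the placeholder simply reuses the document's first line where a real setup would call an LLM.

```python
from typing import Callable, List


def contextualize_chunks(
    document: str,
    chunks: List[str],
    describe: Callable[[str, str], str],
) -> List[str]:
    """Prepend a short, chunk-specific context to every chunk.

    `describe(document, chunk)` is a hypothetical callable; in practice it
    would ask an LLM to situate the chunk within the full document.
    """
    contextualized = []
    for chunk in chunks:
        context = describe(document, chunk).strip()
        # Leave the chunk unchanged if no context could be produced.
        contextualized.append(f"{context}\n\n{chunk}" if context else chunk)
    return contextualized


def placeholder_describe(document: str, chunk: str) -> str:
    # Stand-in for an LLM call (an assumption): use the document's first line.
    stripped = document.strip()
    title = stripped.splitlines()[0] if stripped else ""
    return f"From: {title}" if title else ""
```

You would then embed the returned strings instead of the raw chunks, so a chunk like "Revenue grew 3%." carries some hint of which document and section it came from.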
Have you tried a vision model like ColPali for retrieval?