I’m working on a chunking pipeline that splits a large dataset into smaller chunks for retrieval. To avoid losing context at the boundaries, I add some overlap between consecutive chunks. However, despite the overlap, retrieval sometimes misses important pieces of information, and accuracy is inconsistent from query to query.
It seems the overlap doesn’t always preserve enough context, or there may be something else I’m missing in my chunking or retrieval approach.
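To make the suspected failure concrete, here is a minimal sketch of fixed-size sliding-window chunking with overlap. It assumes whitespace-style tokens and a window measured in tokens. `chunk_with_overlap` and `span_in_some_chunk` are hypothetical helpers written for illustration, not my exact code and not any library’s API.

```python
from typing import List


def chunk_with_overlap(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size; consecutive windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        # overlap >= chunk_size would give a step <= 0, so the window would never advance
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once this window reaches the end, so no tail chunk is a subset of the previous one
        if start + chunk_size >= len(tokens):
            break
    return chunks


def span_in_some_chunk(chunks: List[List[str]], span: List[str]) -> bool:
    """Return True if the contiguous span appears whole inside at least one chunk."""
    n = len(span)
    return any(
        chunk[i:i + n] == span
        for chunk in chunks
        for i in range(len(chunk) - n + 1)
    )


if __name__ == "__main__":
    tokens = [f"w{i}" for i in range(20)]
    chunks = chunk_with_overlap(tokens, chunk_size=8, overlap=2)
    for c in chunks:
        print(c[0], "...", c[-1])
    # A 4-token "fact" (w5..w8) crosses the first boundary and is longer than the overlap
    fact = tokens[5:9]
    print(span_in_some_chunk(chunks, fact))  # prints False
```

With `chunk_size=8` and `overlap=2`, the windows are `w0..w7`, `w6..w13` and `w12..w19`, so the span `w5..w8` is not whole in any of them, and a retriever that scores chunks independently could miss it. In this sketch, only contiguous spans of at most `overlap + 1` tokens are guaranteed to land whole in some window; longer spans may be split.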
Has anyone else run into similar problems with overlapping chunks? What strategies can improve retrieval accuracy in this setup? Any advice on choosing the overlap size, or pitfalls I should be aware of?