Optimal way to chunk Word document for RAG (semantic chunking giving bad results)


Context Length Aware Ranked Elided Document Injection

The context-aware chunking scheme I propose for semantic database knowledge retrieval reassembles a document into a smaller version focused only on the relevant information.

  1. Splitting: The document is chunked by identified section, then sub-chunked by token count if a section needs further division to fit within the AI context length.

  2. Enrichment: A summary and section navigation metadata are included in each chunk's text. This biases similarity results toward chunks that share a documentation source, which helps when the knowledge base covers a wide variety of material. Additional index information for piecing chunks back together is stored as out-of-band metadata.

  3. Embedding for semantics: Embedding the enriched chunks already gives good quality, but the embedded text can be further infused with AI-generated example questions that target the content. Because language model generation is so much more expensive than embedding, this is optional. It is basically HyDE, but prepared ahead of time rather than delaying each on-demand search to improve the user's input.

  4. Top-k token threshold search: We run an exhaustive search over the enriched chunks, constrained by a token size target and a relevance cutoff. When nothing is relevant, nothing is returned; when much of the documentation is relevant, you get up to the maximum tokens you set.

  5. Document reconstruction: This is the key component. We rewrite the retrieved chunks back into a summarized, headlined document, so the AI sees what looks like an article with sections clearly elided, built from the retrieved chunks with their indexing included. If token budget remains, we can also boost the scores of the surrounding chunks to see whether they should be included.

  6. Injection: The AI gets its “Document snippets relevant to the most recent input”. All of this costs just one additional embedding call, on the user input and its lead-up.

  7. Read more? If the AI is on the right track and just needs to read the prior or following chunks itself, a function can let it place more into the assembled elided document. We don’t give a tool return; we give a better document.
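
To make steps 1 and 2 more concrete, here is a minimal sketch, not a definitive implementation. It assumes the Word document has already been converted to plain text with headings marked as `# Heading` lines, and it counts whitespace-separated words as a stand-in for a real tokenizer. The names `Chunk`, `split_sections`, `sub_chunk`, and `chunk_document` are hypothetical and only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str
    section: str   # heading this chunk belongs to
    part: int      # sub-chunk position within the section (0-based)
    index: int     # position in the whole document (out-of-band metadata)
    body: str

    def embed_text(self, summary: str) -> str:
        """Text sent to the embedding model: navigation, summary, then body."""
        return (f"Document: {self.doc_title}\nSummary: {summary}\n"
                f"Section: {self.section}\n\n{self.body}")


def count_tokens(text: str) -> int:
    # Assumption: whitespace words approximate tokens; swap in a real tokenizer.
    return len(text.split())


def split_sections(text: str) -> list:
    """Split on '# Heading' lines; returns (heading, body) pairs with non-empty bodies."""
    sections, heading, lines = [], "Preamble", []
    for line in text.splitlines():
        match = re.match(r"^#+\s+(.*)$", line)
        if match:
            if any(l.strip() for l in lines):
                sections.append((heading, "\n".join(lines).strip()))
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):
        sections.append((heading, "\n".join(lines).strip()))
    return sections


def sub_chunk(body: str, max_tokens: int) -> list:
    """Greedily pack paragraphs into pieces of at most max_tokens.
    A single paragraph larger than max_tokens is kept whole in this sketch."""
    parts, current, used = [], [], 0
    for para in re.split(r"\n\s*\n", body):
        n = count_tokens(para)
        if current and used + n > max_tokens:
            parts.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += n
    if current:
        parts.append("\n\n".join(current))
    return parts


def chunk_document(title: str, text: str, max_tokens: int = 300) -> list:
    """Section-first splitting, then token-based sub-chunking, with a running index."""
    chunks = []
    for heading, body in split_sections(text):
        for part, piece in enumerate(sub_chunk(body, max_tokens)):
            chunks.append(Chunk(title, heading, part, len(chunks), piece))
    return chunks
```

Calling `chunk_document("Manual", text)` gives a list of chunks; `chunk.embed_text(summary)` is what would be embedded, while `index` and `part` stay outside the embedded text for later reassembly.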

This structured RAG method meets the goal of retaining headings and context, providing high-quality documentation suitable for AI comprehension.
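
As a rough illustration of steps 4 and 5, the sketch below picks chunks by score under a relevance cutoff and a token budget, then reassembles them in document order with explicit elision markers. It assumes similarity scores and token counts were already computed elsewhere; `Hit`, `select_hits`, `rebuild_document`, and the marker format are hypothetical names chosen for this example, and the neighbor boost is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    index: int     # chunk position in the source document
    section: str   # heading the chunk belongs to
    tokens: int    # token count of the chunk body
    score: float   # similarity to the query, higher is better
    body: str


def select_hits(hits, max_tokens, min_score):
    """Rank by score, stop at the relevance cutoff, fill up to the token budget."""
    chosen, used = [], 0
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.score < min_score:
            break        # everything after this is even less relevant
        if used + hit.tokens > max_tokens:
            continue     # too large; a smaller, lower-ranked chunk may still fit
        chosen.append(hit)
        used += hit.tokens
    return chosen


def rebuild_document(title, summary, chosen, total_chunks):
    """Reassemble the chosen chunks in document order, marking every gap."""
    if not chosen:
        return ""        # nothing relevant, so nothing is injected
    lines = [f"# {title}", f"Summary: {summary}", ""]
    previous_index, previous_section = -1, None
    for hit in sorted(chosen, key=lambda h: h.index):
        skipped = hit.index - previous_index - 1
        if skipped > 0:
            lines.append(f"[... {skipped} chunk(s) elided ...]")
        if hit.section != previous_section:
            lines.append(f"## {hit.section}")
        lines.append(f"[chunk {hit.index}] {hit.body}")
        previous_index, previous_section = hit.index, hit.section
    trailing = total_chunks - previous_index - 1
    if trailing > 0:
        lines.append(f"[... {trailing} chunk(s) elided ...]")
    return "\n".join(lines)
```

The chunk numbers in the output are what a "read more" function could accept: the AI asks for a neighboring index, and the document is rebuilt with that chunk added to the chosen set.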