Top chunks for a large context

I have a chunk size of 1000, and the context is large. In my use case, with cosine similarity, the answer relevant to a question may end up in, say, the 15th-ranked chunk.

  1. How do I make sure that 15th chunk is placed in the top 5?
  2. How do I ensure only relevant chunks are placed at the top?

As a general rule, you should make chunks as small as possible so that more of the relevant information fits in the top-K results. Sentence-level chunking is often useful, combined with a higher number of top results to ensure full coverage.
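To make that concrete, here is a minimal sketch of sentence-level chunking with a larger top-K, assuming the OpenAI Python SDK and NumPy; the model name, the naive splitter, and the K value are all illustrative, not recommendations:

```python
import re

import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts; the model name is just one current option."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


def sentence_chunks(document: str) -> list[str]:
    # Naive sentence splitter; a proper sentence tokenizer is better in practice.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def top_k(query: str, chunks: list[str], k: int = 15) -> list[tuple[float, str]]:
    vectors = embed(chunks)
    q = embed([query])[0]
    # Cosine similarity = dot product of L2-normalised vectors.
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    q /= np.linalg.norm(q)
    scores = vectors @ q
    order = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), chunks[i]) for i in order]
```

Because sentence-level chunks are much smaller, a K of 15 or 25 can still fit comfortably in the prompt where five 1000-token chunks would not.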

It is always a balancing act between speed and completeness of results with surrounding context. As for your specific query, you may wish to look into “re-ranking”.

Thanks @Foxalabs,

But the examples I have found use Cohere's re-rank. What are the best ways to rank? I used keywords, but when a question is only semantically similar rather than a keyword match, I still want to answer it correctly.
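Cohere's re-rank is just one option. A common alternative is a local cross-encoder, which scores each (query, chunk) pair jointly instead of comparing independently computed embeddings, so semantically equivalent phrasings tend to score well even without keyword overlap. A minimal sketch, assuming the sentence-transformers library; the model name is just one popular choice:

```python
from sentence_transformers import CrossEncoder

# Any MS MARCO-trained cross-encoder works here; this one is small and fast.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def rerank(query: str, candidates: list[str], top_n: int = 5) -> list[str]:
    # Score every (query, chunk) pair jointly, then keep the best top_n.
    scores = reranker.predict([(query, chunk) for chunk in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_n]]


# Typical pattern: over-retrieve by cosine first (e.g. the top 25), then
# re-rank, so a chunk that started at position 15 can still surface in the top 5.
```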

Again, I would experiment with chunk sizes. It might also be worth trying other embedding models, such as the new models from OpenAI or one of the open-source options. You could also add metadata in the form of additional keywords to each chunk to try to improve relevancy.
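One hedged sketch of the keyword-metadata idea: store a keyword set alongside each chunk and blend a simple overlap score with the cosine score. The weighting, the keyword set, and the placeholder cosine value below are made up for illustration:

```python
def hybrid_score(cosine: float, query: str, keywords: set[str],
                 keyword_weight: float = 0.2) -> float:
    # Fraction of the chunk's keywords that also appear in the query.
    terms = set(query.lower().split())
    overlap = len(terms & keywords) / max(len(keywords), 1)
    return (1 - keyword_weight) * cosine + keyword_weight * overlap


chunk_meta = {
    "text": "0-20% solution is suitable for light/general use",
    # Hand-picked (or LLM-generated) keywords attached as metadata.
    "keywords": {"solution", "concentration", "mix", "dilution", "percentage"},
}

# 0.41 is a placeholder cosine score, not a measured value.
score = hybrid_score(0.41, "What are the uses of 10% mix?", chunk_meta["keywords"])
```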

Ultimately, if you find that you are regularly having your most relevant chunks fall too far down the K ranking, then you may have a fundamental issue with the use case itself, e.g. numeric data or the formatting of the text.

Example of a numeric retrieval issue:

The chunks contain:

“Today I ate 10 apples”
“0-20% solution is suitable for light/general use”

You search for “What are the uses of 10% mix?”

Semantically, “10%” has very little to do with “0-20%”, even though as humans we understand it falls within that range, and it has a lot to do with the 10 in “10 apples”, so it is likely that the “10 apples” chunk will outrank the “0-20%” chunk.
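A quick way to check this on your own stack is to embed both chunks and the query and print the cosine scores directly, reusing the embed() helper sketched above; the actual numbers will vary by model, so treat this as a diagnostic, not a guaranteed outcome:

```python
chunks = [
    "Today I ate 10 apples",
    "0-20% solution is suitable for light/general use",
]
query = "What are the uses of 10% mix?"

vecs = embed(chunks + [query])
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
# Cosine score of each chunk against the query vector.
for chunk, score in zip(chunks, vecs[:2] @ vecs[2]):
    print(f"{score:.3f}  {chunk}")
```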

These kinds of issues can be complex and may require a significant investment in R&D to solve.