I use Weaviate also, didn’t know about the clustering option – will need to look into that.
I've been doing two things:
- Small to Big Retrieval, where I programmatically retrieve x chunks before and after each chunk that is returned in the cosine similarity search: Advanced RAG 01: Small-to-Big Retrieval | by Sophia Yang, Ph.D. | Towards Data Science
- Chunk Retrieval Rating: I rate (0 - 10) each retrieved chunk on its relevance to the submitted query. I remove the chunks with low ratings and return to the model only those most likely to answer the query. This process is neither as time-consuming nor as expensive as I originally thought it would be.
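
For anyone who wants to try the two steps above, here is a rough Python sketch of how they could fit together. It is not the exact code from my setup: the `Chunk` class, the function names, the `window` size, and the cutoff of 6 are all assumptions made for illustration. The post does not say how the 0 - 10 rating is produced, so `rate` is a hypothetical callable you would supply yourself (for example, a wrapper around a model call), and `keyword_rate` is only a toy stand-in so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Chunk:
    doc_id: str  # which document the chunk came from
    index: int   # position of the chunk within that document
    text: str


def expand_hits(hits: Sequence[Chunk], all_chunks: Sequence[Chunk],
                window: int = 1) -> List[Chunk]:
    """Small-to-big: add up to `window` neighbours before and after each hit."""
    by_pos = {(c.doc_id, c.index): c for c in all_chunks}
    wanted = set()
    for hit in hits:
        for i in range(hit.index - window, hit.index + window + 1):
            if (hit.doc_id, i) in by_pos:  # skip positions past either end
                wanted.add((hit.doc_id, i))
    # Return de-duplicated chunks in document order.
    return [by_pos[key] for key in sorted(wanted)]


def filter_by_rating(query: str, chunks: Sequence[Chunk],
                     rate: Callable[[str, str], int],
                     min_score: int = 6) -> List[Chunk]:
    """Keep chunks rated at least `min_score` (0-10 scale), highest first."""
    scored = [(rate(query, c.text), c) for c in chunks]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept]


def keyword_rate(query: str, text: str) -> int:
    """Toy rater (assumption): share of query words found in text, scaled to 0-10."""
    words = set(query.lower().split())
    if not words:
        return 0
    lowered = text.lower()
    found = sum(1 for w in words if w in lowered)
    return round(10 * found / len(words))
```

In use, `hits` would be whatever the similarity search returns, `expand_hits(hits, all_chunks, window=2)` would pull in the neighbours, and `filter_by_rating(query, expanded, rate)` would drop the low-rated chunks before anything is sent to the model. Whether to rate before or after expanding is a design choice; this sketch rates after, so the added neighbours are also checked.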
These two methodologies, along with the Hierarchical/Semantic chunking process discussed here, and Weaviate using the OpenAI text-embedding-3-large embedding model, are giving me the best responses I've ever received.