Is it possible to get "context aware" embeddings?

I have been able to solve the issue of “context aware” embeddings this way: Using gpt-4 API to Semantically Chunk Documents - #172 by SomebodySysop

  1. Each embedding chunk has a metadata property that uniquely identifies it and its position in the chunked document.
  2. When a chunk is retrieved by cosine similarity search, I programmatically use that identifier to locate the adjacent chunks (sketched below).
  3. I send the original key chunk plus its adjacent chunks to the model, along with the question and chat history, to render a response.
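To make that concrete, here is a minimal, library-agnostic sketch of the idea (not the exact code from the linked threads). It assumes each chunk was stored with a `doc_id` and a sequential `chunk_index` in its metadata; the in-memory `CHUNKS` dict and the `top_hit` result stand in for whatever vector store and similarity search you actually use:

```python
# Stand-in for the vector store's object storage:
# (doc_id, chunk_index) -> chunk text
CHUNKS = {
    ("doc-1", 0): "Sentence one of the document.",
    ("doc-1", 1): "Sentence two, the one the search actually matched.",
    ("doc-1", 2): "Sentence three with the follow-up detail.",
}


def expand_with_neighbors(doc_id: str, chunk_index: int, radius: int = 1) -> str:
    """Return the key chunk plus up to `radius` chunks on each side, in document order."""
    window = []
    for i in range(chunk_index - radius, chunk_index + radius + 1):
        text = CHUNKS.get((doc_id, i))  # neighbors past the start/end of the doc are skipped
        if text is not None:
            window.append(text)
    return "\n".join(window)


# Pretend the cosine similarity search returned chunk 1 of doc-1 as the top hit.
top_hit = {"doc_id": "doc-1", "chunk_index": 1}
context = expand_with_neighbors(top_hit["doc_id"], top_hit["chunk_index"], radius=1)
print(context)  # key chunk with one neighbor on each side, ready to send to the model
```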

So far, this is working very well. My chunks can be as small as one sentence or as large as multiple paragraphs, and I can adjust how many adjacent chunks are returned depending on the type of document being processed. This is the adjacent chunk “radius” as defined by @curt.kennedy
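In the sketch above, adjusting that radius is just a parameter change (again, purely illustrative):

```python
# Tighter window for short FAQ-style documents, wider for long narrative ones.
narrow = expand_with_neighbors("doc-1", 1, radius=0)  # key chunk only
wide = expand_with_neighbors("doc-1", 1, radius=2)    # key chunk plus two chunks on each side
```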

I actually shared the code I used to do this here: Retrieving “Adjacent” Chunks for Better Context - #10 by SomebodySysop - Support - Weaviate Community Forum
