I want to improve the indexing of some documents I have. My theory is that one issue is trying to fit a whole document into a single vector, so I wanted to play with the granularity. So my question is:
Can I get the embedding on a partial text while retaining the context of the whole document?
Really, you can send any text you want to the embeddings endpoint and get some sort of semantically-based vector back.
There are different techniques, but some of them can bias one document over another, or bias one amount of text over another.
You can certainly consider other ways of “databasing” your document, and many do:
- make smaller chunks and obtain embeddings for each, and each refers to the whole document,
- make smaller chunks, and average their vectors to make a single reference for the document,
- add summary information about the whole document, then more from a section, and embed the two together,
- (insert where you have more imagination than me…)
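
To make the options above concrete, here is a rough sketch of how they might fit together. It is only an illustration under some assumptions: `toy_embed` is a hypothetical stand-in (a hashed bag of words) used so the example runs offline, and you would replace it with a call to whatever embeddings endpoint you use. The names `chunk_words`, `build_index`, and `mean_vector`, the chunk size, and the overlap are all made up for this example.

```python
import hashlib
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def toy_embed(text, dim=8):
    """Hypothetical stand-in for a real embeddings call: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    return normalize(vec)


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
        start += size - overlap
    return chunks


def build_index(doc_id, text, summary="", embed=toy_embed):
    """One record per chunk; each record points back to its parent document."""
    records = []
    for i, chunk in enumerate(chunk_words(text)):
        # Optionally prepend document-level context before embedding the chunk.
        to_embed = f"{summary}\n{chunk}" if summary else chunk
        records.append(
            {"doc_id": doc_id, "chunk": i, "text": chunk, "vector": embed(to_embed)}
        )
    return records


def mean_vector(vectors):
    """Average chunk vectors component-wise into one document-level vector."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    return normalize(mean)


document = " ".join(f"word{i}" for i in range(100))
records = build_index("doc-1", document, summary="A hypothetical summary of the document.")
doc_vector = mean_vector([r["vector"] for r in records])
```

Each record keeps a `doc_id`, so a hit on a small chunk can still be traced back to the whole document, while `doc_vector` gives the single averaged reference. Whether prepending a summary actually helps retrieval is something you would have to test on your own documents.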