Hi,
I found a post here about comparing cosine similarities over time.
I am doing something similar but a bit different. I have built an information retrieval engine over document embeddings in several languages. It reuses the same set of categories for every language to index those embeddings in a projected feature space of cosine similarities.
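Here is a minimal sketch of that projection step. It assumes each category is represented by an anchor vector embedded with the same model as the documents. `project_to_anchor_space` is a hypothetical helper name, not part of any library, and NumPy is the only dependency.

```python
import numpy as np

def project_to_anchor_space(doc_embeddings: np.ndarray,
                            anchor_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map each document to its cosine similarity with each anchor.

    doc_embeddings:    (n_docs, dim) embeddings from one model/language.
    anchor_embeddings: (n_anchors, dim), one vector per category, same model (assumed).
    Returns an (n_docs, n_anchors) matrix; column j is the similarity to category j.
    """
    # L2-normalise rows so that a plain dot product equals cosine similarity
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    anchors = anchor_embeddings / np.linalg.norm(anchor_embeddings, axis=1, keepdims=True)
    return docs @ anchors.T

# Toy example: two categories aligned with the axes of a 2-D space
docs = np.array([[1.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
anchors = np.array([[2.0, 0.0], [0.0, 1.0]])
print(np.round(project_to_anchor_space(docs, anchors), 3))
# rows: [1, 0], [0, 1], [0.707, 0.707]
```

Each column corresponds to one category. As long as every language uses anchors for the same categories in the same order, rows from different languages share the same axes, even if the underlying embedding spaces differ.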
There is also a faceted browser that lets users combine features in a context-sensitive way and browse the pooled cosine similarities.
My question is: does the community think this approach could be useful for aligning document embeddings across languages?
That is, suppose I reused the same features, indexed a large number of vector databases with anchor vectors representing those features, and pooled the results continuously until I had one very large cosine-similarity feature space. Would that space be useful for combining documents, or would it just contain meaningless noise and strange artefacts? Could I use it to align different embedding spaces over time? Would it be useful for training a model, or for RAG?
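The pooling idea can be sketched in the same way. The code below assumes each vector database comes with its own anchor vectors for the same categories, listed in the same order. All function names are hypothetical. As an idealized test, the second space is an exact rotation of the first. Real embedding models for different languages will not be related this cleanly.

```python
import numpy as np

def anchor_profiles(embeddings: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    # Cosine similarity of every embedding to every anchor: shape (n, n_anchors)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return e @ a.T

def pool_spaces(spaces):
    """Hypothetical: spaces is a list of (doc_embeddings, anchor_embeddings) pairs,
    one per vector database. Anchor rows must name the same categories, same order."""
    return np.vstack([anchor_profiles(docs, anchors) for docs, anchors in spaces])

def cross_space_neighbors(profiles_a: np.ndarray, profiles_b: np.ndarray) -> np.ndarray:
    # Compare the anchor profiles themselves with cosine similarity and return,
    # for each row of A, the index of the most similar row of B
    pa = profiles_a / np.linalg.norm(profiles_a, axis=1, keepdims=True)
    pb = profiles_b / np.linalg.norm(profiles_b, axis=1, keepdims=True)
    return np.argmax(pa @ pb.T, axis=1)

# Toy demo: space B is an exact rotation of space A (an idealized assumption)
rng = np.random.default_rng(0)
docs_a = rng.normal(size=(5, 8))
anchors_a = rng.normal(size=(4, 8))
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
docs_b, anchors_b = docs_a @ q, anchors_a @ q  # rotation preserves cosine similarity

pooled = pool_spaces([(docs_a, anchors_a), (docs_b, anchors_b)])
print(pooled.shape)  # (10, 4)
print(cross_space_neighbors(pooled[:5], pooled[5:]))
```

In this rotated toy case the anchor profiles match exactly, so each document's nearest neighbour in the other space is its own counterpart. With independently trained models the profiles would agree only approximately, and how much noise the pooled space accumulates would have to be measured on real data.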
Or is this approach just extremely naive?