Combining Different Latent Spaces?

jack.kausch · September 12, 2025, 5:15pm

Hi,

I found a post here about comparing cosine similarities over time.

I am doing something similar but a bit different. I have created an information retrieval engine that operates over document embeddings in different languages, reusing the same categories to index those embeddings in a projected feature space of cosine similarities.

There’s a faceted browser that lets users combine features in a context-sensitive way and browse the pooled cosine similarities.

My question is: does the community think this could have any use for aligning document embeddings in different languages?

That is, if I reused the same features, indexed a large number of vector databases with anchor vectors representing those features, and kept pooling them until I had one very large cosine-similarity feature space, would it be useful for combining documents, or would it just contain meaningless noise and strange artefacts? Could I use it to align different embedding spaces over time? Would it be useful for training a model or for RAG?
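To make the idea concrete, here is a minimal sketch of the anchor-vector projection, not the actual engine. It assumes each language has its own embedding space, that the same features are represented by one anchor vector per space, and (a strong, idealised assumption) that the second space is just an orthogonal transform of the first. The names `anchor_features`, `docs_a`, `anchors_a`, and so on are hypothetical, and the data is random toy data.

```python
import numpy as np

def unit_rows(x):
    """Scale every row to unit length so dot products become cosines."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def anchor_features(docs, anchors):
    """Cosine similarity of each document to each anchor: (n_docs, n_anchors)."""
    return unit_rows(docs) @ unit_rows(anchors).T

rng = np.random.default_rng(0)
dim, n_docs, n_anchors = 16, 5, 8

# Space A: toy embeddings standing in for one language's model (assumption).
docs_a = rng.normal(size=(n_docs, dim))
anchors_a = rng.normal(size=(n_anchors, dim))

# Space B: the same content seen through a random orthogonal transform.
# Real embedding spaces will only approximately behave like this, if at all.
orthogonal, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
docs_b = docs_a @ orthogonal
anchors_b = anchors_a @ orthogonal

feats_a = anchor_features(docs_a, anchors_a)
feats_b = anchor_features(docs_b, anchors_b)

# Raw coordinates disagree, but the anchor-relative features line up.
raw_gap = np.abs(docs_a - docs_b).max()
feature_gap = np.abs(feats_a - feats_b).max()
print(f"raw gap: {raw_gap:.3f}, feature gap: {feature_gap:.2e}")
```

In this idealised case the cosine-similarity features are identical across the two spaces, because orthogonal transforms preserve angles. Whether real multilingual spaces are close enough to that for the pooled features to be signal rather than noise is exactly what I am unsure about.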

Or is this approach just extremely naive?

Macha · September 20, 2025, 10:49pm

Hey there and welcome!

Sounds like a neat project.

Just to be clear, do you mean different natural languages for the contents of the docs, or different programming languages?

Perhaps, but this might also be over-complicating something that could be solved with a simple graph database. You can essentially make your own pool of vectorized data with whatever arbitrary connections, clusters, or links between nodes you’d like. Others here and I really like Neo4j because of the cool ways in which you can cluster and vectorize things for easy RAG with an LLM.
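As a rough illustration of the graph idea, here is a plain-Python stand-in, not Neo4j code and not its API. It assumes a tiny hand-made set of nodes, each with a toy embedding, plus explicit links between them; the node names, the `links` table, and the `retrieve` helper are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: node id -> embedding.
embeddings = {
    "doc_en": np.array([1.0, 0.0, 0.0]),
    "doc_fr": np.array([0.0, 1.0, 0.0]),
    "doc_de": np.array([0.0, 0.0, 1.0]),
}

# Hand-made links, e.g. "is a translation of". Any relation you like works.
links = {
    "doc_en": {"doc_fr"},
    "doc_fr": {"doc_en"},
    "doc_de": set(),
}

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query, top_k=1):
    """Rank nodes by cosine similarity, then pull in their linked neighbours."""
    ranked = sorted(
        embeddings,
        key=lambda node: cosine(query, embeddings[node]),
        reverse=True,
    )
    hits = ranked[:top_k]
    context = set(hits)
    for node in hits:
        context |= links[node]
    return hits, context

hits, context = retrieve(np.array([0.9, 0.1, 0.0]))
print(hits, sorted(context))
```

The vector search finds the closest node, and the explicit links bring in related documents that the embeddings alone would not rank highly, which is the part a graph database handles for you.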
