Yes. In natural languages, it’s also how multilingual speakers think when they are asked what one word means in another language. They basically subdivide both languages into Voronoi tessellations, and then determine where there is a correspondence between common regions. What I am interested in is much more fundamental, and I am not sure it would be of interest since it is so abstract. I believe that “Language” is a priori, like the laws of physics (and part of the laws of physics). I am interested in the nature of the objects that Shannon et al. error-correct on or for. NL is an instantiation of this. If I am being too crazy, I fully understand backing out of this discussion. Otherwise, it’s enormous fun.
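To make the Voronoi-matching picture a bit more concrete, here’s a rough toy sketch (random vectors standing in for real multilingual embeddings, and a nearest-centroid partition, which is just one way to induce the Voronoi cells; the region count is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(x):
    # project rows onto the unit sphere
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy stand-ins for normalized word embeddings in two languages
# (in practice these would come from a multilingual embedding model).
lang_a = unit(rng.normal(size=(500, 64)))
lang_b = unit(rng.normal(size=(500, 64)))

def voronoi_regions(points, k, iters=20):
    # Spherical k-means: each centroid induces a Voronoi cell
    # (the set of points for which it is the nearest centroid).
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        cells = np.argmax(points @ centroids.T, axis=1)  # cell membership
        centroids = unit(np.stack([
            points[cells == i].mean(axis=0) if np.any(cells == i) else centroids[i]
            for i in range(k)
        ]))
    return centroids

cent_a = voronoi_regions(lang_a, k=10)
cent_b = voronoi_regions(lang_b, k=10)

# Correspondence between regions: for each cell in language A,
# the most similar cell in language B by centroid cosine similarity.
print(np.argmax(cent_a @ cent_b.T, axis=1))
```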
As a technical note, since the embeddings are all normalized, wouldn’t convexity of the pseudo-space they inhabit be implicit?
Anyway, here are some resources you might appreciate that are related to this discussion. You’ve probably already read some of them, but there might be a few new ones in this list.
- [1810.04882] Towards Understanding Linear Word Analogies
- [1901.09813] Analogies Explained: Towards Understanding Word Embeddings
- [2011.05864] On the Sentence Embeddings from Pre-trained Language Models
- [2306.08221] Contrastive Loss is All You Need to Recover Analogies as Parallel Lines
- [2310.17611] Uncovering Meanings of Embeddings via Partial Orthogonality
- [2403.03867] On the Origins of Linear Representations in Large Language Models
- [2406.01506] The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Depends on how you define language, I guess. I can definitely see that some sort of universality might be at play here, and I’d definitely be curious to hear what you find.
well, no. I don’t think so.
I’ll define convexity as requiring continuity. Assuming a “true”/“natural” cosine similarity exists, I’ll assume it would have some sort of practically infinite precision. As such, to construct a polytope of N items (your “embedding reference map”), you’d need an (N-1)-dimensional space to fit it into. (Yes, there’s a limit to the domain, but I think it’s in the 10e6+ dims.)
Now, some of this precision is noise. So you can reduce the dimensionality of your representation space until you hit your noise floor. Up to that point, the space should be universally convex for the domain. But if you go further than that, you start clipping and introducing discontinuities.
My guess is that somewhere at the edge, near underrepresented concepts, the arithmetic might start breaking down due to this clipping, indicating that you’re no longer in a convex space. You’ll see this with these Matryoshka embeddings if you go way too low. I could of course be wrong, but I suspect that we’re not out of the woods with these 10e3/10e4 dims.
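As a rough illustration of the clipping intuition (random unit vectors as stand-ins for real embeddings, and plain prefix truncation as a crude stand-in for Matryoshka-style reduction), the pairwise cosines drift further and further from the full-dimensional ones as you drop dims:

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy "full precision" embeddings: N items in d dimensions.
N, d = 200, 1024
emb = unit(rng.normal(size=(N, d)))
full_sims = emb @ emb.T  # reference pairwise cosine similarities

# Keep only the first k dimensions, re-normalize, and measure how far
# the pairwise cosines drift from the full-dimensional reference.
for k in (1024, 256, 64, 16, 4):
    trunc = unit(emb[:, :k])
    drift = np.abs(trunc @ trunc.T - full_sims).max()
    print(f"dims={k:4d}  max cosine drift={drift:.3f}")
```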
Thanks for the papers; I haven’t seen the last one yet.
Thank you, Diet, elmstedt. Just as a side note, regarding tools: could we put together a group and build tools so that this is not just a bulletin board, but actually uses AI in the process?
For example, the suggestion by elmstedt regarding papers, especially the last one, is very interesting. Maybe such a tool already exists, but if it doesn’t, we could build it and make it more powerful by integrating LLMs, for example, to make what we are all discussing in these threads more productive.