Has anyone successfully managed to implement some sort of continuous temporal or linear coding with embeddings?
What am I talking about?
Say you want to encode the relative position of something in a book, or in time.
More specifically, for instance, if you had a number line
[..., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,...]
and you did a query on it, you’d want “11.45”, for example, to be closest to 11 and 12, then a little further out 10 and 13, further yet 9 and 14, and so on.
However, if you’ve worked with embeddings before, you’ll know that 11.45 is probably going to be closest to 11, 45, 4, 5, and 1, with the remaining numbers scattered all over the place. With te3l (OpenAI’s text-embedding-3-large), for example, 11 is closer to 111 than it is to 12 or 10.
So naive approaches are probably all out.
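For concreteness, here’s the behavior we’d want, sketched with a made-up distance kernel (just an illustration of the target ordering, not any particular embedding model):

```python
import numpy as np

# A toy "ideal" positional similarity: purely a function of distance
# along the number line (hypothetical kernel, chosen only to illustrate
# the desired neighbor ordering).
def positional_sim(query, keys, scale=1.0):
    return np.exp(-np.abs(query - keys) / scale)

keys = np.arange(1, 15)
sims = positional_sim(11.45, keys)
ranked = keys[np.argsort(-sims)]
print(ranked[:4])  # -> [11 12 10 13]
```

Off-the-shelf text embeddings give you nothing like this ranking, which is the whole problem.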
One glaring issue is that this sort of positional encoding seems to require a space with negative curvature, but we’re working on the surface of hyperspheres, which are positively curved.
One approach that comes to mind could be to select some non-geodesic small circle (r<1) on our hypersphere and rotate any given queryable vector along that circle with a theta specified by our linear value.
*[small circle picture]*
The query, however, wouldn’t be transformed the same way - we’d either use the parallel great circle (r=1) or some other small circle that is larger than our queryable circle. If we did this, we’d approximate a locally hyperbolic space.
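A 3-D toy of the rotation step might look like the following (in real embedding dimensions you’d pick a rotation plane rather than an axis, but the geometry is the same; the axis, angle, and seed here are all arbitrary choices for illustration):

```python
import numpy as np

def rotate_about_axis(v, axis, theta):
    """Rodrigues' rotation: rotate v by angle theta about unit vector `axis`.
    On the unit sphere this sweeps v along a small circle around `axis`,
    whose radius is sin(angle between v and axis)."""
    a = axis / np.linalg.norm(axis)
    return (v * np.cos(theta)
            + np.cross(a, v) * np.sin(theta)
            + a * np.dot(a, v) * (1.0 - np.cos(theta)))

rng = np.random.default_rng(0)
v = rng.normal(size=3)
v /= np.linalg.norm(v)          # a queryable vector on the unit sphere
axis = np.array([0.0, 0.0, 1.0])

# theta would be derived from the linear value (page number, timestamp, ...)
moved = rotate_about_axis(v, axis, theta=0.3)
print(np.linalg.norm(moved))                 # still on the unit sphere
print(np.dot(moved, axis) - np.dot(v, axis)) # component along axis unchanged
```

The rotation keeps the vector on the sphere and fixes its component along the axis, so every position of one item traces out exactly one small circle.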
Now the problem here is that you can maximally use half the sphere, so you’d need to re-compute every queryable vector’s position relative to the query vector’s theta.
Which means that you might as well skip the geometry entirely and just multiply the cosine similarity by some periodic, decaying function of the angular offset - roughly cos(Δθ) weighted by a falling envelope such as 1/cosh(Δθ) or exp(−Δθ²) (more or less).
But you might recognize this function: we’ve just reinvented the positional encoding from our favourite paper, Attention Is All You Need (https://arxiv.org/pdf/1706.03762).
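To see why this is the same trick: the sinusoidal encoding from that paper has the property that the dot product of two position vectors depends only on their offset (sin(ωp)sin(ωq) + cos(ωp)cos(ωq) = cos(ω(p−q)), summed over frequencies), which is exactly the “similarity as a function of Δθ” behavior above. A minimal check:

```python
import numpy as np

def sinusoidal_pe(pos, d_model=64):
    """Sinusoidal positional encoding from 'Attention Is All You Need'."""
    i = np.arange(d_model // 2)
    freq = 1.0 / (10000.0 ** (2 * i / d_model))
    pe = np.empty(d_model)
    pe[0::2] = np.sin(pos * freq)   # even dims: sine
    pe[1::2] = np.cos(pos * freq)   # odd dims: cosine
    return pe

# Same offset => same similarity, regardless of absolute position:
d1 = sinusoidal_pe(5) @ sinusoidal_pe(8)
d2 = sinusoidal_pe(100) @ sinusoidal_pe(103)
print(np.isclose(d1, d2))  # True

# And similarity peaks at zero offset:
print(sinusoidal_pe(5) @ sinusoidal_pe(5) > d1)  # True
```

So the small-circle construction lands you back at shift-invariant sinusoidal similarity.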
My question: has anybody tried this or something similar, with any empirical experience?