Fine-tuning or updating the embedding of a string

Hello Everyone,

I have a quick question. I have a sentence for which I generated an embedding vector, and now I want to add new information to it. I know I could regenerate the embedding, but is there any other way to get an updated embedding that reflects the new info? I thought of adding the vectors element-wise, but is there a better way to achieve this?

I really appreciate any help you can provide.

Adding the vectors element-wise won’t work; they’re normalized to unit length.

I haven’t given much thought to vector arithmetic in embedding vector spaces, but my gut instinct is that you’ll want to take the product of the vectors and normalize it.

Basically, you need to ensure that for any three vectors x, y, and z, whatever your operation is (let’s denote it ★) is associative and commutative.
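
For a concrete sense of the unit-length issue, here’s a minimal numpy sketch (the random vectors are only stand-ins for real embeddings, and the helper name star is just for illustration):

import numpy as np

# Two random unit vectors standing in for embeddings.
x = np.random.randn(1536); x /= np.linalg.norm(x)
y = np.random.randn(1536); y /= np.linalg.norm(y)

print(np.linalg.norm(x + y))        # generally not 1.0, so the raw sum leaves the unit sphere

def star(a, b):
    # Candidate ★: element-wise product, re-normalized to unit length.
    z = a * b
    return z / np.linalg.norm(z)

print(np.linalg.norm(star(x, y)))   # 1.0 by construction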


Yes, thank you so much for your reply. I will have a look at multiplying the vectors and do some experimentation. I was exploring the LoRA (low-rank adaptation) and PEFT (parameter-efficient fine-tuning) papers, which basically update the weights of the model, i.e., provide new knowledge to the model, so I was hoping for something similar that could be used to update the embeddings as well. Thanks a lot!!

Since the OpenAI embedding vectors are all unit vectors living on the surface of a 1536-dimensional hyper-sphere, the only operation you could do that makes mathematical sense is vector rotation.

So if you embed a sentence and get vector v0, then add or subtract a few words and get vector v1, you can solve for some rotation matrix given v1 and v0. But I’d say it’s dangerous to assume all alterations of the same type will result in the same rotation matrix. My guess is they are all essentially random rotation matrices.

Because of this, I would just re-embed the altered data to get a new vector.
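
For completeness, here’s a minimal sketch of how one might solve for such a rotation, i.e. a matrix R with R @ v0 = v1 that acts only in the plane spanned by v0 and v1; the function name rotation_between is illustrative, and this is just one of many rotations that map v0 to v1:

import numpy as np

def rotation_between(v0, v1):
    # Returns an n x n rotation matrix R with R @ v0 == v1 that only acts
    # in the 2-D plane spanned by v0 and v1.
    a = v0 / np.linalg.norm(v0)
    b = v1 / np.linalg.norm(v1)
    c = float(a @ b)                     # cos(theta)
    w = b - c * a                        # component of b orthogonal to a
    s = np.linalg.norm(w)                # sin(theta)
    n = len(a)
    if s < 1e-12:
        # Degenerate case: v0 and v1 are (anti-)parallel, so the plane is not
        # defined; handle separately if this can occur in your data.
        return np.eye(n)
    w = w / s
    B = np.stack([a, w], axis=1)         # n x 2 orthonormal basis for the plane
    R2 = np.array([[c, -s], [s, c]])     # 2-D rotation by theta within that plane
    return np.eye(n) - B @ B.T + B @ R2 @ B.T

v0 = np.random.randn(1536); v0 /= np.linalg.norm(v0)
v1 = np.random.randn(1536); v1 /= np.linalg.norm(v1)
R = rotation_between(v0, v1)
print(np.allclose(R @ v0, v1))           # True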


If we define the operator ★ as the normalized element-wise product of two vectors, those properties should hold mathematically. Whether or not it maintains the semantic meaning of the combined texts is another thing altogether.
During my kid’s nap I might go to the computer and give it a try.
I’m thinking something simple like,

x = "The ball is blue."
y = "The ball is big."
z = "The ball is blue. The ball is big."
w = "The ball is big and blue."

Then I’ll see what the embeddings yield.
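
For anyone wanting to try the same thing, fetching the embeddings might look roughly like this (assuming the v1-style openai Python client and the text-embedding-ada-002 model):

import numpy as np
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

texts = [
    "The ball is blue.",                   # x
    "The ball is big.",                    # y
    "The ball is blue. The ball is big.",  # z
    "The ball is big and blue.",           # w
]

resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
vectors = [np.array(d.embedding) for d in resp.data]

# The returned vectors are unit length, so a dot product is the cosine similarity.
for i in range(len(texts)):
    for j in range(i + 1, len(texts)):
        print(f"{texts[i]!r} vs {texts[j]!r}: {vectors[i] @ vectors[j]:.3f}")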

An initial test of the mathematical properties of this operation is promising.

import random
import math

import numpy as np
from decimal import Decimal, getcontext

getcontext().prec = 28

def normalized_random_vector(n):
    # Draw from a Gaussian and scale to unit length (uniform on the hyper-sphere).
    random_vector = [random.gauss(0, 1) for _ in range(n)]
    magnitude = math.sqrt(sum(x**2 for x in random_vector))
    return [x / magnitude for x in random_vector]

def combine2vectors(x, y):
    # The ★ operator: element-wise (Hadamard) product, computed in high precision,
    # then re-normalized back to unit length.
    x = [Decimal(v) for v in x]
    y = [Decimal(v) for v in y]
    z = [x[i] * y[i] for i in range(len(x))]
    mag = np.linalg.norm([float(v) for v in z])
    return [float(v) / mag if mag != 0 else float(v) for v in z]

x = normalized_random_vector(1536)
y = normalized_random_vector(1536)
z = normalized_random_vector(1536)

# Associativity check: x ★ (y ★ z) versus (x ★ y) ★ z.
lhs = combine2vectors(x, combine2vectors(y, z))
rhs = combine2vectors(combine2vectors(x, y), z)
dot_product = sum(a * b for a, b in zip(lhs, rhs))

# Cosine distance between the two groupings (should be ~0 if ★ is associative).
cosine_distance = 1 - dot_product
print(cosine_distance)

Result
-6.661338147750939e-16

So, within the confines of numerical precision, this operator ★ has the associative property. That’s a good sign, since (together with commutativity, which element-wise multiplication also has) it means the combination of three vectors is the same regardless of the order or grouping in which they’re combined.

It also opens the way for “subtractive” operations by way of element-wise division.

So, we might expect,

q = "The ball is blue and not big."

to be (close to) the normalized Hadamard division of x by y.

Code above by GPT-4.
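
A sketch of what that “subtractive” operator might look like; the name hadamard_divide and the epsilon guard against near-zero coordinates are assumptions on my part:

import numpy as np

def hadamard_divide(x, y, eps=1e-12):
    # Element-wise division, re-normalized to unit length: the inverse of the ★ product.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = x / np.where(np.abs(y) < eps, eps, y)   # guard against (near-)zero coordinates
    return z / np.linalg.norm(z)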


While the associative property is good, the thing I am worried about with the multiplicative approach is that the multiplications could easily push the vector into uncharted territory.

For some context, consider the fact that ada-002 vectors all lie within a ~54-degree-wide hyper-cone. So, let’s say I use your star operator and combine a vector with the normalized all-negative-ones vector [-1, -1, -1, …]/sqrt(1536): I get a vector that is 180 degrees away, and is outside the 54-degree-wide cone.
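
A quick numerical check of that point (star below is the normalized Hadamard product proposed earlier; a random unit vector stands in for a real embedding):

import numpy as np

def star(a, b):
    # Normalized element-wise (Hadamard) product, as proposed above.
    z = np.asarray(a) * np.asarray(b)
    return z / np.linalg.norm(z)

n = 1536
v = np.random.randn(n)
v /= np.linalg.norm(v)                  # stand-in for an embedding vector

flip = -np.ones(n) / np.sqrt(n)         # normalized all-negative-ones vector

print(np.dot(v, star(v, flip)))         # ~ -1.0, i.e. the result points 180 degrees away from v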

So now, after their “normalized Hadamard multiplication”, the operator fails to preserve closure. And without closure, I’m afraid there is no hope.

This same argument can be used to invalidate a fixed rotation as the operator, since somewhere at the edge of the cone, there will exist points that are rotated outside of the range of the embedding function cone.

So, because of this lack of closure in any of these operators, I doubt the meaning will be preserved after the star operator.

One way to push new meaning into these vectors is to use them as input to another neural network. There, the cone constraint is completely obliterated.
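
A minimal sketch of that idea, assuming PyTorch; the architecture, dimensions, and the name EmbeddingCombiner are purely illustrative, and such a network would of course need to be trained on data for the downstream task:

import torch
import torch.nn as nn

class EmbeddingCombiner(nn.Module):
    # Takes two embeddings, concatenates them, and maps them to a new vector.
    def __init__(self, dim=1536):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, u, v):
        return self.net(torch.cat([u, v], dim=-1))

combiner = EmbeddingCombiner()
u = torch.randn(1, 1536)                # stand-ins for two embedding vectors
v = torch.randn(1, 1536)
print(combiner(u, v).shape)             # torch.Size([1, 1536])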


Absolutely, the operator isn’t guaranteed to be an automorphism (and almost certainly isn’t).

But it still *might* be useful.

I didn’t end up getting a chance to play with it (wound up at the beach today), but I’ll take a look when I can.

Do you have any literature on the range of ada-002 embeddings?

Do you know if nonsense, random-token strings are also in that range?


The Hadamard product is totally useful and good. It is the precursor to cosine similarity: sum(Hadamard(u, v)), where the sum is taken over the coordinates of the Hadamard product vector.

If I were to pinpoint what is throwing me off, it’s that you are normalizing it back out to unit length. This implies you are treating the product as a “new embedding vector”, which I am not sure is a good idea, and it’s why the “out-of-range/closure-failure” red flags started lighting up in my head.

I think you will get more mileage out of the raw Hadamard product, without normalization, treating it as an “interaction vector”, where its potentially non-unit length is an important aspect of the interaction of the two vectors. So this is more of a raw correlation vector, before the sum. The angle of rotation is part of it, but more information is in the length, IMO.
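
A small numpy sketch of that relationship, with random unit vectors standing in for real embeddings:

import numpy as np

u = np.random.randn(1536); u /= np.linalg.norm(u)
v = np.random.randn(1536); v /= np.linalg.norm(v)

interaction = u * v                        # raw Hadamard product, not re-normalized

print(interaction.sum())                   # equals the cosine similarity for unit vectors...
print(np.dot(u, v))                        # ...same number
print(np.linalg.norm(interaction))         # extra information carried by the length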

As for the ada-002 range, this is from my own experience embedding hundreds of thousands of random strings (not exactly nonsensical strings) and seeing their overall angular spread. It is also evident from others’ observations on this forum that the cosine similarities are very close, even for different things, going down to only about 0.7 at the lowest, where the theoretical lowest is -1. So it only varies from 1 to 0.7 (a spread of 0.3), out of a total theoretical range of 1 to -1 (a spread of 2); that’s only 15% of the possible range.
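
Spelling out that arithmetic (0.7 is the empirical floor described above):

import math

lowest_observed = 0.7
print((1 - lowest_observed) / (1 - (-1)))          # 0.15, i.e. 15% of the theoretical range
print(math.degrees(math.acos(lowest_observed)))    # ~45.6 degrees between the most dissimilar observed pairs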


I don’t know… I just found two texts with a cosine similarity of 0.45 without even looking very hard.

v = [" onBindViewHolder",  " segments_remain doseima meaning"]

That corresponds to an angle of ~63 degrees.

There are on the order of 1.25E40969 possible input texts, so I don’t think we can be so quick to conclude that the range of possible output vectors is so constrained.
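
For reference, the angle, plus one way to arrive at a number of that magnitude; the ~100k-token vocabulary and ~8k-token context length are assumptions about ada-002 on my part:

import math

print(math.degrees(math.acos(0.45)))        # ~63.3 degrees for a cosine similarity of 0.45

# Rough count of distinct inputs: vocabulary_size ** max_tokens (assumed values).
vocab_size, max_tokens = 100_257, 8_192
print(max_tokens * math.log10(vocab_size))  # ~40969, so on the order of 1e40969 texts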

Anyway…

I did finally get a chance to experiment a bit, and it seems I was just plain wrong about the product anyway.

Doing the straight, normalized, element-wise product was absolutely no good. But a modified element-wise geometric mean works pretty well, just not as well as the normalized sum (a sketch of both combiners follows the table below).

(Columns are keyed to the numbered row labels.)

                                            (1)    (2)    (3)    (4)    (5)    sum    geo
(1) The ball is blue.                     1.000  0.914  0.977  0.970  0.957  0.978  0.975
(2) The ball is big.                      0.914  1.000  0.960  0.965  0.953  0.978  0.972
(3) The ball is blue and big.             0.977  0.960  1.000  0.996  0.982  0.990  0.985
(4) The ball is big and blue.             0.970  0.965  0.996  1.000  0.981  0.989  0.984
(5) The ball is blue. The ball is big.    0.957  0.953  0.982  0.981  1.000  0.976  0.971
sum                                       0.978  0.978  0.990  0.989  0.976  1.000  0.995
geo                                       0.975  0.972  0.985  0.984  0.971  0.995  1.000
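
Since the exact combiners aren’t spelled out above, here is one plausible reading of the “sum” and “geo” rows; the sign-preserving trick in geometric_mean_combine is my guess at the “modification”:

import numpy as np

def sum_combine(x, y):
    # "sum" row: element-wise sum, re-normalized to unit length.
    z = np.asarray(x) + np.asarray(y)
    return z / np.linalg.norm(z)

def geometric_mean_combine(x, y):
    # "geo" row: element-wise geometric mean of the magnitudes, carrying the
    # sign of the product so negative coordinates don't break the square root.
    x, y = np.asarray(x), np.asarray(y)
    z = np.sign(x * y) * np.sqrt(np.abs(x * y))
    return z / np.linalg.norm(z)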