Fine-tuning or updating the embedding of a string

Hello Everyone,

I have a quick question. I have a sentence for which I generated an embedding vector, and now I want to add new information to it. I know I could regenerate the embedding, but is there any other way to get an updated embedding that reflects the new info? I thought of adding the vectors element-wise, but is there a better way to achieve this?

I really appreciate any help you can provide.

Adding the vectors element-wise won’t work; they’re normalized to unit length.

I haven’t given much thought to vector arithmetic in embedding vector spaces, but my gut instinct is that you’ll want to take the product of the vectors and normalize it.

Basically, you need to ensure that for any three vectors x, y, and z, whatever your operation is (let’s denote it ★) is associative and commutative.
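
For a concrete sense of the unit-length issue, here’s a minimal numpy sketch (the random vectors are only stand-ins for real embeddings, and the helper name star is just for illustration):

import numpy as np

# Two random unit vectors standing in for embeddings.
x = np.random.randn(1536); x /= np.linalg.norm(x)
y = np.random.randn(1536); y /= np.linalg.norm(y)

print(np.linalg.norm(x + y))        # generally not 1.0, so the raw sum leaves the unit sphere

def star(a, b):
    # Candidate ★: element-wise product, re-normalized to unit length.
    z = a * b
    return z / np.linalg.norm(z)

print(np.linalg.norm(star(x, y)))   # 1.0 by construction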


Yes, thank you so much for your reply. I will have a look at multiplying the vectors and do some experimentation. I was exploring the LoRA (low-rank adaptation) and PEFT (parameter-efficient fine-tuning) papers, which basically update the weights of the model, i.e., provide new knowledge to the model, so I was hoping for something similar that could be used to update the embeddings as well. Thanks a lot!!

Since the OpenAI embedding vectors are all unit vectors living on the surface of a 1536-dimensional hyper-sphere, the only operation you could do that makes mathematical sense is vector rotation.

So if you embed a sentence and get vector v0, then add or subtract a few words and get vector v1, you can solve for some rotation matrix given v1 and v0. But I’d say it’s dangerous to assume all alterations of the same type will result in the same rotation matrix. My guess is they are all essentially random rotation matrices.

Because of this, I would just re-embed the altered data to get a new vector.
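
For completeness, here’s a minimal sketch of how one might solve for such a rotation, i.e. a matrix R with R @ v0 = v1 that acts only in the plane spanned by v0 and v1; the function name rotation_between is illustrative, and this is just one of many rotations that map v0 to v1:

import numpy as np

def rotation_between(v0, v1):
    # Returns an n x n rotation matrix R with R @ v0 == v1 that only acts
    # in the 2-D plane spanned by v0 and v1.
    a = v0 / np.linalg.norm(v0)
    b = v1 / np.linalg.norm(v1)
    c = float(a @ b)                     # cos(theta)
    w = b - c * a                        # component of b orthogonal to a
    s = np.linalg.norm(w)                # sin(theta)
    n = len(a)
    if s < 1e-12:
        # Degenerate case: v0 and v1 are (anti-)parallel, so the plane is not
        # defined; handle separately if this can occur in your data.
        return np.eye(n)
    w = w / s
    B = np.stack([a, w], axis=1)         # n x 2 orthonormal basis for the plane
    R2 = np.array([[c, -s], [s, c]])     # 2-D rotation by theta within that plane
    return np.eye(n) - B @ B.T + B @ R2 @ B.T

v0 = np.random.randn(1536); v0 /= np.linalg.norm(v0)
v1 = np.random.randn(1536); v1 /= np.linalg.norm(v1)
R = rotation_between(v0, v1)
print(np.allclose(R @ v0, v1))           # True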


If we define the operator ★ as the normalized element-wise product of two vectors, those properties should hold mathematically. Whether or not it maintains the semantic meaning of the combined texts is another thing altogether.
During my kid’s nap I might go to the computer and give it a try.
I’m thinking something simple like,

x = "The ball is blue."
y = "The ball is big."
z = "The ball is blue. The ball is big."
w = "The ball is big and blue."

Then I’ll see what the embeddings yield.
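
For anyone wanting to try the same thing, fetching the embeddings might look roughly like this (assuming the v1-style openai Python client and the text-embedding-ada-002 model):

import numpy as np
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

texts = [
    "The ball is blue.",                   # x
    "The ball is big.",                    # y
    "The ball is blue. The ball is big.",  # z
    "The ball is big and blue.",           # w
]

resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
vectors = [np.array(d.embedding) for d in resp.data]

# The returned vectors are unit length, so a dot product is the cosine similarity.
for i in range(len(texts)):
    for j in range(i + 1, len(texts)):
        print(f"{texts[i]!r} vs {texts[j]!r}: {vectors[i] @ vectors[j]:.3f}")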

An initial test of the mathematical properties of this operation is promising.

import random
import math

import numpy as np
from decimal import Decimal, getcontext

getcontext().prec = 28

def normalized_random_vector(n):
    # Draw from a Gaussian and scale to unit length (uniform on the hyper-sphere).
    random_vector = [random.gauss(0, 1) for _ in range(n)]
    magnitude = math.sqrt(sum(x**2 for x in random_vector))
    return [x / magnitude for x in random_vector]

def combine2vectors(x, y):
    # The ★ operator: element-wise (Hadamard) product, computed in high precision,
    # then re-normalized back to unit length.
    x = [Decimal(v) for v in x]
    y = [Decimal(v) for v in y]
    z = [x[i] * y[i] for i in range(len(x))]
    mag = np.linalg.norm([float(v) for v in z])
    return [float(v) / mag if mag != 0 else float(v) for v in z]

x = normalized_random_vector(1536)
y = normalized_random_vector(1536)
z = normalized_random_vector(1536)

# Associativity check: x ★ (y ★ z) versus (x ★ y) ★ z.
lhs = combine2vectors(x, combine2vectors(y, z))
rhs = combine2vectors(combine2vectors(x, y), z)
dot_product = sum(a * b for a, b in zip(lhs, rhs))

# Cosine distance between the two groupings (should be ~0 if ★ is associative).
cosine_distance = 1 - dot_product
print(cosine_distance)

Result
-6.661338147750939e-16

So, within the confines of numerical precision, this operator ★ has the associative property. That’s a good sign, since (together with commutativity, which element-wise multiplication also has) it means the combination of three vectors is the same regardless of the order or grouping in which they’re combined.

It also opens the way for “subtractive” operations by way of element-wise division.

So, we might expect,

q = "The ball is blue and not big."

to be (close to) the normalized Hadamard division of x by y.

Code above by GPT-4.
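
A sketch of what that “subtractive” operator might look like; the name hadamard_divide and the epsilon guard against near-zero coordinates are assumptions on my part:

import numpy as np

def hadamard_divide(x, y, eps=1e-12):
    # Element-wise division, re-normalized to unit length: the inverse of the ★ product.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = x / np.where(np.abs(y) < eps, eps, y)   # guard against (near-)zero coordinates
    return z / np.linalg.norm(z)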


While the associative property is good, the thing I am worried about with the multiplicative approach is that the multiplications could easily push the vector into uncharted territory.

For some context, consider the fact that ada-002 vectors all lie within a ~54-degree-wide hyper-cone. So, let’s say I use your star operator and combine a vector with the normalized all-negative-ones vector [-1, -1, -1, …]/sqrt(1536): I get a vector that is 180 degrees away, and is outside the 54-degree-wide cone.
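
A quick numerical check of that point (star below is the normalized Hadamard product proposed earlier; a random unit vector stands in for a real embedding):

import numpy as np

def star(a, b):
    # Normalized element-wise (Hadamard) product, as proposed above.
    z = np.asarray(a) * np.asarray(b)
    return z / np.linalg.norm(z)

n = 1536
v = np.random.randn(n)
v /= np.linalg.norm(v)                  # stand-in for an embedding vector

flip = -np.ones(n) / np.sqrt(n)         # normalized all-negative-ones vector

print(np.dot(v, star(v, flip)))         # ~ -1.0, i.e. the result points 180 degrees away from v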

So now, after their “normalized Hadamard multiplication”, the operator fails to preserve closure. And without closure, I’m afraid there is no hope.

This same argument can be used to invalidate a fixed rotation as the operator, since somewhere at the edge of the cone, there will exist points that are rotated outside of the range of the embedding function cone.

So, because of this lack of closure in any of these operators, I doubt the meaning will be preserved after the star operator.

One way to push new meaning into these vectors is to use them as input to another neural network. There, the cone constraint is completely obliterated.
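
A minimal sketch of that idea, assuming PyTorch; the architecture, dimensions, and the name EmbeddingCombiner are purely illustrative, and such a network would of course need to be trained on data for the downstream task:

import torch
import torch.nn as nn

class EmbeddingCombiner(nn.Module):
    # Takes two embeddings, concatenates them, and maps them to a new vector.
    def __init__(self, dim=1536):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, u, v):
        return self.net(torch.cat([u, v], dim=-1))

combiner = EmbeddingCombiner()
u = torch.randn(1, 1536)                # stand-ins for two embedding vectors
v = torch.randn(1, 1536)
print(combiner(u, v).shape)             # torch.Size([1, 1536])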


Absolutely, the operator isn’t guaranteed to be an automorphism (and almost certainly isn’t).

But it still *might* be useful.

I didn’t end up getting a chance to play with it (wound up at the beach today), but I’ll take a look when I can.

Do you have any literature on the range of ada-002 embeddings?

Do you know if nonsense, random-token strings are also in that range?


The Hadamard product is totally useful and good. It is the precursor to cosine similarity: sum(Hadamard(u, v)), where the sum is taken over the coordinates of the Hadamard product vector.

If I were to pinpoint what is throwing me off, it’s that you are normalizing it back out to unit length. This implies you are treating the product as a “new embedding vector”, which I am not sure is a good idea, and it’s why the “out-of-range/closure-failure” red flags started lighting up in my head.

I think you will get more mileage out of the raw Hadamard product, without normalization, treating it as an “interaction vector”, where its potentially non-unit length is an important aspect of the interaction of the two vectors. So this is more of a raw correlation vector, before the sum. The angle of rotation is part of it, but more information is in the length, IMO.
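
A small numpy sketch of that relationship, with random unit vectors standing in for real embeddings:

import numpy as np

u = np.random.randn(1536); u /= np.linalg.norm(u)
v = np.random.randn(1536); v /= np.linalg.norm(v)

interaction = u * v                        # raw Hadamard product, not re-normalized

print(interaction.sum())                   # equals the cosine similarity for unit vectors...
print(np.dot(u, v))                        # ...same number
print(np.linalg.norm(interaction))         # extra information carried by the length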

As for the ada-002 range, this is from my own experience embedding hundreds of thousands of random strings (not exactly nonsensical strings) and seeing their overall angular spread. It is also evident from others’ observations on this forum that the cosine similarities are very close, even for different things, going down to only about 0.7 at the lowest, where the theoretical lowest is -1. So it only varies from 1 to 0.7 (a spread of 0.3), out of a total theoretical range of 1 to -1 (a spread of 2); that’s only 15% of the possible range.
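
Spelling out that arithmetic (0.7 is the empirical floor described above):

import math

lowest_observed = 0.7
print((1 - lowest_observed) / (1 - (-1)))          # 0.15, i.e. 15% of the theoretical range
print(math.degrees(math.acos(lowest_observed)))    # ~45.6 degrees between the most dissimilar observed pairs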


I don’t know… I just found two texts with a cosine similarity of 0.45 without even looking very hard.

v = [" onBindViewHolder",  " segments_remain doseima meaning"]

That corresponds to an angle of ~63 degrees.

There are on the order of 1.25E40969 possible input texts, so I don’t think we can be so quick to conclude that the range of possible output vectors is so constrained.
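
For reference, the angle, plus one way to arrive at a number of that magnitude; the ~100k-token vocabulary and ~8k-token context length are assumptions about ada-002 on my part:

import math

print(math.degrees(math.acos(0.45)))        # ~63.3 degrees for a cosine similarity of 0.45

# Rough count of distinct inputs: vocabulary_size ** max_tokens (assumed values).
vocab_size, max_tokens = 100_257, 8_192
print(max_tokens * math.log10(vocab_size))  # ~40969, so on the order of 1e40969 texts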

Anyway…

I did finally get a chance to experiment a bit, and it seems I was just plain wrong about the product anyway.

Doing the straight, normalized, element-wise product was absolutely no good. But a modified element-wise geometric mean works pretty well, just not as well as the normalized sum (a sketch of both combiners follows the table below).

(Columns are keyed to the numbered row labels.)

                                            (1)    (2)    (3)    (4)    (5)    sum    geo
(1) The ball is blue.                     1.000  0.914  0.977  0.970  0.957  0.978  0.975
(2) The ball is big.                      0.914  1.000  0.960  0.965  0.953  0.978  0.972
(3) The ball is blue and big.             0.977  0.960  1.000  0.996  0.982  0.990  0.985
(4) The ball is big and blue.             0.970  0.965  0.996  1.000  0.981  0.989  0.984
(5) The ball is blue. The ball is big.    0.957  0.953  0.982  0.981  1.000  0.976  0.971
sum                                       0.978  0.978  0.990  0.989  0.976  1.000  0.995
geo                                       0.975  0.972  0.985  0.984  0.971  0.995  1.000
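
Since the exact combiners aren’t spelled out above, here is one plausible reading of the “sum” and “geo” rows; the sign-preserving trick in geometric_mean_combine is my guess at the “modification”:

import numpy as np

def sum_combine(x, y):
    # "sum" row: element-wise sum, re-normalized to unit length.
    z = np.asarray(x) + np.asarray(y)
    return z / np.linalg.norm(z)

def geometric_mean_combine(x, y):
    # "geo" row: element-wise geometric mean of the magnitudes, carrying the
    # sign of the product so negative coordinates don't break the square root.
    x, y = np.asarray(x), np.asarray(y)
    z = np.sign(x * y) * np.sqrt(np.abs(x * y))
    return z / np.linalg.norm(z)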