If we define the operator ⊙ as the normalized element-wise product of two vectors, it should hold mathematically. Whether it preserves the semantic meaning of the combined texts is another question altogether.
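Spelled out, assuming ⊙ names that normalized element-wise (Hadamard) product and ∘ the plain element-wise product:

```latex
x \odot y = \frac{x \circ y}{\lVert x \circ y \rVert},
\qquad (x \circ y)_i = x_i\, y_i
```

Since each component of $(x \odot y) \odot z$ is proportional to $x_i y_i z_i$ regardless of grouping, associativity should follow directly from this definition.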
During my kid's nap I might go to the computer and give it a try.
I'm thinking something simple like,
x = "The ball is blue."
y = "The ball is big."
z = "The ball is blue. The ball is big."
w = "The ball is big and blue."
Then seeing what the encoding yields.
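The comparison itself would just be cosine similarity between the combined vector and the embeddings of z and w. A minimal sketch of the helpers (the embedding step is left out, and the toy vectors below only stand in for real sentence embeddings):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def combine(a, b):
    # Normalized element-wise (Hadamard) product
    z = [x * y for x, y in zip(a, b)]
    mag = math.sqrt(sum(c * c for c in z))
    return [c / mag for c in z]

# With a real model one would embed x, y, z, w and compare
# combine(embed(x), embed(y)) against embed(z) and embed(w).
# Toy stand-ins here, since no embedding model is assumed:
ex = [0.6, 0.8, 0.0]
ey = [0.0, 0.6, 0.8]
print(cosine_similarity(combine(ex, ey), ey))
```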
An initial test of the mathematical properties of this operation is promising.
import random
import math

def normalized_random_vector(n):
    random_vector = [random.gauss(0, 1) for _ in range(n)]
    magnitude = math.sqrt(sum(x**2 for x in random_vector))
    return [x / magnitude for x in random_vector]
import numpy as np
from decimal import Decimal, getcontext

getcontext().prec = 28

def combine2vectors(x, y):
    x = [Decimal(v) for v in x]
    y = [Decimal(v) for v in y]
    z = [x[i] * y[i] for i in range(len(x))]
    mag = np.linalg.norm([float(v) for v in z])
    return [float(v) / mag if mag != 0 else float(v) for v in z]
x = normalized_random_vector(1536)
y = normalized_random_vector(1536)
z = normalized_random_vector(1536)
lhs = combine2vectors(x, combine2vectors(y, z))
rhs = combine2vectors(combine2vectors(x, y), z)
dot_product = sum(a * b for a, b in zip(lhs, rhs))

# Cosine distance between the two groupings
cosine_distance = 1 - dot_product
cosine_distance
Result
-6.661338147750939e-16
So, within the confines of numerical precision, this operator ⊙ has the associative property, which is a good sign: we want the combination of three vectors to be the same regardless of the order in which they're combined.
It also opens the way for "subtractive" operations by way of element-wise division.
So, we might expect,
q = "The ball is blue and not big."
to be (close to) the normalized Hadamard division of x by y.
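As a quick sanity check on that subtractive idea, here's a sketch mirroring the test above (`hadamard_combine` and `hadamard_divide` are helper names I'm assuming): combining x with y and then dividing element-wise by y should recover x, up to floating-point error.

```python
import math
import random

def normalized_random_vector(n):
    v = [random.gauss(0, 1) for _ in range(n)]
    mag = math.sqrt(sum(c * c for c in v))
    return [c / mag for c in v]

def hadamard_combine(a, b):
    # Normalized element-wise product
    z = [x * y for x, y in zip(a, b)]
    mag = math.sqrt(sum(c * c for c in z))
    return [c / mag for c in z]

def hadamard_divide(a, b):
    # Normalized element-wise division: the "subtractive" inverse
    z = [x / y for x, y in zip(a, b)]
    mag = math.sqrt(sum(c * c for c in z))
    return [c / mag for c in z]

x = normalized_random_vector(1536)
y = normalized_random_vector(1536)

# Dividing the combined vector by y should give back x
recovered = hadamard_divide(hadamard_combine(x, y), y)
cosine_distance = 1 - sum(a * b for a, b in zip(recovered, x))
print(cosine_distance)
```

Whether the embedding of q actually lands near that quotient vector is exactly the kind of semantic question the nap-time experiment would have to answer.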
Code by GPT-4