How many embedding vectors are we talking here?
I just ran a benchmark on AWS Lambda (Python 3.11, on both ARM64 and x86_64): searching 400,000 embeddings takes less than 1 second.
ARM64 is now the faster of the two. To get NumPy for ARM64, just add the AWS-provided layer AWSSDKPandas-Python311-Arm64.
The main limitation is memory (see printouts below).
On my local machine, I can search ~1 million vectors x 1536 dims in less than 1 second.
But here are the public AWS cloud benchmarks if you want to go with Lambda.
Python 3.11, x86_64:
MIPS naive: Elapsed time 0.9262909889221191
MIPS extremes: Elapsed time 0.9536223411560059
MIPS vectorized: Elapsed time 5.194347858428955
MIPS top: Elapsed time 0.9222877025604248
400000 x 1536 (# vectors, # dims per vector)
Memory Size: 10240 MB, Max Memory Used: 9526 MB
Python 3.11, ARM64:
MIPS naive: Elapsed time 0.8463380336761475
MIPS extremes: Elapsed time 0.8570494651794434
MIPS vectorized: Elapsed time 4.841116905212402
MIPS top: Elapsed time 0.841890811920166
400000 x 1536 (# vectors, # dims per vector)
Memory Size: 10240 MB, Max Memory Used: 9532 MB
Believe it or not, a for-loop over an enumerate iterator turned out faster than the vectorized NumPy operation. Here is the MIPS top code (plus a supporting function) I wrote that beats direct vectorization in NumPy.
import numpy as np

def sortPerm(x, reverseFlag=False):
    # Sort x, returning both the sorted values and the sorting permutation.
    if reverseFlag:
        idx = np.argsort(-x)  # descending
    else:
        idx = np.argsort(x)   # ascending
    return x[idx], idx

def mips_top(q, vecs, Nextremes):
    # Maintain the Nextremes largest inner products seen so far.
    TopN = np.full(Nextremes, float('-inf'))
    TopIdx = np.full(Nextremes, -1)
    mTopN = float('-inf')  # smallest score currently kept
    for i, v in enumerate(vecs):
        c = np.dot(q, v)
        if c > mTopN:
            TopN = np.append(TopN, c)      # append the new value
            TopIdx = np.append(TopIdx, i)  # append the new index
            TopN, Perm = sortPerm(TopN, True)  # largest to smallest
            TopIdx = TopIdx[Perm]
            TopN = TopN[:Nextremes]
            TopIdx = TopIdx[:Nextremes]
            mTopN = np.min(TopN)  # new admission threshold
    return {"TopIdx": TopIdx, "TopN": TopN}
MIPS stands for “Maximum Inner Product Search”, which is what we are doing here for unit embedding vectors.
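Since the embedding vectors are unit-norm, maximizing the inner product is the same as maximizing cosine similarity (cosine = dot product divided by the two norms, and both norms are 1). A quick self-contained check, using random vectors rather than real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
q, v = rng.standard_normal((2, 1536))
q /= np.linalg.norm(q)  # normalize to unit length
v /= np.linalg.norm(v)

cosine = np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
assert np.isclose(np.dot(q, v), cosine)  # identical for unit vectors
```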