How many embedding vectors are we talking here?

I just ran a benchmark on AWS Lambda (Python 3.11, on both ARM64 and x86_64): searching 400,000 embeddings takes less than 1 second.

ARM64 is now the better choice of processor (close to x86_64 in speed, and cheaper per GB-second on Lambda).

To get NumPy on ARM64, just add the AWS-provided layer `AWSSDKPandas-Python311-Arm64`.
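For reference, attaching that layer from the CLI looks roughly like this. The function name, region, and layer version in the ARN are placeholders, not from the original post; look up the exact ARN for your region in the AWS SDK for pandas docs:

```shell
# Attach the AWS-managed pandas/NumPy layer (ARM64 build) to an existing function.
# Region, account ID, and layer version below are illustrative placeholders.
aws lambda update-function-configuration \
  --function-name my-embedding-search \
  --layers arn:aws:lambda:us-east-1:336392948345:layer:AWSSDKPandas-Python311-Arm64:1
```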

The main limitation is memory (see printouts below).

On my local machine, I can search ~1 million vectors x 1536 dims in less than 1 second.

But here are the numbers from the public AWS cloud if you want to go with Lambda.

Python 3.11, x86_64:

```
MIPS naive: Elapsed time 0.9262909889221191
MIPS extremes: Elapsed time 0.9536223411560059
MIPS vectorized: Elapsed time 5.194347858428955
MIPS top: Elapsed time 0.9222877025604248
400000 x 1536 (# vectors, # dims per vector)
Memory Size: 10240 MB   Max Memory Used: 9526 MB
```

Python 3.11, ARM64:

```
MIPS naive: Elapsed time 0.8463380336761475
MIPS extremes: Elapsed time 0.8570494651794434
MIPS vectorized: Elapsed time 4.841116905212402
MIPS top: Elapsed time 0.841890811920166
400000 x 1536 (# vectors, # dims per vector)
Memory Size: 10240 MB   Max Memory Used: 9532 MB
```

Believe it or not, a plain for-loop over an `enumerate` iterator beat the fully vectorized NumPy version here.

Here is the `mips_top` code (plus a supporting function) I wrote that is faster than direct vectorization in NumPy.

```
import numpy as np

def sortPerm(x, reverseFlag=False):
    """Sort x and also return the permutation that sorts it."""
    if reverseFlag:
        idx = np.argsort(-x)  # descending order
    else:
        idx = np.argsort(x)   # ascending order
    return x[idx], idx

def mips_top(q, vecs, Nextremes):
    NegInf = float('-inf')
    TopN = np.full(Nextremes, NegInf)    # current top scores
    TopIdx = np.full(Nextremes, -1)      # their indices into vecs
    mTopN = NegInf                       # smallest score still in the top list
    for i, v in enumerate(vecs):
        c = np.dot(q, v)
        if c > mTopN:
            TopN = np.append(TopN, c)            # append the new value
            TopIdx = np.append(TopIdx, i)        # append the new index
            TopN, Perm = sortPerm(TopN, True)    # largest to smallest via 'True' flag
            TopIdx = TopIdx[Perm]
            TopN = TopN[:Nextremes]              # keep only the top Nextremes
            TopIdx = TopIdx[:Nextremes]
            mTopN = np.min(TopN)
    return {"TopIdx": TopIdx,
            "TopN": TopN}
```
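For comparison, a standard fully vectorized NumPy top-k would look something like the sketch below. This is my reconstruction of the idea, not the exact `MIPS vectorized` code that was benchmarked above:

```python
import numpy as np

def mips_vectorized(q, vecs, Nextremes):
    # One big matrix-vector product: scores[i] = <q, vecs[i]>.
    scores = vecs @ q
    # argpartition finds the Nextremes largest scores without a full sort...
    idx = np.argpartition(scores, -Nextremes)[-Nextremes:]
    # ...then sort just those few, from largest to smallest.
    idx = idx[np.argsort(-scores[idx])]
    return {"TopIdx": idx, "TopN": scores[idx]}
```

The slowdown measured above presumably comes from materializing and scanning the full 400,000-element score array under Lambda's memory pressure, whereas the loop keeps only a tiny running top list.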

MIPS stands for "Maximum Inner Product Search", which is what we are doing here for unit embedding vectors.
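For unit vectors the inner product is exactly cosine similarity (the norms in the denominator are both 1), so MIPS over unit embeddings is the same thing as a cosine-similarity search. A quick illustrative check, not from the original post:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.normal(size=1536)
b = rng.normal(size=1536)

# Normalize to unit length, as embedding APIs typically do.
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

dot = np.dot(a, b)
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
assert np.isclose(dot, cosine)  # identical when both vectors have norm 1
```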