Your GPU might pair well with the open source Facebook AI Similarity Search (FAISS). But if you have fewer than 1 million embeddings, as discussed above, you can do this “by hand” with a naive search like this:
import numpy as np

def mips_naive(q, vecs):
    # maximum inner product search: scan every vector and keep the best match
    mip = -1e10
    idx = -1
    for i, v in enumerate(vecs):
        c = np.dot(q, v)  # dot is the same as cosine similarity for unit vectors
        if c > mip:
            mip = c
            idx = i
    return idx, mip
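If the Python loop gets slow, the same search can be done as a single matrix-vector product. This is just a rough sketch, assuming `vecs` is a 2-D NumPy array with one unit-length embedding per row; the names `mips_vectorized` and `top_k` are made up for illustration and are not from any library:

```python
import numpy as np

def mips_vectorized(q, vecs):
    """Return (index, score) of the row of vecs with the largest dot product with q."""
    scores = np.asarray(vecs) @ np.asarray(q)  # all dot products at once, shape (n,)
    idx = int(np.argmax(scores))               # first index of the maximum score
    return idx, float(scores[idx])

def top_k(q, vecs, k=5):
    """Return (indices, scores) of the k best matches, best first."""
    scores = np.asarray(vecs) @ np.asarray(q)
    k = min(k, scores.shape[0])
    # argpartition picks the k largest scores without sorting the whole array
    part = np.argpartition(-scores, k - 1)[:k]
    order = part[np.argsort(-scores[part])]    # sort only those k candidates
    return order, scores[order]
```

Like the loop version, `mips_vectorized` returns the first index on ties. `top_k` is handy when you want several candidates to re-rank or filter afterwards.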
You could also use Redis; see this thread: Using Redis for embeddings