So a few seconds, 10 seconds? What ballpark is it?
And this is for your 3.5k embeddings, right? Or was it 4.5M embeddings?
This may be of interest:
They do offer some datasets for quick testing:
The link above kinda shows why I haven’t even bothered to measure.
Yes, 3.5K objects containing several text fields, each using embeddings of 1536 dimensions per vector.
Wow, some impressive numbers!
I can see the downside for folks like me: since I have sparse traffic, the hosting costs would eat me alive.
The tech is cool though. I have looked into FAISS as an algorithm, but the naive argmax works just fine for me (for now)
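For reference, the naive argmax approach is just one matrix-vector product followed by a max, something like this minimal numpy sketch (function and variable names are only illustrative, not from any actual codebase):

```python
import numpy as np

def naive_argmax_search(query_vec: np.ndarray, embeddings: np.ndarray) -> int:
    """Return the index of the best-matching stored embedding."""
    # embeddings: (N, 1536) matrix of stored vectors, query_vec: (1536,).
    # With unit-normalized embeddings, the dot product equals cosine
    # similarity, so the largest score is the nearest neighbor.
    scores = embeddings @ query_vec
    return int(np.argmax(scores))
```

At 3.5K vectors of 1536 dimensions that full scan is only a few million multiply-adds, which is part of why there’s nothing worth timing.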
But will have to keep Weaviate in mind for sure
That’s why I’m using their cloud services…
In my case, I can run 400,000 embeddings at 1–2 seconds of latency for less than $1 per month, assuming the system has settled post-cold start, no elaborate database backups, and sparse traffic.
Here my major cost is backups, oddly enough.
High volumes of traffic might drive me to Weaviate. At that point it might be close on cost, but I’m pretty sure I’d have to ditch naive argmax to get anywhere close to Weaviate on latency!
I would have to ditch multiplies and go with the Manhattan metric, and code it efficiently (probably vectorized over the entire batch of embeddings at once). That might give me a fighting chance.
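A rough sketch of what that could look like, vectorized in numpy over the whole batch (assuming the same hypothetical (N, 1536) embeddings matrix as above; nothing here is benchmarked):

```python
import numpy as np

def manhattan_argmin(query_vec: np.ndarray, embeddings: np.ndarray) -> int:
    """Return the index of the stored embedding closest to the query in L1 distance."""
    # Broadcasting computes |e_i - q| for every row at once, so the hot path
    # is subtractions and absolute values instead of multiplies.
    dists = np.abs(embeddings - query_vec).sum(axis=1)
    return int(np.argmin(dists))
```

Both versions still scan every stored vector; the only difference is the per-element operation inside that scan.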