After looking at ways to handle embeddings, I’ve concluded that, for my use case, storing embedding vectors in my own database is not efficient performance-wise.
Basically, I need to store around 50 KB of data for each piece of text, and it is possible to have up to 1,000 such embeddings.
The problem is when I need to query them: a response could be up to 50 MB.
That eats bandwidth and resources very fast.
So I looked at Redis for a way to handle the embeddings, and to my surprise I found out that it already has functionality for handling vectors/embeddings.
However, I just learned about it and I would like to know if anyone has experience with it.
Also, if this is a good alternative for embedding handling, it would be nice to include it in the OpenAI documentation recommendations, along with the two paid services mentioned there.
Full disclosure - I’m a Redis employee. VSS (Vector Similarity Search) is indeed a capability within Redis (part of the RediSearch functionality). Our latest GA release of Search (2.6.3) further enhanced the VSS functionality. Example: we now support storing embeddings within JSON docs, in addition to Hash Sets.
I’ve put some example Python code out there to demonstrate how to store vectors in Redis and perform KNN and ‘hybrid’ searches (a combination of general search on other attributes + KNN on the vectors).
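To give a flavour of what that looks like, here is a rough sketch using raw RediSearch commands (the index name, key prefix, and DIM 1536 are placeholders I’ve chosen, with 1536 matching OpenAI’s text-embedding-ada-002; the query vector is passed as a binary FLOAT32 blob via PARAMS):

```
# Index Hashes under the "doc:" prefix, with a text field, a tag
# field, and a FLAT vector field using cosine distance
FT.CREATE doc_idx ON HASH PREFIX 1 doc: SCHEMA content TEXT genre TAG embedding VECTOR FLAT 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE

# Pure KNN: the 5 nearest neighbours of the query vector $vec
FT.SEARCH doc_idx "*=>[KNN 5 @embedding $vec AS score]" PARAMS 2 vec <query-vector-bytes> SORTBY score DIALECT 2

# "Hybrid": filter on another attribute first, then KNN on the matches
FT.SEARCH doc_idx "(@genre:{history})=>[KNN 5 @embedding $vec AS score]" PARAMS 2 vec <query-vector-bytes> SORTBY score DIALECT 2
```

The vector queries require query dialect 2, hence the DIALECT 2 argument.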
Hi @georgei !
I too work at Redis and have been working on VSS for a bit. I’ve made a few applications that demonstrate some simple use cases with GUIs.
You can check out the hosted versions
I also can’t post more than 2 links since I’m a new user so I’ll just comment again.
My team and I have also written a couple pieces about Redis and Vector Search in general at
@curt.kennedy and @raymonddavey, I can’t reply more than three times, so I’m just editing this reply - but you can also just use Redis + RediSearch, which are open source.
I’m obviously biased, but I like the Redis-Stack Docker container because it’s easy to set up, and I store my vectors and metadata as JSON in Redis. You could store the text alongside the vector and return it in the same call as the query; see
return_fields. Individual clients can also be stored as individual indices in the same database.
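As a sketch of that pattern with raw commands (index name and paths are my own placeholders, and the toy 2-dimensional vector is just for illustration): vectors live as plain number arrays inside the JSON document, and RETURN pulls the stored text back in the same call as the KNN query.

```
# Text and vector together in one JSON document
JSON.SET doc:1 $ '{"content": "some chunk of text", "embedding": [0.12, -0.03]}'

# Index JSON paths, aliasing them to field names
FT.CREATE json_idx ON JSON PREFIX 1 doc: SCHEMA $.content AS content TEXT $.embedding AS embedding VECTOR FLAT 6 TYPE FLOAT32 DIM 2 DISTANCE_METRIC COSINE

# KNN query that returns the text and the score in one round trip
FT.SEARCH json_idx "*=>[KNN 5 @embedding $vec AS score]" PARAMS 2 vec <query-vector-bytes> RETURN 2 content score DIALECT 2
```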
Especially if you don’t need to worry about latency, or fault tolerance over long periods of time, OSS should be fine. The benefit there is that if requirements change and you do need them, you don’t need to change your code at all.
You could also set up something to dump the contents to a file (S3) and then spin it back up on demand. Just another thought.
Lastly, I’ve seen people implement things similar to what @curt.kennedy has suggested, and it has worked well for multiple use cases: e.g. just saving a FAISS index to a file in S3, then loading and running it in some cloud function or server on demand or during peak times. Sometimes serverless is less complicated and saves money. That being said, I saw a user this week switch from DynamoDB to Redis because it was cheaper at scale and DynamoDB has a limited value size, something like 400 KB. (Also, you can’t actually run VSS in DynamoDB.)
Also, the codebases for both are OSS and on GitHub at Redis Ventures · GitHub
Lastly, we recently did a VSS hackathon which had some really interesting entries which you can read about here: Redis Vector Search Engineering Lab Review - MLOps Community
if you have any questions about it feel free to reach out to sam (dot) partee@redis (dot) com
I might have to check out Redis.
Another approach that might work for you is to hash each text entry and store it in a database with at least Hash/Text/(Vector, optional) fields. Then create a data structure in memory with only Hash/Vector. Search over all vectors in memory, get the closest N, and return the Hash values. Then retrieve those Hash values from your database. I can get less than 1 second of latency with 400k items (embeddings) in a serverless environment with this approach. My search was naive too, only computing dot products, since my embeddings are unit vectors.
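A minimal sketch of that pattern (all names here are illustrative, and it assumes unit-length embeddings so the dot product equals cosine similarity):

```python
import hashlib
import numpy as np

# In-memory index: content hash -> unit-length embedding vector
index: dict[str, np.ndarray] = {}

def add(text: str, embedding: list[float]) -> str:
    """Key each entry by a hash of its text; keep only hash -> vector in memory."""
    key = hashlib.sha256(text.encode()).hexdigest()
    vec = np.asarray(embedding, dtype=np.float32)
    index[key] = vec / np.linalg.norm(vec)  # normalize to unit length
    return key

def top_n(query_embedding: list[float], n: int = 3) -> list[str]:
    """Return the hashes of the n closest vectors by dot product."""
    q = np.asarray(query_embedding, dtype=np.float32)
    q = q / np.linalg.norm(q)
    keys = list(index)
    mat = np.stack([index[k] for k in keys])  # shape (N, dim)
    scores = mat @ q                          # one dot product per stored vector
    best = np.argsort(scores)[::-1][:n]      # indices of the highest scores
    return [keys[i] for i in best]
```

The returned hashes are then looked up in the main database (the Hash/Text table) to fetch the full text.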
@curt.kennedy I am intrigued by your post. I was also using a unique id instead of a hash and referring back to another table with the actual text.
Out of interest, where/how do you store the 400k vectors and hashes? Do you store them as strings in a file or database of some sort, and then convert them into vector objects as you load them into memory?
I’ve looked at Pinecone, Milvus, and plain CSV files, but haven’t decided on a solution yet. I will also look at Redis.
Using Python and AWS here …
The Hash/Vectors are stored in a dictionary with the key as the Hash and the value as a vector. The vectors are all numpy arrays. This is then saved as a pickle, and loaded into memory upon cold start.
The raw data is stored in a DynamoDB table. But this is only used as a repository and isn’t used for live processing. Everything in DynamoDB is of string type, even the vectors (list to str), and I use ‘ast.literal_eval’ to undo this when making the pickle of numpy arrays, like so:
EmbeddingArray = np.array(ast.literal_eval(Embedding))
When looping over the vectors, once you find the close ones, just pull the indices to get the corresponding hashes (the hashes and vectors are in separate arrays when processing, but they correspond one-to-one and have the same length)
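Putting those pieces together, a sketch of the whole round trip might look like this (the table rows here are made-up illustrations, not real data):

```python
import ast
import pickle
import numpy as np

# As stored in DynamoDB: everything is a string, including the vectors
rows = [
    {"Hash": "a1b2", "Embedding": "[0.6, 0.8]"},
    {"Hash": "c3d4", "Embedding": "[1.0, 0.0]"},
]

# Rebuild parallel arrays: hashes[i] corresponds one-to-one to vectors[i]
hashes = [row["Hash"] for row in rows]
vectors = np.array([np.array(ast.literal_eval(row["Embedding"])) for row in rows])

# Persist as a pickle so a cold-started function can load it quickly
blob = pickle.dumps((hashes, vectors))
hashes2, vectors2 = pickle.loads(blob)

# At query time: dot products against every vector, then map the
# winning indices back to their hashes
query = np.array([1.0, 0.0])
scores = vectors2 @ query
best = [hashes2[i] for i in np.argsort(scores)[::-1]]
```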
It runs really fast for me, and I don’t anticipate hitting 400k embeddings anytime soon for what I’m doing.
Thanks for the outline of how you did this. I’m not using Python for this project, but the ideas map directly across for me.
I have the long-term need to have sets for individual clients, so I can’t spool them up into memory in advance. But for the MVP, I will try your ideas and consider Pinecone etc. when I go to production.
I’ll wrap it into saving, recall, and searching functions so I can swap them out later.
My issue with Pinecone and other vector databases is the hourly cost of hosting those instances. I am primarily serverless and event driven, where the events are sparse in time. So it doesn’t make sense for me. But if you have tons of embeddings (several million or billions) and a low latency requirement, using these services would make more sense.
As for many clients: if you ran each client on a separate instance, that should lower your memory per instance, right? I don’t know how much memory you are talking about, but my approach, using AWS Lambda, can handle up to 10 GB of memory per function. But again, if your traffic is continuous, there is probably a better approach using Docker or Kubernetes (or whatever).
You have highlighted the same concern as me.
In practical terms, the users in a government agency, or a research facility, may use the system for 5 minutes and then not use it again for an hour or more. It’s also not being used throughout the night - so the per hour cost doesn’t work as well.
With the cheaper models, its not so much of a problem, because you can absorb a small monthly cost. But when you scale up (even with sporadic use), the cost of hosting the data on an hourly basis is quite high.
My current strategy is to make user sessions time out (i.e. log them out). And when they log in, I will spool the hash tables into memory (using your technique) and keep them there until they log out again (or time out).
The time between logging in, and entering the first query should give enough time for even a huge data set to be initiated. In reality, all the sets will be less than your example of 400k records - and if they are not, then I can host a unique instance under an enterprise offering.
You have given me a few new technologies for me to investigate. Thank you.
I’ve also considered using Word2Vec (or similar) for the embedding and recall part, and only using GPT for the final query. The vector space (number of dimensions in the array) will be smaller, but because the source document domain is smaller, it may not be an issue. I have to try it to see if it works.
I’ve also looked at Milvus on a dedicated AWS EC2 instance, but having to install and set it up is doing my head in at the moment, due to lack of bandwidth in my brain. Maybe I can look at that when I get past the other things I’m doing.
Sounds like we are thinking of the same things.
I too have wondered about other embedding options, smaller vector space sizes, faster algorithms, faster languages than Python, etc.
But, as you know, it’s all about bandwidth (or lack thereof).
When I started this thread I was at the point where I had to choose between different vector search databases.
Since I didn’t know much about any of them, I picked Weaviate, because it had the shortest learning path.
I didn’t understand Pinecone’s pricing model, and I also preferred a service which supports NodeJS, at least in its documentation.
As for Redis, it was not clear to me what it means to load a whole database into memory. I also need to integrate it with Heroku, and the pricing increases.
One of the advantages of Weaviate is that it integrates with OpenAI, and this reduces your server-side effort.
The downside of Weaviate (at least one of them) is that when you want to change the OpenAI model used for embeddings, you have to manually reindex the whole database, because the old vectors won’t be compatible anymore, and this is sort of painful.
I liked Redis in other projects and I’d love to use it, but I’d need to evaluate the costs and efforts.
At first glance, the pricing from Heroku is somewhat discouraging: for 1,000 connections it would charge $200. It is true that at 1,000 connections my app would have enough traffic to pay for the service, but for almost the same benefits I’d pay $60 on Weaviate.
But as I said, this is just a quick analysis.
Can you update us on your use of Redis for embeddings?
Any news? Code to share?
Are you using Redis with Rails? How exactly are you using it (the architecture)?
No, I didn’t get too far with Redis for embeddings.
But OpenAI is now including Redis more frequently in its documentation and code examples.
Here is one:
Do you have any golang sample code of vector similarity in Redis?
Maybe just ask GPT-4 to translate whatever example you want to build from into golang?
I only have the free ChatGPT, but I will try; I hope it works.
Shameless plug for my local vector DB project here, Vectra… Vectra is a local, file-based vector DB that should be great for mostly static content, and since it’s local, it gives mostly <1 ms query times.
And a python port someone did:
The primary reason I mention it here is that it seems like it would be well suited to the poster’s scenario…
If your content is static, like an individual file, I’m not sure why you wouldn’t use Vectra. It’s free, supports metadata filtering, and since it’s local, query times will be under 1 ms once the index is loaded into memory.
Unfortunately, no. On a positive note, there’s active development for support of VSS (and Search in general) for the go-redis client lib.