When it comes to vector databases, we have many options: Qdrant, Pinecone, Weaviate, etc. I am curious whether anyone has used MongoDB Vector Search as their vector store. If so, how is the retrieval performance?
There are two reasons why I ask:
- Vector-search-only performance: we all know MongoDB was relatively late to the game (launching MongoDB Vector Search) compared to startups such as Pinecone. To my knowledge, building an efficient and fast vector database requires optimizing a ton of things, from search algorithms to caching to hardware (GPUs), etc. This, in my opinion, is a valid claim made by the vector-only database vendors: they have poured tens of thousands of hours of engineering effort into optimizing vector search. It is not just about calculating a dot product or cosine similarity, but about performing millions of those calculations and returning matched results FAST. This is the main selling point of vector-only vendors: that traditional vendors like MongoDB have not invested enough to build as robust a solution as they have.
- Hybrid search: in reality, in many cases we can’t build a good search engine with vector search alone. Consider a chatbot for an e-commerce shop. A simple question like “Do you have any sandals under $50?” poses a real challenge for vector-search-only systems, because embedding vectors are primarily good at capturing nuance and relationships between words, not at range queries over structured fields like price. A price-range query is solved easily by the traditional query capabilities of MongoDB or any similar vendor.
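To make the sandals example concrete, the price constraint can be pushed into the database as an exact pre-filter alongside the semantic query, rather than hoping the embedding encodes “under $50”. Below is a minimal sketch using MongoDB’s `$vectorSearch` aggregation stage; the index name, field names, and query vector are hypothetical placeholders, not a definitive schema:

```python
def sandals_under_50_pipeline(query_vector):
    """Build an aggregation pipeline for 'Do you have any sandals under $50?':
    approximate semantic match on the embedding, exact range predicate on price."""
    return [
        {
            "$vectorSearch": {
                "index": "products_vector_index",  # hypothetical index name
                "path": "embedding",               # field holding document vectors
                "queryVector": query_vector,       # embedding of the user question
                "numCandidates": 200,              # ANN candidates to consider
                "limit": 10,                       # results to return
                # The filter is applied as a pre-filter, so the range
                # predicate is exact rather than approximated semantically:
                "filter": {"category": "sandals", "price": {"$lt": 50}},
            }
        },
        {
            "$project": {
                "name": 1,
                "price": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

# With pymongo against a live Atlas cluster this would run as roughly:
#   results = db.products.aggregate(sandals_under_50_pipeline(question_embedding))
```

The point of the sketch is the split of responsibilities: the embedding handles “sandals-like things”, while the database’s ordinary predicate handles “under $50” exactly.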
I am the product manager for MongoDB Atlas Vector Search, so I think I can answer some of these questions.
- Our product is built on top of Lucene, an open-source search project with both lexical and vector search capabilities. There are definitely many things to optimize in a search system beyond the algorithms themselves (of which Lucene supports several, including but not limited to the popular HNSW). That operational layer is the realm of database providers like MongoDB, Elasticsearch, and many others (including many vector-only providers!) who understand the challenges of horizontal and vertical scaling, the benefits of a serverless approach, and different deployment models, like dedicated search nodes in MongoDB, and more. Without naming names, I will say that the selling point I hear from a lot of the vector DBs we get compared to tends to be that we couldn’t iterate quickly enough on algorithms like Vamana or product quantization, with little focus on the deployment model. The recent push toward serverless by several providers is an exception to this, but I generally believe that deployment models and algorithms together are what drive both cost competitiveness and developer-friendliness. Who can claim the lasting advantage is anyone’s guess, but I would encourage seeing through what may be marketing and listening to vendors (including me) with a critical ear. I’d be curious, if you try Atlas, whether you find it is not as robust as we claim, and I would be happy to work with you if you find any issues.
- This is definitely an important point for any search system, as the value of keyword search will not go away even amid the current LLM hype cycle (unclear if I’m allowed to type those words on an OpenAI forum). Traditional search systems tend to sit at the lower end of the computational-load-per-query spectrum, at some cost in accuracy (at least relative to vector search), but there are cases where they are uniquely well suited, or where hybridization offers advantages. That’s without even mentioning the benefits of ordinary structured-data querying outside the information-retrieval framework, which we obviously also support. We often see folks composing vector search stages hybridized with full-text search stages, as both provide accurate results based on very different but useful methodologies. I’m unable to add a link to this forum post, but if I could, I would point you to our docs showing how this can be done.
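One common way to compose the two stages mentioned above is reciprocal rank fusion (RRF): each result list contributes `1 / (k + rank)` per document, and documents appearing high in both lists win. Here is a minimal client-side sketch with the database round-trips stubbed out; the document ids and the `k = 60` constant are illustrative, and in practice the two inputs would come from a `$vectorSearch` pipeline and a full-text `$search` pipeline over the same collection:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids.

    Each document scores sum(1 / (k + rank)) across the lists it
    appears in; higher total score means a better fused rank."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Stand-ins for a semantic ($vectorSearch) result and a keyword ($search) result:
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# doc_b ranks first: 1/62 + 1/61 slightly exceeds doc_a's 1/61 + 1/63
```

The fusion can also be expressed entirely inside an aggregation pipeline; the client-side version is shown only because it makes the scoring arithmetic easy to see.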
Hope this helped!
@henry.weller Hey, thanks a lot for your detailed answer. I am trying out MongoDB Atlas and Vector Search, and so far so good.