There are a few reasons that finetuning is not an appropriate tool for knowledge storage and retrieval (though it is good for other aspects of the Q&A process).
- You would need to continuously finetune a model as you add to your KB, database, or repository.
- In some cases, this would be prohibitively expensive (some organizations have many gigabytes or terabytes of data to sift through).
- There are better tools for this, such as search indexes like SOLR, ElasticSearch, Pinecone, Weaviate, and others, which are lightning fast at search and can also integrate vector-based search.
- While finetuning does reduce confabulation (spontaneous generation of false/imaginary information), it does not completely eliminate it.
It’s critical to remember that finetuning only increases the consistency of behavior; it does not teach the model anything new (not really).
We may arrive at a time in the future when neural representations of memory make sense. Indeed, the possibility is intoxicating to think about - the idea that we can compress an arbitrary amount of knowledge into neural embeddings is very appealing. However, keep in mind that this is how LLMs are already trained, and confabulation is still an issue. You have no way of knowing whether the model is reporting accurately or not, which necessitates an external repository of trusted facts. In other words, if you rely on neural memory today, you’ll still need to solve the search problem to ensure that you’re reporting accurate information. So you might as well cut out a step and use search integrated with QA.
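To make the “search integrated with QA” idea concrete, here is a minimal sketch of the flow: embed the question, retrieve the most relevant documents from a trusted repository, and ask the model to answer using only those documents. The `embed`, `search_index`, and `complete` names are hypothetical stand-ins for whatever embedding service, search index, and LLM endpoint you actually use.

```python
# Minimal sketch of search-integrated QA (retrieval, then answer).
# embed(), search_index, and complete() are hypothetical stand-ins for
# your embedding service, search index, and LLM completion endpoint.

def answer_question(question, search_index, embed, complete, top_k=3):
    query_vector = embed(question)                    # embed the user's question
    docs = search_index.search(query_vector, top_k)   # retrieve trusted documents
    context = "\n\n".join(doc["content"] for doc in docs)
    prompt = (
        "Answer the question using ONLY the following documents.\n\n"
        f"DOCUMENTS:\n{context}\n\nQUESTION: {question}\nANSWER:"
    )
    return complete(prompt)   # the model grounds its answer in retrieved facts
```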
I have been hard at work on several microservices that will aid in this endeavor. They are not fully optimized, so they cannot scale beyond a few tens of thousands of documents (yet). But I plan on integrating FAISS into them, which means they will be able to scale to billions or trillions of documents in the coming months.
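For reference, the kind of FAISS usage that makes this scaling possible looks roughly like the sketch below. This is generic FAISS code, not the actual integration; the dimensionality and index type are assumptions, and at truly large scales you would swap the flat index for one of FAISS’s approximate indexes.

```python
import numpy as np
import faiss  # Facebook AI Similarity Search

DIM = 512  # assumed embedding dimensionality (e.g., Universal Sentence Encoder output)

# A flat index does exact nearest-neighbor search; FAISS also offers approximate
# indexes (IVF, HNSW) that trade a little recall for billion-scale speed.
index = faiss.IndexFlatL2(DIM)

vectors = np.random.random((10_000, DIM)).astype("float32")  # placeholder document embeddings
index.add(vectors)

query = np.random.random((1, DIM)).astype("float32")  # placeholder query embedding
distances, ids = index.search(query, 5)  # top-5 nearest documents
print(ids[0], distances[0])
```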
This is a RESTful microservice that includes several search functions. It’s basically an extremely lightweight version of SOLR, but it puts vector search first. As mentioned, it is not optimized, so it’s not nearly as fast as SOLR, but it is easy to use.
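Interacting with it would look something like the following. The host, port, endpoint names, and payload shapes here are illustrative assumptions, not the service’s documented API, so check its own README for the real routes.

```python
import requests

BASE_URL = "http://localhost:8080"  # hypothetical host/port for the search microservice

# Index a document (endpoint and payload shape are assumptions for illustration).
requests.post(f"{BASE_URL}/add",
              json={"content": "The mitochondria is the powerhouse of the cell."})

# Run a search query and print the top matches.
response = requests.post(f"{BASE_URL}/search",
                         json={"query": "what produces energy in cells?", "count": 3})
for hit in response.json():
    print(hit)
```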
This is a RESTful microservice that performs offline embeddings using Google’s Universal Sentence Encoder v5. It produces smaller embeddings than even ADA, but it does so for free (and it’s lightning fast). It is optimized for short entries, such as single sentences or short paragraphs.
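Under the hood, producing these embeddings takes only a few lines with TensorFlow Hub. The sketch below shows the general idea using the public USE v5 (large) module; it illustrates what the service does rather than its actual internals.

```python
import tensorflow_hub as hub

# Load Universal Sentence Encoder v5 (the large, Transformer-based variant) from TF Hub.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-large/5")

sentences = [
    "How do I reset my password?",
    "Steps for recovering account access.",
]
vectors = embed(sentences)  # tensor of shape (2, 512)
print(vectors.shape)        # 512 dimensions, versus ADA's 1536
```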
This is a Python module called VDBLITE (vector database lite), meant to emulate SQLITE (a lightweight, serverless SQL database). It is the same work that the Nexus is based upon, and it can act as a serverless vector database and search engine. Again, it is not optimized yet, but it should work for up to 100k or 1M documents depending on their size (some folks say it starts to fail around 400k). It should certainly remain performant and stable in the 1k to 10k range:
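To illustrate what a serverless vector database does (this is a conceptual sketch, not VDBLITE’s actual API), here is a brute-force version in a few lines of numpy. The brute-force scan over every record is exactly why this approach works fine for thousands of documents but needs something like FAISS beyond that.

```python
import numpy as np

class TinyVectorDB:
    """Conceptual stand-in for a serverless vector DB: keep records in memory,
    search by cosine similarity with a brute-force scan."""

    def __init__(self):
        self.records = []  # list of dicts: {"vector": np.ndarray, "content": str}

    def add(self, vector, content):
        self.records.append({"vector": np.asarray(vector, dtype="float32"),
                             "content": content})

    def search(self, query_vector, count=5):
        query = np.asarray(query_vector, dtype="float32")
        scored = []
        for record in self.records:
            v = record["vector"]
            score = float(np.dot(query, v) /
                          (np.linalg.norm(query) * np.linalg.norm(v) + 1e-9))
            scored.append((score, record["content"]))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:count]
```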