I want to build a chatbot that answers questions based on the private data I feed it and returns the relevant documents when I search. Can anyone suggest open-source LLMs that can be integrated into private-data applications? I have checked out LangChain, Llama 2, LlamaIndex, and privateGPT.
You would need two types of AI, each either under your own control or run by a provider whose data security policy is acceptable to you. (OpenAI can accommodate several such requirements.)
- the language AI that you converse with, which is given the supplementary data with which it can answer
- the AI embeddings engine that can enable semantic search on a database of your private documents
The former requires a significant outlay on GPU server hardware if you want performance anywhere near that of OpenAI models, because only large, capable alternative models come close.
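To make the two roles concrete, here is a minimal sketch of how they fit together: the embeddings side ranks your documents against the question, and the top matches are pasted into the prompt given to the language AI. Everything named here is hypothetical and for illustration only. In particular, `toy_embed` is just a word-count vector standing in for a real embeddings model, so it matches on shared words rather than meaning, and the final call to the language model is left out because it depends on which model you host.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """ASSUMPTION: stand-in for a real embeddings model.

    Returns a sparse word-count vector, so it only captures shared
    words, not true semantic similarity.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, documents, top_k=2):
    """Rank documents by similarity to the query; return (score, doc) pairs."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, toy_embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question, hits):
    """Put the retrieved documents into the prompt for the language AI."""
    context = "\n".join(f"- {doc}" for _, doc in hits)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical private documents.
documents = [
    "Invoices are archived in the finance share after 90 days.",
    "The VPN requires a hardware token issued by IT.",
    "Vacation requests must be approved by your manager.",
]

question = "How do I get a VPN token?"
hits = search(question, documents, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

The `hits` list is also what you would show the user as the "relevant documents", and `prompt` is what you would send to whichever language model you run. In a real setup you would compute the document vectors once and store them, rather than re-embedding every document on each query as this sketch does.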