Imagine you have a company handbook and want to use OpenAI to build an FAQ bot on top of it, so colleagues can ask any question and the bot answers according to the handbook. How would I do this? Putting the whole handbook into the prompt is not possible, and fine-tuning on completions needs too much data, I think. Any ideas? Really hoping for some exchange here.
Welcome to the forum.
We’ve got a few great threads about this topic.
Embeddings are your missing link.
I created a couple of diagrams that provide a high-level overview of the process. You can find them here.
This is exactly how I created a PKM solution.
Hypothetical Document Embeddings (HyDE) is an embedding search technique that begins by generating a hypothetical answer and then using that hypothetical answer as the basis for searching the embedding system. It is a proven way to improve the accuracy of question answering by surfacing content that better matches the underlying intent of the query.
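A toy sketch of the HyDE flow described above. Both helpers are hypothetical stand-ins: a real system would use a chat model to draft the hypothetical answer and an embeddings endpoint for the vectors; here a keyword-count fingerprint keeps the example runnable offline.

```python
# HyDE sketch: answer first, then search with the answer's embedding.

def generate_hypothetical_answer(question: str) -> str:
    # Stand-in: a chat model would draft a plausible answer here.
    return f"A plausible handbook answer to: {question}"

def embed(text: str) -> list[float]:
    # Stand-in for an embeddings endpoint: counts of a few handbook words.
    vocab = ["vacation", "days", "policy", "expense", "report"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def hyde_search(question: str, documents: dict[str, list[float]]) -> str:
    """1. Draft a hypothetical answer, 2. embed it, 3. return the nearest doc."""
    query_vector = embed(generate_hypothetical_answer(question))

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    return max(documents, key=lambda d: cosine(query_vector, documents[d]))
```

The point is that the hypothetical answer tends to look more like the stored documents than the raw question does, so its embedding lands closer to the right content.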
Well stated. This approach is still poorly understood by almost everyone, yet it represents one of the most efficient and cost-effective architectures for solving many AI requirements. Thanks for sharing!
So I send my question and OpenAI checks which answer is probably the best match. But how do I store all the questions and answers beforehand?
There are many ways to do this. I approached this very simplistically using embeddings.
Vectorize the Questions
This term is a bit fuzzy, but it means: given the answer to a question, push that text into the model and ask it to give you the vector for that blob of text. This is known as an embedding vector. It's like a little mathematical fingerprint that identifies the text inside the LLM.
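A minimal sketch of the fingerprint idea. In a real system the vector comes from an embeddings endpoint (the comment shows the shape of that call in the OpenAI Python SDK, v1.x, with one common model name); the toy word-count version below stands in so the example runs offline.

```python
# Toy "mathematical fingerprint" for a blob of text. A real system would call
# an embeddings endpoint instead, e.g. with the OpenAI Python SDK (v1.x):
#   client.embeddings.create(model="text-embedding-3-small", input=text)

VOCAB = ["vacation", "days", "expense", "report", "remote", "work"]

def embed(text: str) -> list[float]:
    """Map text to a fixed-length vector: here, counts of a few known words."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]
```

Texts about the same topic produce similar vectors, which is what makes the matching step possible later.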
Store the Embedding Vector
This is the simple part - given the embedding vector, save it in a database that makes it easy to recall. We'll refer to this as the vector database. But don't just save the mathematical fingerprint; save the text of the question, the answer to that question (which you already know), and some other valuable information about the answer, such as a list of keywords about the question and the answer, perhaps the date it was created, and who created the answer. This metadata about the answer may come in handy.
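A sketch of what one stored record might hold. Real deployments would use a vector database such as Pinecone or Weaviate; a plain list of dicts shows what gets saved, and every field name here is illustrative rather than a required schema.

```python
# Sketch of a "vector database" as a plain list of records.
from datetime import date

vector_db: list[dict] = []

def store(question: str, answer: str, vector: list[float],
          keywords: list[str], author: str) -> None:
    """Save the fingerprint together with the question, answer, and metadata."""
    vector_db.append({
        "question": question,
        "answer": answer,
        "vector": vector,       # the mathematical fingerprint
        "keywords": keywords,   # metadata that may help filter answers later
        "created": date.today().isoformat(),
        "author": author,
    })
```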
The UX - Answer Questions
This is the solution part of the process. It's where a user asks a question without knowing the answer, or even how best to phrase it. Their question is entered into a UI of some sort, and the solution turns that naturally typed question into an embedding vector in the same manner you used to vectorize your known questions. With the embedding vector from the user's question, query the vector database to see which of the known questions scores closest to the user's question vector. The query result from the vector database includes a similarity score which can be used to isolate the top five hits (for example).
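The query step above can be sketched in a few lines: score every stored entry against the question vector with cosine similarity and keep the top five. The `vector_db` entries are assumed to be dicts with a `"vector"` key, matching the storage step described earlier.

```python
# Rank stored entries by cosine similarity to the user's question vector.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_matches(question_vector: list[float], vector_db: list[dict],
                k: int = 5) -> list[dict]:
    """Return the k stored entries whose vectors score closest to the question."""
    scored = [dict(entry, score=cosine(question_vector, entry["vector"]))
              for entry in vector_db]
    return sorted(scored, key=lambda e: e["score"], reverse=True)[:k]
```

A dedicated vector database does this ranking for you, and far more efficiently, but the underlying comparison is the same.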
The Recommended Answer
With the top closest matches, the one with the highest score is probably the best match, and because we planned ahead, that matching item contains the answer content and the original question text used to instantiate the matching vector. You now have all the content required to provide the user with the best answer given that user’s question. This may not be enough to make the experience ideal. You may also need to extract keywords from the user’s question and use them to filter the highest-scoring matches.
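The keyword filter mentioned above might look like this: keep only the high-scoring matches that share at least one keyword with the user's question. Keyword extraction itself (e.g. via a separate GPT call) is out of scope here, so a naive word split stands in for it.

```python
# Filter top matches by keyword overlap with the user's question.

def filter_by_keywords(question: str, matches: list[dict]) -> list[dict]:
    """Drop matches whose stored keywords never appear in the user's question."""
    question_words = set(question.lower().split())
    return [m for m in matches
            if question_words & {k.lower() for k in m.get("keywords", [])}]
```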
Embeddings have many advantages, including a training approach that is easier than fine-tuning a model. Even more advantageous, this approach is very easy to update or extend with new precision: you simply re-vectorize with the new information. It is also financially practical, because embeddings are roughly 1/75th the cost of other GPT inference processes. Building infrastructure to create and manage AI solutions based on embeddings is also relatively simple, because you're managing the solution much like any content-management or data-management process.
Vectors are not the easiest elements to manage or match in a query process. While it is possible to perform the cosine mathematics in Python data structures and even relational databases, these approaches are specialized and somewhat more complex than simply querying for keys in an index. As such, I recommend a little reading about Pinecone or Weaviate, databases designed to simplify vector storage and queries. Another disadvantage of embeddings is inference precision. I'm not an expert in this field of AI per se, but tuning an embedding system to deliver high-confidence results for users requires some effort, which is why I recommend at least considering other GPT services to extract keywords, summaries, and even entities to enhance the data set for creating the embeddings and filtering the best answers from the vector database.