Use of davinci for question and answer from knowledge documents

Hi, I wanted to validate whether my approach to this problem is in line with best practice.

Problem statement: I have many different knowledge documents and articles. I want to use a GPT model such as DAVINCI, CURIE, ADA, or BABBAGE to answer a user's question based on the content of those documents. The model should answer from the documents only, and it should also output the name of the source document the answer came from.

I have thought of two ways to solve it. In the second technique, I do not know how to output the source of the document.

Solution #1 - Feed the content of the documents to the GPT model and prompt it to extract a set of questions and answers from each document. Use that output to create a training set like the one below. In the completion, also add the source name (in the example below, I added WIKI).

{"prompt":"What information will Form AME95 include? Please also mention the source of information. +++++","completion":" Form AME95 will include your name and the name of your large employer, the months during the prior calendar year when you were eligible for coverage, and the cost of the cheapest monthly premium you could have paid for coverage under your employer's health plan. If you worked for a large employer that did not offer its full time employees health coverage, Form 1095-C will also indicate"}

Use this set of prompt/completion pairs to fine-tune the model. When the completion is returned, strip the source name (WIKI in this case) programmatically.
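The stripping step could look roughly like the sketch below. It assumes the fine-tuning completions end with a marker such as " Source: WIKI"; that marker and the function name `split_answer_and_source` are hypothetical, chosen only for illustration, since the post does not say how the source name is appended.

```python
from typing import Optional, Tuple

# Hypothetical delimiter: assumes every training completion ends with " Source: <name>".
SOURCE_MARKER = " Source:"


def split_answer_and_source(completion: str) -> Tuple[str, Optional[str]]:
    """Split a model completion into (answer, source name).

    Returns (completion, None) when the marker is missing, so a completion
    that ignores the convention is still usable as an answer.
    """
    answer, marker, source = completion.rpartition(SOURCE_MARKER)
    if not marker:
        return completion.strip(), None
    return answer.strip(), source.strip()


answer, source = split_answer_and_source(
    " Form AME95 will include your name. Source: WIKI"
)
print(answer)
print(source)
```

Splitting on the last occurrence of the marker keeps the answer intact even if the word "Source:" happens to appear earlier in the text.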

Solution #2 - Use embeddings to identify the documents most similar to the question, since those are the ones likely to contain the answer. Add those documents as additional context in the question prompt. This should keep the answer grounded in the documents passed as context. Here I do not know how to attach the "source of the answer" as metadata.


Solution #2 is usually the best method for what you're trying to do.

You can store the embedding vectors in an object that also contains the links to the original sources.


{embeddings: [
    {
        vector: VECTOR array,
        link: LINK string
    }
]}

If you use a vector database, you can set the link path as a property in the ‘meta’ for each vector.
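As a minimal sketch of that idea, the Python below keeps each vector next to its link, ranks documents by cosine similarity to the question, and returns the links of the documents used as context. The vectors are toy values standing in for real embeddings, and the names `EmbeddedDoc`, `top_matches`, `build_prompt`, and the example.com links are assumptions made for illustration, not part of any particular library.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class EmbeddedDoc:
    vector: List[float]  # embedding of the document text (toy values here)
    link: str            # link to the original source document
    text: str            # document text to pass to the model as context


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vector: List[float], docs: List[EmbeddedDoc], k: int = 2) -> List[EmbeddedDoc]:
    """Return the k documents whose vectors are closest to the query vector."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vector, d.vector),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, matches: List[EmbeddedDoc]) -> str:
    """Put the retrieved texts in the prompt and ask for an answer from them only."""
    context = "\n\n".join(d.text for d in matches)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy store; in practice each vector would come from an embedding model.
docs = [
    EmbeddedDoc([0.9, 0.1, 0.0], "https://example.com/health-forms",
                "Form AME95 will include your name and the name of your large employer."),
    EmbeddedDoc([0.0, 0.2, 0.9], "https://example.com/other-topic",
                "Placeholder text about an unrelated topic."),
]

query_vector = [0.8, 0.2, 0.1]  # stand-in for the embedded user question
matches = top_matches(query_vector, docs, k=1)
prompt = build_prompt("What information will Form AME95 include?", matches)
sources = [d.link for d in matches]  # show these next to the model's answer
```

Because the sources come from the retrieval step rather than from the model's output, the answer can be displayed together with `sources` without asking the model to name its source at all.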

To add, there is plenty of documentation on doing exactly this.

