Hi Cleveland,
Yes, it is totally doable, but it may be a bit complicated initially.
The approach I would take (all of this is my personal opinion, so please take it as such):
Get your “facts” in a row (extract regulations etc. into shorter pieces of text, label them with what they apply to and where they belong, then embed them; maybe use a vector database like Weaviate or similar for search) - they will be where the GPT-3 model gets its knowledge from.
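Roughly, that preparation step could look like this (just a sketch, assuming the pre-1.0 openai Python library and the text-embedding-ada-002 model; the sample fact, field names, and in-memory list are mine - a vector database like Weaviate would take the place of the list):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: key hard-coded here only for brevity

# Each "fact" is a short chunk of a regulation, labelled with what it applies to
# and where it comes from, so the model can cite the source later.
facts = [
    {
        "text": "Grout joints between plaster walls and ceilings shall ...",
        "applies_to": "plaster walls / ceilings",
        "source": "Building Code, Section 12.3",
    },
    # ... more chunks extracted from your regulations
]

def embed(text: str) -> list[float]:
    """Embed a piece of text with the ada-002 embedding model."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

# Attach an embedding to every fact; in production you would store the text,
# labels, and embedding in the vector database instead of a plain list.
for fact in facts:
    fact["embedding"] = embed(fact["text"])
```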
Build a testing app (internal use only) that allows you to input queries like “What are the requirements for grouts between plaster walls and ceilings?”. The app then embeds this query, searches your facts, and sorts them by cosine similarity to pick the X most relevant items from which the answer can be drawn. Then your app needs to build a prompt like this:
“Provide an informative answer to the user inquiry based on the context items below:
<|context|><|item|>…your most prominent context…<|endofitem|><|item|>…next item etc.<|endofitem|><|endofcontext|>
<|userinquiry|>The user inquiry goes in here<|endofuserinquiry|>
<|reply|>”
And display that prompt to you.
Note: items may need to contain not only the text but also the references, etc. - so please make sure everything that is needed is included in the prompt, because the model needs it to build the answer. Some coding may be needed to assemble the context from what you have found in the database (like getting items from the database, sorting the best candidates, and searching the source docs to get the complete context items).
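A rough sketch of that search-and-prompt step, reusing the embed() helper and facts list from the sketch above (the brute-force cosine search just stands in for a real vector database query, and I append the source reference to each item so the model can use it):

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_prompt(query: str, top_k: int = 3) -> str:
    """Embed the query, rank the facts by cosine similarity, and build the prompt."""
    q_emb = embed(query)
    ranked = sorted(
        facts,
        key=lambda f: cosine_similarity(q_emb, f["embedding"]),
        reverse=True,
    )
    # Each context item carries the text plus its reference.
    items = "".join(
        f"<|item|>{f['text']} (Source: {f['source']})<|endofitem|>"
        for f in ranked[:top_k]
    )
    return (
        "Provide an informative answer to the user inquiry based on the context items below:\n"
        f"<|context|>{items}<|endofcontext|>\n"
        f"<|userinquiry|>{query}<|endofuserinquiry|>\n"
        "<|reply|>"
    )
```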
Then you need to run the model (text-davinci-003 initially, to get the best results) to get the answer and output that answer into an editable text area for you to review.
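Getting the answer out could then be something like this (again the pre-1.0 openai library; the parameters and the <|endofreply|> stop sequence are just my own starting points):

```python
def get_answer(prompt: str) -> str:
    """Run the prompt through text-davinci-003 and return the draft answer."""
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=500,
        temperature=0.2,          # keep the answer close to the context
        stop=["<|endofreply|>"],  # assumption: my own end-of-reply marker
    )
    return resp["choices"][0]["text"].strip()

# e.g. answer = get_answer(build_prompt("What are the requirements for grouts between plaster walls and ceilings?"))
```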
The goal here is to teach the model to use the prompt to form the correct answer when possible, and to teach it how to answer when the prompt does not give enough information (rather than making up an answer). Do not try to teach it what to answer but rather how to answer based on the prompt (that is the goal of fine-tuning in your use case).
When you’re happy with the edits (or the reply is great out of the box), you need a button to save both the prompt (without the “Provide an informative answer to the user inquiry based on the context items below:” part) and the answer into a training file (see the formatting in the API docs).
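Saving a reviewed pair can be as simple as appending one JSON line in the prompt/completion format the fine-tuning docs describe; stripping the instruction sentence and adding a fixed stop sequence to the completion is my convention here, so double-check it against the docs:

```python
import json

INSTRUCTION = "Provide an informative answer to the user inquiry based on the context items below:\n"

def save_training_example(prompt: str, answer: str, path: str = "training_data.jsonl") -> None:
    """Append one reviewed prompt/answer pair to the fine-tuning file."""
    record = {
        "prompt": prompt.replace(INSTRUCTION, "", 1),   # drop the instruction part
        "completion": " " + answer + "<|endofreply|>",  # leading space + stop sequence
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```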
Run this at least 500 times (1500 is better) to build your fine-tuning file. Then train a fine-tune (I would try davinci first).
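Kicking off the fine-tune could look roughly like this (legacy fine-tunes endpoint of the pre-1.0 openai library; the `openai api fine_tunes.create` CLI does the same thing):

```python
# Upload the training file, then start a fine-tune on the davinci base model.
upload = openai.File.create(file=open("training_data.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(training_file=upload["id"], model="davinci")
print(job["id"])  # keep the job id so you can check its status later
```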
Connect the fine-tuned model to your app and continue running to see if the results suit you (and continue growing the training data file).
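Connecting the fine-tuned model is mostly a matter of swapping the model name and sending the prompt without the instruction sentence, since that is how it was trained (the fine-tune name below is made up; use the one your fine-tune job reports):

```python
def get_finetuned_answer(prompt: str) -> str:
    """Same call as before, but against the fine-tuned model and the bare prompt."""
    resp = openai.Completion.create(
        model="davinci:ft-your-org-2023-03-01-00-00-00",  # hypothetical fine-tune name
        prompt=prompt.replace(INSTRUCTION, "", 1),        # trained without the instruction
        max_tokens=500,
        temperature=0.2,
        stop=["<|endofreply|>"],
    )
    return resp["choices"][0]["text"].strip()
```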
Once you’re happy with the results your model gives you, try training curie on the training data and connect it to your app (this will save you response time and money).
Check the curie responses, and if they are good, you have your core model.
Then build a production app based on the core model.