I used to develop regulatory infobases (information databases) using Folio Views back in the day. Basically, these were huge databases of information designed to be keyword searched, returning links to the relevant sections in the documents.
I’m looking to experiment with something like that using OpenAI. The differences I understand are that a) an AI system like OpenAI is not doing keyword seaches and b) It not so much retrieving excepts as interpreting context. Or, something like that.
What I would like to do is make sure that, when searching regulatory information, a source reference (text or actual document link) is returned with each answer so that a user can locate the source of the response.
Take a look at this document: https://www.dre.ca.gov/files/pdf/relaw/2023/adminlaw.pdf
Note that it is broken down into Articles, and each Article broken down into sections.
Let’s say I break it down into individual sections and upload those sections to an OpenAI model.
How could I format the data so that a user could know the section from which a response was retrieved.
For example, if the user asks a question whose answer is found in Article I Section 11370.3., how do I make sure that reference information is returned in the response?
Is this even doable?
Now, I understand that I’m thinking in terms of an old keyword search and retrieval model as opposed to a modern Artificial Intelligence semantical context model, but I’m also looking at this from the standpoint of an end-user. If I am searching regulatory statutes, not only am I interested in the answer to a specific question, but also where in the statutes that answer can be found.
Interested in any comments and/or suggestions.