I am trying to create an application to analyze a given legal contract and answer a standard set of questions (like effective date, contracting entity, etc.).
I was able to do this to some extent using the approach below:
Break the document into chunks => Create an embedding vector for each chunk => Find the chunk whose embedding is closest to the query embedding => Call the text completion API with that chunk as context
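The pipeline above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `embed` is a toy bag-of-words stand-in for a real embedding API call, the final completion call is left out, and all function names (`chunk_text`, `embed`, `cosine`, `closest_chunk`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Split a document into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding API: a lowercase bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_chunk(chunks, query):
    """Return the chunk whose embedding is most similar to the query embedding."""
    query_vec = embed(query)
    return max(chunks, key=lambda chunk: cosine(embed(chunk), query_vec))


def build_prompt(context, question):
    """Assemble the text that would be sent to a completion API (call not shown)."""
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

With a real embedding model, only `embed` would change; the chunking, nearest-chunk lookup, and prompt assembly would stay the same shape.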
But the main problem is that the answers are approximately OK, but not exactly what I want. Ideally, I would like to train the model with some examples from a sample set of documents.
Like this:
“Document A, Question 1, Answer 1, Question 2, Answer 2;
Document B, Question 1, Answer 1, Question 2, Answer 2…”
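To make that layout concrete, here is one way such a sample set could be flattened into one record per question. This is only a sketch of the desired data shape: the field names (`document`, `question`, `answer`), the helper names, and the placeholder texts are all hypothetical, and nothing here assumes that any particular fine-tuning API accepts records in this form.

```python
import json


def make_records(documents):
    """Flatten documents with question/answer pairs into one record per question."""
    records = []
    for doc in documents:
        for question, answer in doc["qa_pairs"]:
            records.append({
                "document": doc["text"],   # the context the answer should come from
                "question": question,
                "answer": answer,
            })
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Placeholder sample set mirroring the layout described above.
sample_set = [
    {"text": "<text of Document A>",
     "qa_pairs": [("Question 1", "Answer 1"), ("Question 2", "Answer 2")]},
    {"text": "<text of Document B>",
     "qa_pairs": [("Question 1", "Answer 1"), ("Question 2", "Answer 2")]},
]

print(to_jsonl(make_records(sample_set)))
```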
But the fine-tuning API does not allow passing any context in the training examples. How can I accomplish this?