Hi everyone, I couldn't find a previous post covering my use case, so I'm posting this to get some starting direction.
I need to find specific custom entities in large text files. The files are legal documents and my entities are quite well defined. The raw text files are in the following format:
Q1….
A1……
Q2…
A2…
And so on. The questions are my entities, and the base 3.5 model returns decent results when the answers are very short (No, Not available, Not applicable). It struggles to extract the entities and their answers when the answers are descriptive. For example, if a question is about a legal matter and the answer is Yes, there is usually a page or a few paragraphs of text, and the base model either returns only a few lines of that text or detects part of the answer as separate questions.
Is fine-tuning a model the right way to go about this, or am I thinking about it wrong? I have explored some custom entity models, including ones from AWS and Azure, but they are quite expensive and just not accurate enough.
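
To make the format above concrete, here is a minimal sketch of splitting a file into question/answer pairs with a plain regex before any model is involved, so that long answers stay whole. It assumes every question starts a line with Q<number> and every answer starts a line with A<number>. That is a guess based on the sample above, not the real marker format. The name split_qa is made up for illustration.

```python
import re

# Assumption: markers look like "Q1", "A1", "Q2", ... at the start of a line,
# optionally followed by ".", ":" or ")". Adjust the pattern to the real files.
MARKER = re.compile(r"^(Q|A)(\d+)\b[.:)]?[ \t]*", re.MULTILINE)


def split_qa(text):
    """Split raw text into a list of {"number", "question", "answer"} dicts.

    Everything between one marker and the next is kept as a single body,
    so a multi-paragraph answer is not cut short.
    """
    pairs = {}
    matches = list(MARKER.finditer(text))
    for i, match in enumerate(matches):
        # The body runs until the next marker, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        kind, number = match.group(1), int(match.group(2))
        entry = pairs.setdefault(
            number, {"number": number, "question": "", "answer": ""}
        )
        entry["question" if kind == "Q" else "answer"] = body
    return [pairs[n] for n in sorted(pairs)]


if __name__ == "__main__":
    sample = (
        "Q1. Any pending litigation?\n"
        "A1. Yes.\nThe matter was filed in 2020.\n\nIt is ongoing.\n"
        "Q2. Any liens?\n"
        "A2. Not applicable\n"
    )
    for pair in split_qa(sample):
        print(pair["number"], repr(pair["question"]), repr(pair["answer"]))
```

A line inside an answer that happens to start with something like "A3" would be treated as a new marker, so this only works if the markers are reliable. If they are, each pair could then be sent to the model on its own instead of the whole document.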