Fine-tuning a model for custom entity extraction

Hi everyone, I couldn’t find any previous post related to my use case, so I’m posting this to get some starting direction.

I have a use case where I need to extract specific custom entities from large text files. The files are legal documents and my entities are quite well defined. The raw text files are in the following format:

Q1…
A1…
Q2…
A2…

And so on. The questions are my entities. The base 3.5 model returns decent enough results when the answers are really short (“No”, “Not available”, “Not applicable”), but it struggles to extract the entities and their answers when the answers are descriptive (for example, if a question is about a legal matter and the answer is “Yes”, there is usually a page or a few paragraphs of text). In those cases the base model either returns only a few lines of the answer or detects parts of the answer as separate questions.
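To make the setup concrete, here is roughly the kind of call I mean. It’s a minimal sketch; the prompt, model name, and file path are placeholders, not my exact setup:

```python
# Minimal sketch using the (pre-v1) openai Python library; the prompt,
# model name, and file path are placeholders, not my exact setup.
import openai

document_text = open("extracted_document.txt").read()  # hypothetical input

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,
    messages=[
        {
            "role": "system",
            "content": (
                "Extract every question and its complete answer from the "
                "document. Answers may span several paragraphs; do not "
                "truncate them. Return a JSON list of "
                '{"question": ..., "answer": ...} objects.'
            ),
        },
        {"role": "user", "content": document_text},
    ],
)
print(response.choices[0].message["content"])
```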

Is fine-tuning a model the right way to go about this, or am I thinking about it wrong? I have explored some dedicated custom entity extraction services, including the AWS and Azure offerings, but they are quite expensive and just not accurate enough.
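In case it clarifies what I’m asking about: my understanding is that fine-tuning gpt-3.5-turbo takes JSONL training examples in the chat format, so one example would look roughly like this (the prompts and texts are illustrative, not my real data):

```python
# Sketch of a single fine-tuning training example in the JSONL chat
# format used for gpt-3.5-turbo; the texts here are illustrative only.
import json

example = {
    "messages": [
        {
            "role": "system",
            "content": "Extract each question and its complete answer as JSON.",
        },
        {
            "role": "user",
            "content": "Q1 Is there any pending or threatened litigation?\n"
                       "A1 Yes. In March 2021 the company received ... "
                       "(several paragraphs of descriptive answer)",
        },
        {
            "role": "assistant",
            "content": json.dumps([{
                "question": "Is there any pending or threatened litigation?",
                "answer": "Yes. In March 2021 the company received ... "
                          "(several paragraphs of descriptive answer)",
            }]),
        },
    ]
}

# One JSON object per line in the training file.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```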

Broadly speaking, the task is a lot like fuzzy search. I plan to use the extracted bits to then create the relevant embeddings, so I am going to be using OpenAI anyway.
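For the embedding step I mean something along these lines (a sketch; text-embedding-ada-002 is just the model I would expect to use):

```python
# Sketch of embedding the extracted answers, again with the pre-v1
# openai library; the model name is an assumption on my part.
import openai

extracted_answers = [
    "Yes. In March 2021 the company received ...",
    "Not applicable.",
]
resp = openai.Embedding.create(
    model="text-embedding-ada-002",
    input=extracted_answers,
)
vectors = [item["embedding"] for item in resp["data"]]
```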

I may be missing something really obvious, but can you point out why I shouldn’t be using OpenAI for this kind of fuzzy search? I am hoping it does a better job than the other models I mentioned (or libraries like FuzzyWuzzy), because the raw text I plan to pass in for entity extraction can be prone to mistakes: the original PDFs have funny layouts, and text extraction libraries can pull the text out in the wrong order because of those layouts.
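For comparison, the FuzzyWuzzy-style fallback I have in mind looks roughly like this (all names and the threshold are illustrative):

```python
# Sketch of the FuzzyWuzzy-style fallback: score a (possibly OCR-damaged)
# line against a list of known question texts. Names and the threshold
# are illustrative.
from fuzzywuzzy import process

known_questions = [
    "Is there any pending or threatened litigation?",
    "Are there any environmental liabilities?",
]

line = "Is ther any pendng litigation agains the company?"  # garbled extraction
match, score = process.extractOne(line, known_questions)
if score >= 85:  # threshold would need tuning
    print(f"Matched question: {match} (score {score})")
```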