Fine tuning model for custom entity extraction

waseem432000 · May 10, 2023, 9:59pm

Hi everyone, I can’t find any previous post related to my use case so posting this to get some starting direction for a use case.

I have a usecase where I need to find specific custom entities from large text files. The files are legal documents and my entities are quite well defined. The raw text files are in the following format

Q1….
A1……
Q2…
A2…

And so on. The questions are my entities and the base 3.5 model returns decent enough results if the answers are really small ones(No, Not available, Not applicable) but it struggles to extract entities and their answers when the answers like descriptive(for example if a question is about a
legal matter and the answer is Yes, there is usually a page or a few paragraphs of text) but the base model either only returns a few lines of text or detects part of answer as separate questions.

Is fine tuning a model the right way to go about it or am I thinking it wrong? I have explored some specific custom entities models including AWS and Azure but they are quite expensive and just not accurate enough.

waseem432000 · May 11, 2023, 1:32am

Broadly speaking, it is quite like Fuzzy Search. I do plan to use the extracted bits to then create relevant embedding so I am going to use open AI anyway.

I may be missing something really obvious but can you please point me why I shouldn’t be using open AI for fuzzy search? I am hoping it does a better job than the other models I mentioned (or libraries like FuzzyWuzzy etc) as the raw text I plan to pass to extract entities, can be prone to mistakes due to original PDF being in funny layouts and how text extraction libraries can extract the text in wrong order due to layouts etc.

Topic		Replies	Views
Fine tuning with my own files like company documentation API	2	1085	May 5, 2022
Fine-tuning for text classification / finding relevant parts in huge documents Community fine-tuning	3	165	December 2, 2024
Multitasked Fine Tuned Model API	3	442	February 22, 2023
Fine-tuning with contextual embeddings API	2	1310	May 7, 2023
Is Fine Tuning the right approach for me? Community fine-tuning , fine-tuning-problems	0	560	November 30, 2023

Fine tuning model for custom entity extraction

Related topics