Fine-tuning for better extraction

sergeliatko · August 20, 2024, 7:15pm

Welcome to the forum. I had a similar problem a couple of years ago, which I solved with a combination of semantic chunking, RAG, and custom data extractors. The whole solution ended up as an analysis and data mining framework. Our use case is legal documents, but I think the same approach can easily be applied to yours.

I think I can help you figure that out, and it would make a great example for the new service I’m launching here: https://www.simantiks.com

Can you please share an example of a file you are extracting data from, and say what kind of data you need to extract?

The data extractor description should look like this (approximately):

Question: what is the date of the event?

Queries:

event date
on … at
…

Examples:

October 2nd, 2024
mm/dd/yyyy

The question is basically the instruction for the LLM to parse the input and produce the output; it is also used as the query to the RAG.
Queries are a list of words, phrases, or keywords similar to how the answer appears in the searched document. The query vector is shifted by 0.6 towards the center (centroid) of the query vectors to improve RAG precision.
Examples are given to the LLM as samples of the desired output format.
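
To make the description above concrete, here is a minimal Python sketch of an extractor description and the query vector adjustment. It is an illustration under assumptions, not the framework's actual code: the names `Extractor`, `centroid`, `adjust_query_vector`, and `build_prompt` are hypothetical, and "adjusted by 0.6 towards the center" is read here as linear interpolation between the question vector and the centroid of the query vectors. The embeddings are plain lists of floats; how they are produced is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Extractor:
    """Hypothetical container for one extractor description."""
    question: str                                       # instruction for the LLM, also the base RAG query
    queries: List[str] = field(default_factory=list)    # phrases resembling the source text
    examples: List[str] = field(default_factory=list)   # samples of the desired output format


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def adjust_query_vector(question_vec: List[float],
                        query_vecs: List[List[float]],
                        weight: float = 0.6) -> List[float]:
    """Move the question vector towards the centroid of the query vectors.

    Assumption: 'adjusted by 0.6' means linear interpolation,
    result = (1 - weight) * question + weight * centroid.
    """
    if not query_vecs:
        return list(question_vec)
    center = centroid(query_vecs)
    return [(1 - weight) * q + weight * c for q, c in zip(question_vec, center)]


def build_prompt(extractor: Extractor, chunks: List[str]) -> str:
    """Assemble an LLM prompt from the question, the examples, and retrieved chunks."""
    lines = ["Question: " + extractor.question, "", "Examples of the desired output:"]
    lines += ["- " + example for example in extractor.examples]
    lines += ["", "Context:"]
    lines += chunks
    return "\n".join(lines)


event_date = Extractor(
    question="what is the date of the event?",
    queries=["event date"],
    examples=["October 2nd, 2024", "mm/dd/yyyy"],
)

# Toy 2-dimensional vectors standing in for real embeddings.
adjusted = adjust_query_vector([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]])
prompt = build_prompt(event_date, ["The gala takes place on October 2nd, 2024."])
```

The adjusted vector would then be used for the similarity search over the chunks, and the retrieved chunks passed to `build_prompt`. Whether the real framework normalizes the vector after shifting it is not stated in the post, so the sketch leaves it unnormalized.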

If you can provide several examples of the above, and the source file or files if you wish, I can run them through my framework and report the results here so that we can continue the discussion.

Of course, the whole thing will be free; I’m going to use it as a marketing case.
