Multi Span QA vs long document sequence labeling

Boglinger · February 22, 2023, 4:22pm

Hi.

I would like to try out GPT-3's capabilities for multi-span QA / sequence labeling over long passages.

In more detail, I am trying to extract a legal passage from a legal contract together with the corresponding severity of that passage (which is itself another passage of the contract).

An example:

…DOCUMENT:

1. The insurance is paying 50 % of the costs of subjects described in §3.
2. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
3. The insurance contains insurances for the following subjects:
3.1 Car accidents if the accident is not self-inflicted.
3.2 Household accidents which are not inflicted by natural catastrophes.

So the objects I want to find/label here are the two insurance subjects (3.1 and 3.2 above) and their designated insurance property (the 50 % stated in 1.).

These can be seen as

item: "Car accidents if the accident is not self-inflicted."

item: "Household accidents which are not inflicted by natural catastrophes."

Their designated property is 50 % for both, so:

property: "50 %"
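To make the target format concrete, here is a rough sketch of how the example above could be encoded as one labeled record with character offsets. The `Span` class, the `locate` helper, the label names `item` and `property`, and the record layout are all made up for illustration, and the lorem ipsum filler is left out of the document string; this is not a format required by any particular model or API.

```python
from dataclasses import dataclass, asdict


@dataclass
class Span:
    label: str  # hypothetical label name: "item" or "property"
    start: int  # character offset into the document (inclusive)
    end: int    # character offset into the document (exclusive)
    text: str   # the passage itself, kept for readability


def locate(document: str, label: str, text: str) -> Span:
    """Return the first occurrence of `text` in `document` as a labeled span."""
    start = document.find(text)
    if start < 0:
        raise ValueError(f"passage not found in document: {text!r}")
    return Span(label, start, start + len(text), text)


# Shortened version of the example contract (filler paragraph omitted).
document = (
    "The insurance is paying 50 % of the costs of subjects described in §3.\n"
    "The insurance contains insurances for the following subjects:\n"
    "Car accidents if the accident is not self-inflicted.\n"
    "Household accidents which are not inflicted by natural catastrophes.\n"
)

prop = locate(document, "property", "50 %")
items = [
    locate(document, "item", "Car accidents if the accident is not self-inflicted."),
    locate(document, "item", "Household accidents which are not inflicted by natural catastrophes."),
]

# One training record: every item is paired with the property that applies to it.
record = {
    "document": document,
    "pairs": [{"item": asdict(item), "property": asdict(prop)} for item in items],
}
```

Storing offsets as well as the text means the same record could be used either for span-style QA (answer = list of passages) or for token-level sequence labeling, depending on which setup turns out to work better.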

Any leads on how to set up a dataset and training cycle for a GPT-3 model are highly appreciated.

Cheers.
L
