Multi Span QA vs long document sequence labeling

Hi.

I would like to try out GPT-3 for multi-span QA / sequence labeling on long passages.

In more detail, I am trying to extract a passage from a legal contract together with the corresponding severity of that passage (which is itself another passage of the contract).

An example:

…DOCUMENT:

  1. The insurance is paying 50 % of the costs of subjects described in §3.
  2. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
  3. The insurance contains insurances for the following subjects:
     3.1. Car accidents if the accident is not self-inflicted.
     3.2. Household accidents which are not inflicted by natural catastrophes.

So the objects I want to find/label here are the two insurance subjects (3.1 and 3.2) and their associated insurance property (the 50 % stated in 1.).

These can be seen as

item: "Car accidents if the accident is not self-inflicted."

item: "Household accidents which are not inflicted by natural catastrophes."

The associated property is 50 % for both, so:

property: "50 %"

property: "50 %"
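
To make the target more concrete, here is a minimal Python sketch of how one training record for this example might be built. Everything about the layout is an assumption on my side: the prompt/completion structure, the field names (`item_span`, `property_span`) and the helpers `find_span` and `build_record` are hypothetical, and the filler paragraph 2 is shortened.

```python
import json

# Example contract from above; the filler text of paragraph 2 is shortened.
DOCUMENT = "\n".join([
    "1. The insurance is paying 50 % of the costs of subjects described in §3.",
    "2. Lorem ipsum dolor sit amet.",
    "3. The insurance contains insurances for the following subjects:",
    "3.1. Car accidents if the accident is not self-inflicted.",
    "3.2. Household accidents which are not inflicted by natural catastrophes.",
])

# Target labels: each insurance subject paired with its property.
LABELS = [
    {"item": "Car accidents if the accident is not self-inflicted.",
     "property": "50 %"},
    {"item": "Household accidents which are not inflicted by natural catastrophes.",
     "property": "50 %"},
]


def find_span(document, text):
    """Return [start, end) character offsets of the first occurrence of text."""
    start = document.find(text)
    if start == -1:
        raise ValueError(f"text not found in document: {text!r}")
    return [start, start + len(text)]


def build_record(document, labels):
    """Hypothetical layout: the document is the prompt, the labeled
    spans (serialized as JSON) are the completion."""
    spans = []
    for label in labels:
        spans.append({
            "item": label["item"],
            "item_span": find_span(document, label["item"]),
            "property": label["property"],
            "property_span": find_span(document, label["property"]),
        })
    return {"prompt": document,
            "completion": json.dumps(spans, ensure_ascii=False)}


record = build_record(DOCUMENT, LABELS)
# One JSON object per line, i.e. one line of a JSONL dataset file.
print(json.dumps(record, ensure_ascii=False))
```

Storing character offsets next to the text would let the same record serve both framings: the text fields for a generative (QA-style) target, the offsets for a sequence-labeling target.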

Any leads on how to set up a dataset and training cycle for a GPT-3 model would be highly appreciated.

Cheers.
L