Fine Tune: Document to Summary + Extraction

I have hundreds of legal documents and their summaries. Each document is long (about 25,000 words), and each summary is JSON/CSV: some fields are extracted directly from the document, and others are abstractive summaries of different sections of the document.

So each training example for the fine-tune would look like this:

{"prompt": "…document text here…", "completion": "{case_id: 76dfe, case_date: 2023-01-12, summary: …abstractive summary…, other fields}"}
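
As written, the completion is not strictly valid JSON, since its keys and values are unquoted. A minimal sketch of how a training line could be built so that both the record and the completion parse cleanly is below. The helper name `make_record` and the placeholder values are hypothetical; only the field names come from the example above.

```python
import json


def make_record(document_text, fields):
    """Return one JSONL line: the document as prompt, the fields as a JSON string."""
    # Serialize the target fields first so the completion is itself valid JSON.
    completion = json.dumps(fields, ensure_ascii=False)
    # Serialize the whole record; json.dumps escapes newlines and quotes for us.
    return json.dumps(
        {"prompt": document_text, "completion": completion},
        ensure_ascii=False,
    )


# Placeholder values for illustration only.
line = make_record(
    "...document text here...",
    {
        "case_id": "76dfe",
        "case_date": "2023-01-12",
        "summary": "...abstractive summary...",
    },
)

# Round trip: parse the line, then parse the completion inside it.
record = json.loads(line)
fields = json.loads(record["completion"])
```

Writing one such line per document gives a JSONL file, and the same `json.loads` call can be used later to check whether a model's output is well-formed.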

Two questions:

  1. Can ChatGPT generate output like the above (JSON)?
  2. Can ChatGPT process documents of 25k to 35k words?

Thanks for your time

Hi Challa, did you get a response? I believe the answer is no. We just tried a similar summarization task and received the following error:

“This model’s maximum context length is 4097 tokens, however, you requested 7791 tokens (7691 in your prompt; 100 for the completion). Please reduce your prompt; or completion length.”
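
The arithmetic behind that error can be sketched as follows: the prompt and the requested completion share one context window. The function names are hypothetical, and the words-to-tokens conversion assumes the commonly cited rule of thumb of about 3/4 of a word per token for English text; legal text may tokenize differently, so a real tokenizer should be used for actual counts.

```python
def fits_context(prompt_tokens, completion_tokens, max_context):
    """Return (fits, requested): prompt and completion share one window."""
    requested = prompt_tokens + completion_tokens
    return requested <= max_context, requested


def estimate_tokens(word_count):
    """Rough estimate, assuming ~4 tokens per 3 English words (rounded up)."""
    return (word_count * 4 + 2) // 3


# The numbers from the error message above.
ok, requested = fits_context(7691, 100, 4097)
print(ok, requested)  # False 7791

# Rough token estimates for documents of 25k and 35k words.
low, high = estimate_tokens(25_000), estimate_tokens(35_000)
print(low, high)  # 33334 46667
```

Under that assumption, even the shorter documents would be several times larger than a 4097-token window.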

You can chunk the document, summarize each chunk, and so on; there are multiple ways to achieve the result.
But to do it your way, in a single pass, you would need a model with a context window of roughly 65k tokens.
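
One way the chunk-and-summarize approach could look is sketched below. This is only an illustration: `summarize` is a hypothetical callable standing in for whatever model call you use, and the chunk size and overlap (counted in words, not tokens) are arbitrary assumptions.

```python
def chunk_words(text, max_words=2000, overlap=200):
    """Split text into overlapping chunks of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks


def summarize_long(text, summarize, max_words=2000, overlap=200):
    """Summarize each chunk, then summarize the combined partial summaries."""
    partials = [summarize(c) for c in chunk_words(text, max_words, overlap)]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    # Second pass: condense the partial summaries into one result.
    return summarize("\n".join(partials))


# Stand-in "model" for demonstration: keeps only the first word of its input.
demo_text = " ".join(str(i) for i in range(5000))
demo_chunks = chunk_words(demo_text)
demo_summary = summarize_long(demo_text, lambda t: t.split()[0])
```

For very long documents the combined partial summaries may themselves exceed the context limit, in which case the second pass would need to be chunked as well.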

GPT-4 only goes up to 32k tokens, and they have barely rolled it out.