I’m trying to summarise large amounts of input text using completions, to pick out key facts that are common across my input data.
I have RFPs sent to me as PDFs in a variety of formats, and I want to pick out budgets, scope and key dates (submission deadline, project length, project completion date).
I’m parsing the PDFs and then summarising the text one paragraph at a time, but this approach isn’t optimal since not every paragraph contains every fact.
Is there a preferred method, using chunking or something similar, to achieve what I want?
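To make the question concrete, here is a rough sketch of the kind of chunking I have in mind: split the parsed text into overlapping word windows, run an extraction step on each window, then merge whatever each window found. All names here (`FIELDS`, `chunk_words`, `merge_facts`, `extract_from_document`, `toy_extract`) are made up for illustration, the window sizes are arbitrary, and `toy_extract` is only a regex stand-in for the real completion call, so treat it as a sketch rather than a working solution.

```python
import re
from typing import Callable, Dict, List

# Hypothetical field names for the facts I want from each RFP.
FIELDS = ["budget", "scope", "submission_deadline",
          "project_length", "completion_date"]


def chunk_words(text: str, size: int = 300, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def merge_facts(per_chunk: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collect the distinct values found for each field, in order of appearance."""
    merged: Dict[str, List[str]] = {field: [] for field in FIELDS}
    for facts in per_chunk:
        for field in FIELDS:
            value = facts.get(field)
            if value and value not in merged[field]:
                merged[field].append(value)
    return merged


def extract_from_document(
    text: str,
    extract_fn: Callable[[str], Dict[str, str]],
    size: int = 300,
    overlap: int = 50,
) -> Dict[str, List[str]]:
    """Run `extract_fn` on every chunk and merge the per-chunk results."""
    return merge_facts([extract_fn(c) for c in chunk_words(text, size, overlap)])


def toy_extract(chunk: str) -> Dict[str, str]:
    """Stand-in for a completion call: returns only the facts present in this chunk."""
    facts: Dict[str, str] = {}
    budget = re.search(r"budget of (\$[\d,]+)", chunk)
    if budget:
        facts["budget"] = budget.group(1)
    deadline = re.search(r"due by (\d{4}-\d{2}-\d{2})", chunk)
    if deadline:
        facts["submission_deadline"] = deadline.group(1)
    return facts
```

In practice `toy_extract` would be replaced by a function that sends the chunk to the completions endpoint with a prompt asking only for these fields and parses the reply. The overlap is there so that a fact straddling a chunk boundary still appears whole in at least one window, and fields that end up with more than one value would need a second pass to reconcile.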
Example Google Colab notebook here: Google Colab
Any advice on approach or code examples much appreciated.
The main reference I managed to find on this was here: Summarizing Books with Human Feedback