I think a starting point for one approach is as follows:
Decide on a chunking strategy. For example:
Split the document into paragraphs (or sentences, if paragraph breaks aren't available) – call them chunks.
Take the next N chunks such that their total token count is less than T.
T is the MAX_TOKEN_COUNT (4096, 8192 or whatever) – minus the required summary output length, minus the length of the current CONTEXT_SUMMARY.
CONTEXT_SUMMARY is a buffer that you maintain separately, holding a summary of the context so far, so that each completions call has access to it.
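The packing step above could be sketched something like this. It's a minimal sketch: `count_tokens` is a naive whitespace stand-in (real code should use the model's tokeniser, e.g. tiktoken for OpenAI models), and `pack_chunks` is a hypothetical helper name, not part of any API:

```python
def count_tokens(text):
    # Naive placeholder: counts whitespace-separated words.
    # Replace with the model's real tokeniser (e.g. tiktoken).
    return len(text.split())

def pack_chunks(paragraphs, max_tokens, summary_len, context_summary):
    """Take leading paragraphs until the token budget T is exhausted.

    T = MAX_TOKEN_COUNT - required summary output length
                        - tokens used by CONTEXT_SUMMARY.
    """
    budget = max_tokens - summary_len - count_tokens(context_summary)
    batch, used = [], 0
    for p in paragraphs:
        n = count_tokens(p)
        if used + n > budget:
            break
        batch.append(p)
        used += n
    # Return the batch to summarise and the remaining paragraphs.
    return batch, paragraphs[len(batch):]

paras = ["one two three", "four five", "six seven eight nine"]
batch, rest = pack_chunks(paras, max_tokens=10, summary_len=2, context_summary="ctx")
# budget = 10 - 2 - 1 = 7 tokens, so only the first two paragraphs fit (3 + 2 = 5)
```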
It’s not a great approach but this is where I would start.
E.g. let's say you have a novel.
You would chunk it into paragraphs.
Take the first N paragraphs from Chapter 1, summarise them, and record the output O1.
Take the next N paragraphs continuing Chapter 1, summarise them, and record the output O2.
At the end of Chapter 1, use O1, O2, … Oi to create a Chapter 1 summary C1.
Then, for Chapter 2:
Take C1 plus the first M paragraphs from Chapter 2, summarise them, and record the output P1.
In this way, each completions call has access to the summary so far.
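The whole walkthrough could be wired up roughly as follows. This is a sketch under assumptions: `summarise` would in practice call your completions API with the running summary in the prompt; here it is stubbed as simple truncation so the control flow is runnable, and all function names are hypothetical:

```python
def summarise(text, running_summary=""):
    # Placeholder for a completions call that is given the running
    # summary as context. Stubbed as concatenate-and-truncate.
    return (running_summary + " " + text)[:60].strip()

def summarise_chapter(paragraphs, n, book_summary):
    """Summarise a chapter in batches of n paragraphs (O1, O2, ... Oi),
    then combine those outputs into a chapter summary Ci."""
    outputs = []
    for i in range(0, len(paragraphs), n):
        batch = " ".join(paragraphs[i:i + n])
        outputs.append(summarise(batch, book_summary))
    return summarise(" ".join(outputs))

def summarise_book(chapters, n=2):
    book_summary = ""
    for chapter in chapters:
        # Each chapter's batches see the summary of everything before them.
        book_summary = summarise_chapter(chapter, n, book_summary)
    return book_summary

result = summarise_book([["a b c", "d e"], ["f g", "h"]])
```

The same loop works at every level of the hierarchy: batch summaries roll up into chapter summaries, which roll forward as context for the next chapter.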
You can even get creative with your prompt engineering to facilitate this.
“Given SUMMARY, which is a running summary of the context so far, summarise the following ”
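As a concrete (hypothetical) template along those lines:

```python
# A possible shape for the prompt; wording and field names are illustrative.
SUMMARY_PROMPT = (
    "Given SUMMARY, which is a running summary of the context so far, "
    "summarise the following TEXT.\n\n"
    "SUMMARY: {summary}\n\n"
    "TEXT: {text}"
)

prompt = SUMMARY_PROMPT.format(
    summary="C1: the chapter summary so far...",
    text="The next batch of paragraphs...",
)
```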
Good luck!