I did a couple of different things here. Using each document’s table of contents as a basis, I identified the individual sections within the document, then extracted and summarized the text of each section. The section summaries, together with their section titles, were then consolidated into the full summary.
Under this approach, most sections fall within the token limit.
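As a rough illustration of the section-by-section flow, here is a minimal sketch. It assumes the sections have already been extracted as (title, text) pairs, and that `summarize` is a stand-in for whatever model call is used. Both `summarize_document` and `summarize` are hypothetical names, and joining the pieces by simple concatenation is an assumption about how the consolidation step might work.

```python
from typing import Callable, List, Tuple


def summarize_document(
    sections: List[Tuple[str, str]],
    summarize: Callable[[str], str],
) -> str:
    """Summarize each (title, text) section and join the results under their titles.

    `summarize` is a hypothetical callable that wraps the model call.
    """
    parts = []
    for title, text in sections:
        section_summary = summarize(text)
        # Keep the section title next to its summary so the structure of the
        # table of contents carries over into the full summary.
        parts.append(f"{title}\n{section_summary}")
    # Assumption: consolidation is plain concatenation, one block per section.
    return "\n\n".join(parts)
```

Passing the summarizer in as a parameter keeps the sketch independent of any particular model or API.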
Where a section’s size exceeds a pre-defined threshold, I split the section into multiple parts, summarized each part individually, and put the summaries back together. To keep the part summaries within a section coherent, I constructed the prompt so that the summary of the preceding part is taken into account.
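The splitting step could look something like the sketch below. It is only an illustration under a few assumptions: section size is measured in words as a crude proxy for tokens, the threshold of 1500 is an arbitrary placeholder, the prompt wording is invented, and `split_text`, `summarize_section` and `summarize` are hypothetical names rather than the actual implementation.

```python
from typing import Callable, List


def split_text(text: str, max_words: int) -> List[str]:
    """Split text into consecutive parts of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_section(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 1500,  # assumed threshold; words stand in for tokens here
) -> str:
    """Summarize a section, splitting it into parts if it exceeds the threshold."""
    parts = split_text(text, max_words)
    if len(parts) <= 1:
        # Section is small enough to summarize in a single call.
        return summarize(text)

    summaries = []
    previous = ""
    for part in parts:
        if previous:
            # Feed the preceding part's summary into the prompt for coherence.
            prompt = (
                f"Summary of the preceding part:\n{previous}\n\n"
                f"Summarize the next part so it follows on coherently:\n{part}"
            )
        else:
            prompt = part
        previous = summarize(prompt)
        summaries.append(previous)
    # Put the part summaries back together in their original order.
    return " ".join(summaries)
```

Only the immediately preceding summary is carried forward, which keeps each prompt small at the cost of losing context from earlier parts.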
It’s designed as a fully automated process.