My notes on the subject:
Summarize Large Documents
- How to Summarize a Large Text with GPT-3
- How to Summarize a PDF file with ChatGPT (70 000+ Words)
-
State of the Art GPT-3 Summarizer For Any Size Document or Format | Width.ai
- Smaller chunks allow for more understanding per chunk but increase the risk of split contextual information. Let’s say you split a dialog or topic in half when chunking to summarize. If the contextual information from that dialog or topic is small or hard to decipher per chunk that model might not include it at all in the summary for either chunk. You’ve now taken an important part of the overall text and split the contextual information about it in half reducing the model’s likelihood to consider it important. On the other side you might produce two summaries of the two chunks dominated by that dialog or topic.
-
Building a Summarization System with LangChain and GPT-3 - Part 2 - YouTube
- “Extract the key facts out of this text. Don’t include opinions. Give each fact a number and keep them in short sentences.”
- Fact check summaries.
-
Building a Summarization System with LangChain and GPT-3 - Part 1 - YouTube
- Summarization Methodologies
- Map Reduce
- Chunk document. Summarize each chunk, then summarize all the chunk summaries. Using this currently in embed_solr_index01.php.
- Stuffing
- Summarize entire document all at once, if it will fit into prompt.
- Refine
- Chunk document. Summarize first chunk. Summarize 2nd chunk + 1st chunk summary. Summarize 3rd chunk + 1st and 2nd chunk summary. And so on…
- Map Reduce
- Summarization Methodologies
- Chunk large document by creating a list of summaries
- Break document down into chunks, then summarize each chunk, then submit the list of summaries as the document.
- https://community.openai.com/t/how-to-send-long-articles-for-summarizat…