Splitting / Chunking Large input text for Summarisation (greater than 4096 tokens....)

Amazing - checking it out now! Thank you.

Edit - it’s similar to the approach I’ve been taking. I think the main difference is that when summarising a novel you can sacrifice detail. When analysing certain documents, however - let’s say an RFP - you don’t want to lose details such as ‘Budget’ and ‘Deadlines’. I’m sure GPT-3 can handle this given the right approach. For any input under the token limit, one shot is enough.
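To make the idea concrete, here's a rough sketch of what I mean by not losing those details: instead of asking for a free-form summary, ask for named fields explicitly. The field list and prompt wording below are purely illustrative (my own assumptions, not anything from this thread):

```python
# Hypothetical sketch: prompt GPT-3 for specific RFP fields so details
# like Budget and Deadlines survive, rather than a free-form summary.
# The field names and wording are illustrative assumptions.
FIELDS = ["Budget", "Deadlines", "Scope", "Contact"]

def build_extraction_prompt(chunk: str, fields=FIELDS) -> str:
    # One labelled line per field nudges the model toward
    # structured, easy-to-parse output.
    field_lines = "\n".join(f"{f}:" for f in fields)
    return (
        "Extract the following fields from the RFP excerpt below. "
        "If a field is not mentioned, write 'not stated'.\n\n"
        f"Excerpt:\n{chunk}\n\n"
        f"{field_lines}"
    )

prompt = build_extraction_prompt("Total budget is $50k; proposals due 1 June.")
```

The resulting `prompt` string would then be sent as the completion input; per-chunk answers can be merged afterwards.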

@daveshapautomator have you experimented with more structured extraction of features?

Edit 2 - The textwrap lib seems like it can help with chunking and whitespace processing.

Edit 3 - @daveshapautomator I think some of what you talk about here will also help, in your example where you extract medical information and prognosis: How to prevent Open AI from making up an answer
