Splitting / Chunking Large input text for Summarisation (greater than 4096 tokens....)

pmshadow · March 14, 2023, 5:32am

Thank you so much for providing these langchain links! Exactly what I needed.
I tried to explain a little bit, in layman's terms, how embeddings work and how they can be used.
I think summarizing everything before “needing them” might be an expensive overkill, as it is significantly more expensive than embeddings.

I am thinking about creating “rolling” embeddings with a 2k-long overlap, so whenever I detect a “long but interesting document part” I can process only that part, iteratively. I will test the approach in the next few days.
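
For readers who want to picture the rolling-window idea, here is a minimal sketch, not the poster's actual implementation. It assumes whitespace-separated words stand in for tokens (a real tokenizer would give different counts), and the names `rolling_windows`, `window`, and `overlap` are made up for illustration. Each window would then be embedded separately.

```python
from typing import List


def rolling_windows(tokens: List[str], window: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most `window` items, where each
    window shares its first `overlap` items with the end of the previous one."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")

    step = window - overlap  # how far the window start advances each time
    windows: List[List[str]] = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + window])
        # Stop once a window has reached the end of the input.
        if start + window >= len(tokens):
            break
    return windows


if __name__ == "__main__":
    # Assumption: whitespace words approximate tokens.
    words = "one two three four five six seven eight nine ten".split()
    for w in rolling_windows(words, window=4, overlap=2):
        print(" ".join(w))
```

A larger overlap means fewer boundaries cut through an interesting passage, at the cost of embedding more text overall.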

NikuPunk · May 2, 2023, 2:49pm

Thank you so much for providing a solution to this problem. However, I want to pass in a list of reviews, say 10,000 hotel reviews, and generate a summary of the whole list. How can I split the list of reviews?
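
One possible way to approach this, sketched under assumptions: pack whole reviews into batches that stay under a token budget, summarise each batch, then summarise the batch summaries. The `summarize` argument is a hypothetical callable standing in for whatever model call you use, and `estimate_tokens` is a crude whitespace word count, not a real tokenizer.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough stand-in for tokens.
    return len(text.split())


def batch_reviews(
    reviews: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = estimate_tokens,
) -> List[List[str]]:
    """Greedily pack whole reviews into batches of at most `max_tokens`.
    A single review larger than the budget ends up alone in its own batch."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for review in reviews:
        cost = count_tokens(review)
        # Start a new batch if this review would overflow the current one.
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(review)
        used += cost
    if current:
        batches.append(current)
    return batches


def summarize_reviews(
    reviews: List[str],
    summarize: Callable[[str], str],  # hypothetical: wraps your model call
    max_tokens: int,
) -> str:
    """Summarise each batch, then summarise the joined partial summaries."""
    partials = [summarize("\n".join(b)) for b in batch_reviews(reviews, max_tokens)]
    # Assumption: the partial summaries together fit in one final call.
    return summarize("\n".join(partials))
```

With 10,000 reviews the joined partial summaries may still be too long for one call; in that case the same batching step could be applied again to the partial summaries before the final pass.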

glib · May 27, 2023, 3:36pm

Hello. I’m working on a solution that combines summarisation and extraction. Basically, I need to make sure that every important piece of information from the call is recorded in the database.

Most calls are under the limit; however, some of them are over 8k tokens.

I’m wondering how small the chunks should be for good summarisation. I expect that the smaller the chunks are, the more information is extracted; however, there is also a higher chance of “hallucinations”.

What’s your opinion on that? Which size is optimal when retrieving data from the dialogue is important?
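
The thread does not settle on an optimal size, so the sketch below only makes the chunk size easy to vary and compare. It assumes the transcript is already split into speaker turns, keeps each turn whole, and repeats the last few turns of a chunk at the start of the next one for context. The names `chunk_turns` and `carry_turns` are hypothetical, and token counts are approximated by whitespace words.

```python
from typing import List


def chunk_turns(turns: List[str], max_tokens: int, carry_turns: int = 1) -> List[List[str]]:
    """Group whole speaker turns into chunks of roughly `max_tokens`.
    The last `carry_turns` turns of a chunk are repeated at the start of
    the next one, so the budget is soft: carried turns plus one new turn
    may exceed it."""
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for turn in turns:
        cost = len(turn.split())  # assumption: words approximate tokens
        if current and used + cost > max_tokens:
            chunks.append(current)
            # Guard against carry_turns == 0, since current[-0:] is the whole list.
            current = list(current[-carry_turns:]) if carry_turns > 0 else []
            used = sum(len(t.split()) for t in current)
        current.append(turn)
        used += cost
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    call = ["A: one two", "B: three four", "A: five six", "B: seven eight"]
    # Try several budgets and compare what gets extracted from each.
    for budget in (6, 9):
        print(budget, chunk_turns(call, max_tokens=budget))
```

Running the extraction at a few different budgets on the same calls, and checking the results against a hand-labelled sample, is one way to see where missed facts and hallucinations start to trade off.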

alayet.manel · November 21, 2023, 6:43pm

Did you try the doctran interrogation method? You predefine the parameters you want to extract and then summarize; you can get a better understanding that way. I’m a noob, so this might not be the solution 😅
