How to let ChatGPT fully digest a really large text?

I’m trying to use ChatGPT to analyze a large text, such as a book, and extract every detail from it. However, I’m not sure how to do this effectively. I’ve tried breaking the text into chunks and using embeddings, but this method seems to lose important contextual information.

For example, if I break up a passage like “The dog bites Peter. It runs away after that. The cat smiles.” into three separate chunks, one per sentence, and then ask ChatGPT a question like “Which animal runs away after Peter gets bitten?”, it won’t be able to provide the correct answer, because the chunk containing “It runs away” no longer says what “it” refers to.

Do you have any suggestions for how to let ChatGPT fully digest a large text without losing important context?

Rather than breaking it into single sentences, try embedding a paragraph as a whole (e.g. all three sentences here), so that the context is captured correctly. The right length isn’t always going to be fixed, so you’ll have to experiment and figure out what works best for your case.

I have chunked the entire book into paragraphs, but I’m still worried that some paragraphs might depend on information from previous paragraphs.

Share a full page and ask it to read the page and reply “Read” once it has done so. Then share the second page, and keep going.

If it’s an entire book, maybe try chunking chapter-wise or by half chapters. GPT will have some inherent knowledge as well, so it comes down to whether it can connect the dots, which it should be able to do. Like I said, most of the time there’s no fixed answer to this kind of problem, and you’ll have to test what works best for your situation.

wai2018, you ask a very good question. I’m not sure why people provide terrible answers instead of simply saying ‘I don’t know’… maybe they are ChatGPT bots themselves, haha. Besides that, if there were a way to run a search that returns the relevant content across chunks and then merges it into a temporary chunk to answer the question, that could be a solution. I’m not sure how a program could do this, though.
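
A rough sketch of that search-and-merge idea, in Python: rank the chunks against the question, take the best match, pull in its neighbouring chunks, and join them in document order into one temporary chunk. Everything here is an assumption for illustration. `toy_embed` is a bag-of-words stand-in for a real embedding model, `build_context`, `top_k` and `window` are made-up names, and the two filler sentences around the original example are invented just to show that unrelated chunks get left out.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model (assumption): word-count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_context(chunks, question, top_k=1, window=1):
    """Pick the top_k chunks most similar to the question, add `window`
    neighbours on each side, and merge them in document order."""
    q = toy_embed(question)
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: cosine(toy_embed(chunks[i]), q),
        reverse=True,
    )
    keep = set()
    for i in ranked[:top_k]:
        keep.update(range(max(0, i - window), min(len(chunks), i + window + 1)))
    return " ".join(chunks[i] for i in sorted(keep))


chunks = [
    "The sun rises.",            # filler sentence, invented for the demo
    "The dog bites Peter.",
    "It runs away after that.",
    "The cat smiles.",
    "Night falls.",              # filler sentence, invented for the demo
]
context = build_context(chunks, "Which animal runs away after Peter gets bitten?")
print(context)
```

The merged `context` is what you would paste into the prompt together with the question. With a real embedding model the ranking would be better, but the neighbour-merging step stays the same; how large `window` should be is something you’d have to tune.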

You can implement overlapping embeddings, as described here: The length of the embedding contents - #23 by curt.kennedy
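
For anyone who wants to see what overlap looks like in practice, here is a minimal sketch in Python. It is not necessarily what the linked post does; it is just a sliding window over sentences where each chunk shares some sentences with the previous one. The function names, the naive sentence splitter, and the `size`/`overlap` parameters are all assumptions for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def overlapping_chunks(sentences, size=2, overlap=1):
    """Slide a window of `size` sentences, advancing by size - overlap,
    so consecutive chunks share `overlap` sentences."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break  # the last window already reached the end of the text
    return chunks


text = "The dog bites Peter. It runs away after that. The cat smiles."
overlapped = overlapping_chunks(split_sentences(text), size=2, overlap=1)
for chunk in overlapped:
    print(chunk)
```

With the example from the question, the first chunk keeps “The dog bites Peter.” next to “It runs away after that.”, so the chunk that gets embedded still says what “it” refers to. The cost is that you embed and store more text, since every sentence can appear in more than one chunk.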