Best practices for RAG over a large document

Hi guys

I'm wondering if anyone is willing to share how they deal with very large documents.

Let's take a simple case: in the database, we have a session transcript of 15,000 words. How do you respond to a user's question like:

  • give me the top 5 takeaways from the session

Obviously, 15,000 words is too big for GPT-3.5's context window. GPT-4 can accept it, but it's dreadfully slow, taking anywhere from 20 seconds to over 2 minutes.

So what is your approach with GPT-3.5?

thanks

If you need to answer questions like this one, which concern the entire context, you'll either need to send the whole context or create a summary that you can retrieve instead.

Yes, exactly, but obviously I can't send the full content, as it's bigger than the token window.

Use GPT-4-turbo instead; its context window is 128k tokens.

Thanks mate, that works. But it takes anywhere from 30 seconds up to 2 minutes to get an answer from it.

Yeah, in my experience, even at Tier 5 billing, it can take minutes for 4k+ token prompts…

Could you split it up maybe?

Honestly, I'd probably break it down into smaller chunks, say 1,000 to 1,500 words each. That way, I can focus on one chunk at a time and use GPT-3.5 to help me extract the key points from each. It's like eating an elephant, one bite at a time!
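A minimal sketch of that chunk-then-extract flow in Python. The word-based splitter and the `extract_fn` placeholder are my own assumptions, not a specific library API; `extract_fn` stands in for whatever GPT-3.5 call you make per chunk:

```python
def split_into_chunks(text, max_words=1500):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

def map_reduce_takeaways(transcript, extract_fn, max_words=1500):
    """Extract key points from each chunk, then combine the notes.

    extract_fn is a placeholder for your per-chunk GPT-3.5 prompt,
    e.g. "list the key points in this excerpt".
    """
    chunks = split_into_chunks(transcript, max_words)
    partial_notes = [extract_fn(chunk) for chunk in chunks]
    # The combined notes are usually short enough to fit into one
    # final prompt asking for the top 5 takeaways.
    return "\n".join(partial_notes)
```

A 15,000-word transcript becomes ten 1,500-word chunks, each comfortably within GPT-3.5's window; the combined notes then go into a single final "top 5 takeaways" prompt.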

Another approach I’d take is to use some keyword extraction techniques to identify the most important phrases and keywords. That way, I can quickly see what the document is about and what’s most relevant.
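As a rough sketch of that idea, even a plain frequency count (ignoring common stopwords) surfaces the dominant terms; the tiny stopword list here is just an illustrative stub, not a real resource:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; in practice use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "it", "that", "this", "we", "you", "i", "for", "on"}

def top_keywords(text, n=10):
    """Return the n most frequent non-stopword terms in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]
```

Running this over the whole transcript gives a quick picture of what the session is about before you spend any tokens on it.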

If I had to get really manual, I’d just sit down and read the thing, taking notes as I go. It’s old school, but sometimes that’s the best way to really understand what’s going on.

Lastly, I might use some other tools, like spaCy or NLTK, to help me preprocess the document and extract key points. It’s like having a team of experts helping me out!

So yeah, that’s how I’d tackle that beast of a document!


Maybe you could come up with a scoring mechanism where you can take any paragraph and assign it a score, either by using cosine similarity against some "known" vector in semantic space (i.e. a vector database), or just with a prompt that asks for a score based on certain criteria. That way, even if you split the content into chunks of arbitrary sizes, the scoring mechanism still gives you "apples to apples" comparisons, so to speak. Then it's a simple matter of choosing the top N (like N=5) pieces of content, because you generated the score of each one independently, in separate prompts.
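A sketch of the cosine-similarity variant, assuming you have some `embed()` function available. Here it's stubbed with a toy bag-of-words vectorizer; a real embedding model (or vector database lookup) would replace it:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; swap in a real embedding model."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_n_chunks(chunks, query, n=5):
    """Score every chunk against the same query vector, keep the top n."""
    q = embed(query)
    scored = [(cosine_similarity(embed(c), q), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:n]]
```

Because each chunk is scored against the same reference vector, chunks of different sizes stay roughly comparable, which is what makes the top-N selection meaningful.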
