How to handle large context token limits?

Hi fellow enthusiasts.

What is the recommended approach when you want to use large context files, e.g. 30k tokens, passed into a prompt via the API?



You’ll have to compress the prompt by sending smaller pieces and asking the model to summarize each piece, before you send the combined summaries in for the final inference.
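That chunk-then-summarize flow can be sketched roughly as below. The `summarize_chunk` function is a placeholder for a real API call (it just truncates here so the control flow is visible); the chunk size and function names are illustrative, not from any library.

```python
# Sketch of the chunk-and-summarize approach described above.
# `summarize_chunk` stands in for a real model call.

def split_into_chunks(text: str, max_words: int = 500) -> list[str]:
    """Split text on word boundaries into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_chunk(chunk: str) -> str:
    # Placeholder: in practice, send `chunk` to the model with a
    # "summarize the following text" instruction and return its reply.
    return chunk[:200]

def compress(text: str, max_words: int = 500) -> str:
    # Map step: summarize each chunk independently.
    partial_summaries = [summarize_chunk(c) for c in split_into_chunks(text, max_words)]
    # Reduce step: the concatenated summaries become the final prompt context.
    return "\n".join(partial_summaries)
```

The combined summaries from `compress` can then be sent as the context of a single final request.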

Ah nice solution, thanks, I’ll give that a shot.

It might be worth checking out Sparse Priming Representation. You’d be able to instruct GPT with system prompts to effectively compress the data you want to give it. I think it’s a lot better than simply telling GPT to summarize the content you give it, but there’s still at least some details that ultimately get lost so it’d still make sense to be selective on what to compress compared to what should remain explicit to make the most of the token limits.
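An SPR-style request might look something like this; the exact system-prompt wording is an assumption paraphrased from public descriptions of the technique, not a canonical prompt.

```python
# One possible SPR-style compression request (chat-completions message
# format). The system prompt wording below is an illustrative paraphrase.

SPR_SYSTEM_PROMPT = (
    "You are a Sparse Priming Representation (SPR) writer. Render the "
    "user's input as a distilled list of succinct statements, assertions, "
    "associations, and analogies, capturing as much of the original "
    "meaning as possible in as few words as possible."
)

def build_spr_request(document: str) -> list[dict]:
    """Build a chat-style message list for compressing a document via SPR."""
    return [
        {"role": "system", "content": SPR_SYSTEM_PROMPT},
        {"role": "user", "content": document},
    ]
```

The compressed output can then be pasted into later prompts in place of the full document.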

Thanks Matt, and thanks for sharing the link. I’m currently passing in the standard “you are a helpful chatbot” message, and I think I’ll get a better outcome trying these ideas.

There are good suggestions on this thread already, and as a side note, the SPR approach is essentially what @jwatte suggested, just with fancier wording.

Otherwise you can definitely look into RAG as it is currently the standard way to provide larger context to the model.
Without knowing the specifics of your use case it is hard to tell though. For example if you need several answers building on top of each other based on the context or need to extract information from the context.

Thank you VB. I’ll take a look at RAG (and vector databases?); I’ve heard a bit about them but I don’t have any real knowledge of them yet. My use case is that I want an LLM to be able to answer a large volume of questions based on a large body of context, say 100 questions of varying topics and depth of answers required, from about 30k words of relevant context.


Here, this should get you a good entry point.
Look for the examples.


Today’s update with GPT-4 Turbo has solved my query! The only thing now is to watch the cost of my queries, as these will certainly rack up if I’m regularly sending large numbers of tokens through.


Is it OK if a moderator closes this topic?

Yes indeed, thank you for reminding me!