How do you manage your context tokens for RAG?

Hey everyone,

Apologies in advance as I am a “beginner” in this field.

I’ve created a chatbot for Q&A about an uploaded file (PDF slides) using embeddings and RAG. However, I always reach my token limit (either TPM or context window): when I ask the chatbot multiple questions, I end up with too many tokens in my chat history, because every question adds another round of retrieved context. How do you handle this without deleting the “oldest” message in the chat history? And how do you manage your context window in a conversation/thread?


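Context tokens can be managed using embeddings: embed the earlier messages, then for each new question send only the ones most relevant to it instead of the whole chat history.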

I wrote a tutorial for simple context management last year: Use embeddings to retrieve relevant context for AI assistant
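
Here is a minimal sketch of the idea, not the code from the tutorial. It assumes a toy `embed` function (a bag-of-words count vector) standing in for a real embedding model, and a rough `count_tokens` that counts whitespace-separated words instead of using a real tokenizer. `select_context` and `budget` are hypothetical names chosen for illustration. The function ranks past messages by similarity to the new question and keeps the best ones that fit the token budget, so relevance decides what stays in the prompt rather than message age.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_tokens(text):
    """Rough token estimate (whitespace words); a real tokenizer would differ."""
    return len(text.split())


def select_context(history, question, budget):
    """Pick the past messages most similar to the question that fit the budget.

    Returns the chosen messages in their original chronological order.
    """
    q_vec = embed(question)
    # Indices of past messages, most similar to the question first.
    ranked = sorted(
        range(len(history)),
        key=lambda i: cosine(embed(history[i]), q_vec),
        reverse=True,
    )
    chosen, used = [], 0
    for i in ranked:
        cost = count_tokens(history[i])
        # Greedy fill: skip a message that does not fit, keep trying smaller ones.
        if used + cost <= budget:
            chosen.append(i)
            used += cost
    return [history[i] for i in sorted(chosen)]


history = [
    "Slide 3 covers the quarterly revenue figures for 2023.",
    "The speaker bio is on slide 1.",
    "Slide 7 lists the revenue forecast for next year.",
    "Slide 9 shows the team photo.",
]
context = select_context(history, "What does the deck say about revenue?", budget=20)
```

With this toy data, only the two revenue-related messages are kept and the rest of the history stays out of the prompt. In a real chatbot you would swap `embed` for calls to your embedding model, cache the vectors so each message is embedded only once, and count tokens with the tokenizer that matches your model.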


Thank you so much!! This is perfect :smiley:
