How do you manage your context tokens for RAG?

APRO · February 26, 2024, 5:53pm

Hey everyone,

Apologies in advance as I am a “beginner” in this field.

I’ve created a chatbot for Q&A about an uploaded file (PDF slides) using embeddings and RAG. However, I always hit a token limit (either TPM or the context window): when I ask the chatbot multiple questions, the chat history accumulates too many tokens because every turn adds its own retrieved context. How do you handle this without deleting the “oldest” message in the chat history? And how do you manage your context window in a conversation/thread?

sps · February 26, 2024, 11:09pm

Hi @APRO

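Context tokens can be managed using embeddings: instead of sending the whole chat history, retrieve only the context that is relevant to the current question.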

I wrote a tutorial for simple context management last year: Use embeddings to retrieve relevant context for AI assistant
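
As a rough illustration of the idea (a minimal sketch, not the code from the tutorial): score each past message against the new question, then keep only the most relevant ones that fit a token budget. Everything named here is hypothetical. `embed` is a toy bag-of-words stand-in for a real embedding model, `count_tokens` is a crude word count standing in for a real tokenizer, and `select_context` is just an illustrative helper.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_tokens(text):
    """Crude proxy (assumption): whitespace-separated words, not real tokens."""
    return len(text.split())


def select_context(question, history, budget):
    """Keep the past messages most similar to the question within `budget`.

    Messages are considered from most to least similar; one that does not
    fit the remaining budget is skipped. The survivors are returned in
    their original chronological order.
    """
    q = embed(question)
    ranked = sorted(
        range(len(history)),
        key=lambda i: cosine(q, embed(history[i])),
        reverse=True,
    )
    chosen, used = [], 0
    for i in ranked:
        cost = count_tokens(history[i])
        if used + cost <= budget:
            chosen.append(i)
            used += cost
    return [history[i] for i in sorted(chosen)]


# Made-up example data, for illustration only.
history = [
    "Slide 3 says revenue grew 12 percent in 2023.",
    "The team photo is on slide 9.",
    "Revenue growth was driven by the new subscription plan.",
]
question = "What drove revenue growth?"

for message in select_context(question, history, budget=18):
    print(message)
```

With a real setup you would swap `embed` for calls to an embedding model (caching the vectors so past messages are embedded only once) and `count_tokens` for the tokenizer of the model you are using. The full history stays stored on your side, so nothing is deleted; only the slice sent with each request changes.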

APRO · February 27, 2024, 9:18am

Thank you so much!! This is perfect
