Hey everyone,
Apologies in advance as I am a “beginner” in this field.
I’ve created a chatbot for Q&A about an uploaded file (pdf slides) using embeddings and RAG. However I always reach my token limit (either TPM or context) as when I ask the chatbot multiple questions I simply have to much tokens in my chat history due to multiple RAGs. How did you do it without deleting the “oldest” message in chat history? And how do you manage your context window in a conversation/thread?