Hello everyone,
I’m currently working on building a conversation bot and would love your insights on how to manage context effectively. As we all know, LLMs have a fixed context limit, and simply dumping all past conversations into each request isn’t sustainable, since it will eventually exceed the token limit. From my understanding, when the limit is reached, OpenAI’s API removes messages starting from the top so that recent exchanges are preserved, but that isn’t always ideal.
As a beginner, I’m exploring different ways to handle context more efficiently and would greatly appreciate guidance on this topic. Here’s the approach I’ve come up with so far:
- Initial Approach – Dumping Everything:
Start by sending all past conversations with each request. This works fine at first but will obviously hit the token limit as the conversation grows.
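To make sure I'm describing this clearly, here's a minimal sketch of what I mean by the naive approach (the model name and helper are just illustrative; the commented-out call is roughly what I do with the OpenAI SDK):

```python
# Naive approach: keep every message and resend the full history each turn.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def build_request(history, user_text):
    """Append the new user turn and return the full message list to send."""
    history.append({"role": "user", "content": user_text})
    return list(history)

# The actual API call would look something like:
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=build_request(history, "Hi!"),
# )

messages = build_request(history, "Hi!")
```

The problem is obvious: `history` only ever grows, so the request size grows every turn.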
- Threshold-Based Summarization:
When the token count approaches the limit, summarize the entire conversation history and send this shortened context along with new messages. Repeat this every time the threshold is reached, applying the same logic to the updated history.
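Here's roughly how I picture that step. The ~4-characters-per-token estimate is just a rough assumption (in practice I'd probably use tiktoken), and `summarize` is a stand-in for an actual LLM summarization call:

```python
# Threshold-based summarization sketch.
def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token (an assumption, not exact).
    return sum(len(m["content"]) for m in messages) // 4

def maybe_compact(history, threshold, summarize):
    """If the history nears the limit, replace it with a single summary turn."""
    if estimate_tokens(history) < threshold:
        return history
    summary = summarize(history)
    # Keep the summary as a system message so the model treats it as context.
    return [{"role": "system", "content": f"Conversation so far: {summary}"}]

# Toy summarizer for illustration; the real one would call the API.
fake_summarize = lambda msgs: "user greeted the bot"
history = [{"role": "user", "content": "x" * 400}]
compacted = maybe_compact(history, threshold=50, summarize=fake_summarize)
```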
- Using a Secondary LLM for Refinement:
When the conversation becomes too long for simple condensing to fit within the limit, pass the entire context to a separate instance of GPT-4o-mini. This instance refines the context by identifying and removing relatively unimportant parts and producing a more concise summary, which is then sent back and used for further interactions.
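Something like this is what I have in mind for handing the history to the second model (the prompt wording and helper names are my own; only the use of GPT-4o-mini is fixed in my plan):

```python
# Sketch of building the request for a secondary "refiner" model.
REFINE_PROMPT = (
    "Rewrite the following conversation history as concisely as possible. "
    "Drop exchanges that are unlikely to matter for future turns."
)

def build_refine_request(history):
    # Flatten the history into a plain transcript for the refiner to read.
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in history)
    return [
        {"role": "system", "content": REFINE_PROMPT},
        {"role": "user", "content": transcript},
    ]

# The refined context would then come back from something like:
# refined = client.chat.completions.create(
#     model="gpt-4o-mini", messages=build_refine_request(history)
# ).choices[0].message.content

request = build_refine_request([{"role": "user", "content": "hello"}])
```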
- Ongoing Refinement as Needed:
If even this refinement is no longer sufficient, instruct the GPT-4o-mini instance to drop messages from the top of the conversation history, or simply let OpenAI truncate it.
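If I did the truncation myself rather than leaving it to OpenAI, I imagine it would look like this (again using the rough chars/4 token estimate as an assumption, and keeping the system prompt intact):

```python
# Last-resort truncation sketch: drop the oldest non-system messages until
# the estimated size fits the budget.
def truncate_from_top(history, budget_tokens):
    kept = list(history)
    while kept and sum(len(m["content"]) for m in kept) // 4 > budget_tokens:
        # Preserve a leading system prompt if present; drop the next oldest.
        idx = 1 if kept[0]["role"] == "system" else 0
        if idx >= len(kept):
            break
        kept.pop(idx)
    return kept

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "a" * 200},
    {"role": "user", "content": "latest question"},
]
trimmed = truncate_from_top(history, budget_tokens=20)
```

This keeps the most recent exchanges, which matches what I believe OpenAI's own truncation does, but at least it is under my control.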
I realize there are probably more refined strategies or tools to handle context better, especially with complex conversational requirements. I’d love to hear about other methods, tools, or frameworks you use to manage this challenge effectively. As someone new to this field, I’m open to all suggestions.
Thank you in advance for sharing your expertise!
Disclaimer: This post was edited by Generative model for clarity and grammar.
Regards
Prabhas Kumar