What to do about long conversations? There are several answers, in increasing order of memory quality. All of them start the same way: count and store the tokens used by each user input and AI response, plus the overhead added by the chat format. Then:
- Discard the oldest conversation turns that won’t fit in the remaining context space, after locally counting the tokens for the system prompt, the new user input, and the max_tokens reservation for the reply.
- Use another AI model to periodically summarize the oldest turns, replacing them with a compact summary.
- Use a vector database and AI embeddings to store the whole conversation, then retrieve and pass only the prior exchanges most relevant to the current conversation flow.
- More advanced context-aware systems that track conversation threads and topics.
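The first strategy (discard what won’t fit) can be sketched like this. The token counter here is a crude characters-per-token approximation and the per-message overhead constant is an assumption; a real implementation would use the model’s actual tokenizer (e.g. tiktoken) and the documented chat-format overhead.

```python
def count_tokens(text: str) -> int:
    """Rough estimate (~4 chars/token); swap in a real tokenizer for production."""
    return max(1, len(text) // 4)

PER_MESSAGE_OVERHEAD = 4  # assumed chat-format overhead per message


def fit_history(system_prompt, history, user_input, context_limit, max_tokens):
    """Return the newest turns that fit the remaining context budget.

    history is a list of (user_text, assistant_text) tuples, oldest first.
    """
    budget = context_limit - max_tokens  # reserve space for the reply
    budget -= count_tokens(system_prompt) + PER_MESSAGE_OVERHEAD
    budget -= count_tokens(user_input) + PER_MESSAGE_OVERHEAD

    kept = []
    for user_text, assistant_text in reversed(history):  # walk newest first
        cost = (count_tokens(user_text) + count_tokens(assistant_text)
                + 2 * PER_MESSAGE_OVERHEAD)
        if cost > budget:
            break  # this turn, and everything older, is discarded
        budget -= cost
        kept.append((user_text, assistant_text))
    return list(reversed(kept))  # restore chronological order
```

Walking the history newest-first means the most recent exchanges always win the budget, which is usually what you want for conversational continuity.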
Another option for the advanced user is a GUI that lets them manage the conversation themselves and select which past turns are sent.
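The embedding-based retrieval option above can be sketched as follows. The `embed()` function here is a toy bag-of-words stand-in so the example is self-contained; in practice you would call a real embedding model and store the vectors in a vector database rather than recomputing them per query.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_relevant_turns(history, query, k=2):
    """Return the k past turns most similar to the current user input."""
    q = embed(query)
    ranked = sorted(history, key=lambda turn: cosine(embed(turn), q), reverse=True)
    return ranked[:k]
```

A real system would also deduplicate near-identical turns and cap the retrieved text by token count, combining this with the truncation budget described earlier.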