Thank you @dmirandaalves
Yes it is very much possible to do that by using tiktoken to count tokens everytime before making the API call, if the token count exceed a specific threshold, get a summary and pass it as the system or user message, and then make the call with user input.
The approach above is easy to implement, but a more robust yet complex approach would be to use embeddings.
UPDATE: Here’s how to Use embeddings to retrieve relevant context for AI assistant