I definitely understand your situation. I posted thoughts along similar lines recently here:
The conclusion I came to is to use a fine-tune.
For example, create one fine-tune from all of your responses to questions or comments, then create a second fine-tune from all of the questions and comments you generate yourself.
So basically, since a fine-tune is trained on prompt-completion pairs, you create one set where the ‘prompt’ is anything you would say, and another set where the ‘completion’ is anything you would respond with.
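Here is a rough sketch of how the two sets might be built, in case it helps make the idea concrete. Everything in it is an assumption on my part: the `conversation` log format, the `build_datasets` and `to_jsonl` names, and the choice to pair each turn with the one directly before it. It is one possible reading of the two sets, and it only uses the Python standard library to produce JSONL lines with `prompt` and `completion` keys.

```python
import json

# Hypothetical conversation log: (speaker, text) turns in order.
# "me" marks the person being modeled; this format is an assumption.
conversation = [
    ("other", "How do you usually start a new project?"),
    ("me", "I sketch the data model first, then write the smallest test."),
    ("me", "What are you building, by the way?"),
    ("other", "A small tool for tagging photos."),
    ("me", "Nice, I'd start with the tag schema."),
]


def build_datasets(turns, me="me"):
    """Split adjacent turns into two sets of prompt-completion pairs.

    responses:   prompt = what someone else said, completion = my reply.
    initiations: prompt = something I said, completion = what came back.
    """
    responses, initiations = [], []
    for (prev_speaker, prev_text), (speaker, text) in zip(turns, turns[1:]):
        if speaker == me and prev_speaker != me:
            responses.append({"prompt": prev_text, "completion": text})
        elif speaker != me and prev_speaker == me:
            initiations.append({"prompt": prev_text, "completion": text})
    return responses, initiations


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


responses, initiations = build_datasets(conversation)
print(to_jsonl(responses))
print(to_jsonl(initiations))
```

Each printed block could be saved to its own file and used as the training data for one of the two fine-tunes. Consecutive turns by the same speaker are simply skipped here, which is a simplification.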
Then you use each model separately depending on the context. And theoretically this is the virtual you!
I haven’t tested this out, but a fine-tune should be able to hold much more information than fits in the prompt window.