I think you are right. Fine-tuning a model for every user isn’t really feasible: you’d need far more data per user than is usually available, plus real expertise in fine-tuning LLMs, and even then it would be expensive. The best (and easiest) way to do something like that now is retrieval: generate embeddings for the texts you want the LLM to consider, put them in a vector store, and at query time use the prompt to look up the most similar sections and pass them in as context. The last step, exactly what you include as context, is worth experimenting with.
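A minimal sketch of that retrieval step, using a toy bag-of-words "embedding" purely as a stand-in so it runs without dependencies (in practice you would call a real embedding model, e.g. an embedding API or sentence-transformers, and a proper vector database; the example sections here are made up):

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: L2-normalized word-count vector (stand-in for a real model).
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()}

def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    return sum(v * b.get(k, 0.0) for k, v in a.items())

# "Vector store": precomputed embeddings of the text sections.
sections = [
    "Refunds are processed within 5 business days.",
    "Our API rate limit is 100 requests per minute.",
    "Contact support via email for billing questions.",
]
store = [(s, embed(s)) for s in sections]

def top_k(query, k=2):
    # Retrieve the k most similar sections to use as prompt context.
    q = embed(query)
    return [s for s, _ in sorted(store, key=lambda p: -cosine(q, p[1]))[:k]]

context = top_k("what is the api rate limit", k=1)
```

The retrieved sections would then be prepended to the LLM prompt as context before the user’s question.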
If you ever need user-specific analytics, or protection measures like limiting API cost per user session, I am developing a platform that lets you do just that. I am also working on models to help detect prompt injection attempts, and dimensionality reduction models for visualizing trends in user prompts/responses. If you are interested, feel free to see more info at llmetrics.app