Cost reduction strategies

Naively (maybe too naively), it would seem that one way to reduce costs when using Assistants that share a lot of context among a group of users would be to keep (nearly) all of that context in attached vector stores and submit a greatly reduced context with each request. I gather from some posts here that folks are pursuing strategies along these lines, but it's not clear to me how to achieve that. Using max_prompt_tokens would seem to be the obvious way to limit the size of the request, but that also seems to limit how much of the attached vector store is used.
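
To make the question concrete, here is a rough sketch of the kind of run settings I have in mind. It only builds the parameters and sends nothing to the API. The helper name `build_run_params`, the assistant ID, and all the numbers are made up for illustration. I'm also assuming that `truncation_strategy` and the `file_search` tool's `max_num_results` are the right knobs for trimming thread history and retrieved chunks separately, which I have not verified.

```python
def build_run_params(assistant_id, max_prompt_tokens=20000,
                     last_messages=4, max_chunks=5):
    """Build keyword arguments for a run; nothing is sent to the API here.

    All defaults are illustrative placeholders, not recommendations.
    """
    return {
        "assistant_id": assistant_id,
        # Overall cap on prompt tokens for the run (thread history plus
        # whatever file search pulls in from the vector store).
        "max_prompt_tokens": max_prompt_tokens,
        # Assumption: keep only the most recent messages from the thread.
        "truncation_strategy": {
            "type": "last_messages",
            "last_messages": last_messages,
        },
        # Assumption: limit how many retrieved chunks file search adds.
        "tools": [
            {"type": "file_search",
             "file_search": {"max_num_results": max_chunks}},
        ],
    }


# Hypothetical assistant ID, for illustration only.
params = build_run_params("asst_example")
```

If these are the right parameters, the dictionary could presumably be passed as keyword arguments when creating a run on a thread, but whether that actually reduces cost without starving retrieval is exactly what I'm unsure about.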

What am I missing here?