Hi, I am planning to use a RAG approach to build a Q&A solution with GPT. In this approach, I will convert the document corpus to embeddings and store them in a vector DB. At prompt time, I will retrieve similar documents and pass them to the prompt as additional context. The problem I see with this approach is that my documents change almost every week, which means I would need to rerun embedding generation weekly, which is an additional cost. Are there any best practices to reduce cost in such scenarios?
Embeddings are pretty cheap, it’s the completions that will likely be your biggest cost as long as the embedding strategy is sound.
In one of my projects, embeddings need to be updated when files change, so I keep the modified time, file size, and content hash in the metadata for each embedding. That way I can check for changes before re-embedding.
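The check described above can be sketched roughly like this. This is a minimal, hypothetical illustration (the function names and metadata layout are my own, not from any particular vector DB): compute a fingerprint of each file, store it alongside the embedding, and skip re-embedding when the fingerprint is unchanged. The cheap fields (size, mtime) are compared first; the hash only matters when those have moved.

```python
import hashlib
import os

def file_fingerprint(path):
    """Return mtime, size, and SHA-256 hash of a file for change detection."""
    st = os.stat(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large documents don't need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return {"mtime": st.st_mtime, "size": st.st_size, "sha256": h.hexdigest()}

def needs_reembedding(path, stored_meta):
    """Compare the file's current fingerprint to the metadata saved
    with its embedding. Returns True if the file should be re-embedded."""
    if stored_meta is None:
        return True  # never embedded before
    current = file_fingerprint(path)
    # Cheap checks first: unchanged size and mtime means unchanged content.
    if (current["size"] == stored_meta.get("size")
            and current["mtime"] == stored_meta.get("mtime")):
        return False
    # Size or mtime moved; fall back to the content hash to decide.
    return current["sha256"] != stored_meta.get("sha256")
```

On the next indexing run, you call `needs_reembedding` per file and only send the changed ones to the embedding API, so the weekly cost scales with churn rather than corpus size. Note that the hash comparison also catches files that were merely touched (mtime changed, content identical) and avoids re-embedding those.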