Recently I've been digging into RAG (retrieval-augmented generation). Some papers from that reading:
[2307.03172] Lost in the Middle: How Language Models Use Long Contexts
[2004.04906] Dense Passage Retrieval for Open-Domain Question Answering
[2309.09117] Contrastive Decoding Improves Reasoning in Large Language Models
[2209.10063] Generate rather than Retrieve: Large Language Models are Strong Context Generators
[2304.14856] A Unified Generative Retriever for Knowledge-Intensive Language Tasks via Prompt Learning
Curious idea - [2212.02027] Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer
[2004.12832] ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Combining Embedding and Keyword Based Search for Improved Performance (Zachariah Zhang, Medium)
[2308.14963] Vector Search with OpenAI Embeddings: Lucene Is All You Need
[2212.09146] Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model
[2305.15294] Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
[2205.01230] Retrieval-Enhanced Machine Learning
One big takeaway was the power of blending BM25, TF-IDF, and DPR/embedding-based retrieval. Different fusion strategies can be used, such as reciprocal rank fusion (sketched below). One always-important, though inevitably challenging, task is defining evaluation criteria if you want to train your retriever.
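To make that concrete, here is a minimal sketch of reciprocal rank fusion in Python. The doc IDs and the three retriever rankings are made up for illustration; the k=60 smoothing constant is the value commonly cited from the original RRF paper, and in a real pipeline you would feed in the actual BM25/TF-IDF/dense result lists.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    rankings: list of lists, each an ordering of doc IDs from one retriever
    k: smoothing constant; 60 is the value used in the original RRF paper
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document gains more score the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Higher fused score is better, so sort descending.
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: fuse BM25, TF-IDF, and dense-retrieval rankings.
bm25_hits = ["d3", "d1", "d7"]
tfidf_hits = ["d1", "d3", "d9"]
dense_hits = ["d7", "d1", "d2"]
print(reciprocal_rank_fusion([bm25_hits, tfidf_hits, dense_hits]))
# d1 wins: it appears near the top of all three lists.
```

The appeal of RRF is that it only uses ranks, so you never have to calibrate BM25 scores against cosine similarities.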
Another is that by precomputing things like TF-IDF vectors and sentence embeddings, you can achieve significant speedups at query time (a sketch follows below). See the ColBERT paper above for other approaches to this.
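A minimal sketch of that precompute-once, query-many pattern, assuming sentence-transformers purely as an example encoder; the model name, corpus strings, and cache file name are all placeholders.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Any sentence-embedding model works; all-MiniLM-L6-v2 is just a common choice.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Offline: embed the corpus once and cache the matrix to disk.
corpus = ["first passage ...", "second passage ..."]
doc_matrix = model.encode(corpus, normalize_embeddings=True)  # (num_docs, dim)
np.save("doc_embeddings.npy", doc_matrix)

# Online: only the query is embedded per request; retrieval reduces to
# a single matrix-vector product against the cached embeddings.
doc_matrix = np.load("doc_embeddings.npy")
query_vec = model.encode(["example query"], normalize_embeddings=True)[0]
scores = doc_matrix @ query_vec  # cosine similarity, since vectors are normalized
top_k = np.argsort(scores)[::-1][:5]
```

At scale you would swap the brute-force dot product for an ANN index, but the precomputation idea is the same.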
Two of the papers above, while not particularly 'foundational', I think capture some key constraints of RAG quite well: "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model" and the curious "Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer". In the former, it quickly becomes apparent that there is a clear tension between the two main architectural components; the latter, as a potential and novel resolution, helps bring color to the discussion.