Yeah, it appears retrieval can be nailed with question consolidation. It works really well.
Eg:
- What is the biggest city in France
- Paris
- What about Germany
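You need consolidation: on its own, "What about Germany" gives the retriever nothing to work with, so it has to be rewritten into something like "What is the biggest city in Germany" first. Otherwise you have no proper chance with retrieval.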
One important thing I noticed while implementing this: simpler models like GPT-3 or Haiku can do a good enough job consolidating questions, so being able to mix and match models here can make a big difference.
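Here is a rough sketch of what I mean, using the France/Germany exchange above. It is not any particular library's API: `cheap_llm`, `strong_llm` and `retrieve` are hypothetical callables you would wire up to whatever models and search you actually use, and the prompt wording is just my guess at something workable.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text), e.g. ("user", "What about Germany")
LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def build_consolidation_prompt(history: List[Turn], question: str) -> str:
    """Ask the model to fold the chat history into one standalone question."""
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Rewrite the final user question so it makes sense without the "
        "conversation. Reply with the rewritten question only.\n\n"
        f"{transcript}\nuser: {question}\n\nStandalone question:"
    )


def consolidate(history: List[Turn], question: str, cheap_llm: LLM) -> str:
    """Return a standalone question; skip the model call on the first turn."""
    if not history:
        return question
    return cheap_llm(build_consolidation_prompt(history, question)).strip()


def answer(
    history: List[Turn],
    question: str,
    cheap_llm: LLM,
    retrieve: Callable[[str], List[str]],
    strong_llm: LLM,
) -> str:
    """Cheap model consolidates, retrieval uses the result, strong model answers."""
    standalone = consolidate(history, question, cheap_llm)
    context = "\n".join(retrieve(standalone))
    return strong_llm(f"Context:\n{context}\n\nQuestion: {standalone}")
```

Since the models are passed in as plain callables, swapping the consolidation model for a cheaper one does not touch the answering step.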
Another thing I noticed… token counts can easily go through the roof in RAG setups, especially if you replay too much history, since RAG answers tend to be longer …
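One way to keep that in check is to cap how much history gets replayed. A minimal sketch, assuming a crude "about 4 characters per token" estimate (my assumption, a real tokenizer would be more accurate) and hypothetical names `estimate_tokens` and `trim_history`:

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (role, text)


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (assumption: ~4 characters per token)."""
    return max(1, len(text) // 4)


def trim_history(history: List[Turn], budget: int) -> List[Turn]:
    """Keep the most recent turns whose combined estimate fits in `budget`."""
    kept: List[Turn] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for role, text in reversed(history):
        cost = estimate_tokens(text)
        if used + cost > budget:
            break
        kept.append((role, text))
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

With this, one long RAG answer in the middle of the conversation cuts the replay off at that point instead of being resent on every turn.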