Strategies for long RAG conversations

Yeah, it seems retrieval can be nailed with question consolidation; it works really well.

E.g.:

  • What is the biggest city in France
  • Paris
  • What about Germany

You need consolidation here: unless "What about Germany" gets rewritten into something like "What is the biggest city in Germany", retrieval has no proper chance of finding the right documents.
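
Roughly what that consolidation step looks like, as a minimal sketch assuming the OpenAI Python SDK; the prompt wording and model name are just placeholders, not anything the post prescribes:

```python
from openai import OpenAI

client = OpenAI()

CONSOLIDATE_PROMPT = (
    "Rewrite the user's last message as a standalone question, "
    "resolving any references to the earlier conversation. "
    "Return only the rewritten question."
)

def consolidate_question(history: list[dict], followup: str,
                         model: str = "gpt-4o-mini") -> str:
    """Turn a context-dependent follow-up into a self-contained query."""
    messages = (
        [{"role": "system", "content": CONSOLIDATE_PROMPT}]
        + history
        + [{"role": "user", "content": followup}]
    )
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content.strip()

history = [
    {"role": "user", "content": "What is the biggest city in France"},
    {"role": "assistant", "content": "Paris"},
]
# "What about Germany" should come back as something like
# "What is the biggest city in Germany", which is the query you
# actually embed and retrieve against.
standalone = consolidate_question(history, "What about Germany")
```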

One thing I noticed while implementing this, and it's quite important: simpler models like GPT-3 or Haiku can do a good enough job of consolidating questions, so being able to mix and match models here can make a big difference.
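
A sketch of that mix-and-match idea, reusing the consolidate_question helper above: route the rewrite to a cheap model and keep the stronger model for the final answer. The model names and retrieve() are placeholders for whatever retriever you use, not something from the original setup:

```python
# Cheap model for the rewrite, stronger model for the grounded answer.
CONSOLIDATION_MODEL = "gpt-4o-mini"
ANSWER_MODEL = "gpt-4o"

def answer(history: list[dict], followup: str, retrieve) -> str:
    standalone = consolidate_question(history, followup,
                                      model=CONSOLIDATION_MODEL)
    context = retrieve(standalone)  # your vector search / retriever
    resp = client.chat.completions.create(
        model=ANSWER_MODEL,
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": standalone},
        ],
    )
    return resp.choices[0].message.content
```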


Another thing I noticed: token counts can easily go through the roof in RAG conversations, especially if you replay too much history, since RAG answers tend to be long.
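
One way to keep that in check is to trim the replayed history to a token budget instead of sending everything. A minimal sketch, assuming tiktoken for counting; the budget number and the keep-newest-turns policy are arbitrary choices:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def trim_history(history: list[dict], budget: int = 2000) -> list[dict]:
    """Keep the most recent turns whose combined token count fits the budget."""
    kept, used = [], 0
    for msg in reversed(history):  # walk from newest to oldest
        cost = len(enc.encode(msg["content"]))
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```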