Discussion thread for "Foundational must read GPT/LLM papers"

@curt.kennedy

The latest reference architecture I have floating in my head is basically to use a blend of embeddings and keywords, with extensive use of HyDE to steer the query.

Yeah, getting the LLM to generate keywords is hugely important. But HyDE is just a very narrow start of that, imho. I believe the topic around it is much vaster.

Dense:

  • Embeddings, take your pick, I would just use ada-002

I used to think so too, until someone introduced me to the MTEB Leaderboard (a Hugging Face Space by mteb), and then everything changed dramatically overnight. In particular, smaller embedding models are very fast and surprisingly powerful.
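For anyone who wants to try it, here’s a minimal sketch of swapping a small open model in for ada-002. It assumes sentence-transformers is installed, and the model name is just one example of a small model that scores well on MTEB, not a specific recommendation:

```python
# Minimal sketch: a small open embedding model instead of ada-002.
# Assumes `pip install sentence-transformers`; the model name is only an
# example of a small model from the MTEB leaderboard.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

docs = [
    "HyDE generates a hypothetical answer and embeds that instead of the raw query.",
    "BM25 scores documents with term frequency and inverse document frequency.",
]
query = "How does HyDE steer retrieval?"

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# Cosine similarity (a dot product, since the vectors are normalized)
print(util.cos_sim(query_vec, doc_vecs))
```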

Sparse:

I’ve messed around a lot here, and I think there’s stuff to do, but I’ve realized that my first step is to master the standards (straight BM25 / TF-IDF / semantic embeddings) before engaging at the edges. We’re all at different stages in different parts.
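For reference, the “standard” sparse side can be as small as this sketch. It assumes the rank-bm25 package; the corpus and query are toy placeholders:

```python
# Minimal sketch of plain BM25 ranking, assuming `pip install rank-bm25`.
from rank_bm25 import BM25Okapi

corpus = [
    "Reciprocal rank fusion combines multiple ranked lists into one.",
    "HyDE asks the LLM to write a hypothetical answer before retrieval.",
    "BM25 is a bag-of-words ranking function built on TF-IDF ideas.",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)

query = "how does bm25 rank documents"
print(bm25.get_scores(query.lower().split()))  # one BM25 score per document
```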

Deepen the search with what I call “HyDE projections” (HyDRA-HyDE ??? :rofl:)

  • Let’s say you have 5 different common views of a subject. Ask the LLM to generate answers from these 5 perspectives (HyDE), i.e. re-frame the question from these additional perspectives. This re-framing is all you really need, I think, over a fine-tune, because you are reshaping the data to align with the query by this steering. A lot of your papers mention fine-tuning as the answer, but I think re-framing from a fixed set of perspectives that you define can be just as powerful. If your subject domain is super rare and unknown to the model, then maybe in that case you need a fine-tune.

Yeah, the possibilities here are near infinite, and I’m sure much will be written about this topic by many very smart people, though it’s tricky to draw conclusions because of the lack of explainability and the ‘prompt engineering’ involved. I am guessing Gemini will do a lot here, but that’s just a hunch.

I don’t think the papers I quoted stress fine-tuning, except maybe a couple. There’s a lot to be done around training (I guess fine-tuning) the retriever (e.g., contrastive learning), but imho that’s more pre-training / transfer learning than ‘fine-tuning’ as the term is frequently used in the context of LLMs. Even then, training retrievers is tricky and I have yet to discern the mysteries.
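Coming back to the “HyDE projections” idea, here is a rough sketch of one way it could look: generate a hypothetical answer per fixed perspective and treat each as an extra retrieval query. The perspective list, prompt wording, and model name are placeholders, and the OpenAI Python client is assumed:

```python
# Rough sketch of "HyDE projections": one hypothetical answer per fixed
# perspective, each used as an additional retrieval query.
# Assumes the OpenAI Python client; the perspectives, prompt wording, and
# model name are placeholders, not anything prescribed in this thread.
from openai import OpenAI

client = OpenAI()

PERSPECTIVES = [
    "a practitioner worried about cost and latency",
    "an information-retrieval researcher",
    "a domain expert writing for colleagues",
    "a beginner asking for intuition",
    "a skeptic looking for failure modes",
]

def hyde_projections(question: str) -> list[str]:
    """Return the original question plus one hypothetical answer per perspective."""
    queries = [question]
    for perspective in PERSPECTIVES:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{
                "role": "user",
                "content": (
                    f"Answer the question below as {perspective}. "
                    f"Write one short, plausible passage.\n\nQuestion: {question}"
                ),
            }],
        )
        queries.append(resp.choices[0].message.content)
    return queries  # 5 + 1 queries, each embedded (dense) and keyword-searched (sparse)
```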

So in this scenario, you take the original query and generate the 5 other queries (5+1), so you have 6 different pulls:

  • 6 embedding (dense) pulls
  • 6 keyword (sparse) pulls

So you have 12 streams to reconcile, and you just use Reciprocal Rank Fusion (RRF) to do this.

Yes, we are as one on this part. Diverse retrievers together are greater than the sum of the parts.

Each stream can be weighted differently by adjusting the K factor in the denominator of RRF; a larger K flattens that stream’s contribution, so it counts for less in the fused ranking.
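To make that concrete, a minimal sketch of RRF with a per-stream K (plain Python, toy data; in practice there would be the 6 dense + 6 sparse streams above):

```python
# Minimal sketch of reciprocal rank fusion with a per-stream k.
# A larger k flattens that stream's 1/(k + rank) contributions, so the
# stream effectively counts for less in the fused ranking.
from collections import defaultdict

def rrf_fuse(streams: list[list[str]], ks: list[float]) -> list[tuple[str, float]]:
    """streams: ranked lists of doc ids (best first); ks: one k per stream."""
    scores = defaultdict(float)
    for ranked_docs, k in zip(streams, ks):
        for rank, doc_id in enumerate(ranked_docs, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: two dense and two sparse streams (in practice, 6 + 6).
streams = [
    ["d3", "d1", "d2"],  # dense, original query
    ["d1", "d3", "d4"],  # dense, one HyDE projection
    ["d2", "d1", "d5"],  # sparse, original query
    ["d5", "d2", "d1"],  # sparse, one HyDE projection
]
ks = [60, 60, 60, 120]   # the last stream is down-weighted via a larger k

print(rrf_fuse(streams, ks))
```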

I’m looking for papers about the different things that can be done here in particular, if you run across any.
