Discussion thread for "Foundational must read GPT/LLM papers"

@curt.kennedy

The latest reference architecture I have floating in my head is basically to use a blend of embeddings and keywords, with extensive use of HyDE to steer the query.

Yeah, getting the LLM to generate keywords is hugely important. But HyDE is just a very narrow start of that, imho. I believe the topic around it is much vaster.

Dense:

  • Embeddings, take your pick, I would just use ada-002

I used to think so too, until someone introduced me to the MTEB Leaderboard (a Hugging Face Space by mteb), and then everything changed dramatically overnight. In particular, smaller embedding models are very fast and surprisingly powerful.
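For anyone who wants to try it, here’s a minimal sketch of swapping a small open model in for ada-002. It assumes sentence-transformers is installed, and the model name is just one example of a small model that scores well on MTEB, not a specific recommendation:

```python
# Minimal sketch: a small open embedding model instead of ada-002.
# Assumes `pip install sentence-transformers`; the model name is only an
# example of a small model from the MTEB leaderboard.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

docs = [
    "HyDE generates a hypothetical answer and embeds that instead of the raw query.",
    "BM25 scores documents with term frequency and inverse document frequency.",
]
query = "How does HyDE steer retrieval?"

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# Cosine similarity (a dot product, since the vectors are normalized)
print(util.cos_sim(query_vec, doc_vecs))
```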

Sparse:

I’ve messed around a lot here, and I think there’s stuff to do, but I’ve realized that my first step is to master the standards (straight BM25 / TF-IDF / semantic embeddings) before engaging at the edges. We’re all at different stages in different parts.
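For reference, the “standard” sparse side can be as small as this sketch. It assumes the rank-bm25 package; the corpus and query are toy placeholders:

```python
# Minimal sketch of plain BM25 ranking, assuming `pip install rank-bm25`.
from rank_bm25 import BM25Okapi

corpus = [
    "Reciprocal rank fusion combines multiple ranked lists into one.",
    "HyDE asks the LLM to write a hypothetical answer before retrieval.",
    "BM25 is a bag-of-words ranking function built on TF-IDF ideas.",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)

query = "how does bm25 rank documents"
print(bm25.get_scores(query.lower().split()))  # one BM25 score per document
```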

Deepen the search with what I call “HyDE projections” (HyDRA-HyDE ??? :rofl:)

  • Let’s say you have 5 different common views of a subject. Ask the LLM to generate answers from these 5 perspectives (HyDE), i.e. re-frame the question from these additional perspectives. This re-framing is all you really need, I think, over a fine-tune, because you are reshaping the data to align with the query by this steering. A lot of your papers mention fine-tuning as the answer, but I think re-framing from a fixed set of perspectives that you define can be just as powerful. If your subject domain is super rare and unknown to the model, then maybe in that case you need a fine-tune.

Yeah, the possibilities here are near infinite, and I’m sure much will be written about this topic by many very smart people, though it’s tricky to draw conclusions because of the lack of explainability and the ‘prompt engineering’ involved. I am guessing Gemini will do a lot here, but that’s just a hunch.

I don’t think the papers I quoted stress fine-tuning, except maybe a couple. There’s a lot to be done around training (I guess fine-tuning) the retriever (e.g., contrastive learning), but imho that’s more pre-training / transfer learning than ‘fine-tuning’ as the term is frequently used in the context of LLMs. Even then, training retrievers is tricky and I have yet to discern the mysteries.
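Coming back to the “HyDE projections” idea, here is a rough sketch of one way it could look: generate a hypothetical answer per fixed perspective and treat each as an extra retrieval query. The perspective list, prompt wording, and model name are placeholders, and the OpenAI Python client is assumed:

```python
# Rough sketch of "HyDE projections": one hypothetical answer per fixed
# perspective, each used as an additional retrieval query.
# Assumes the OpenAI Python client; the perspectives, prompt wording, and
# model name are placeholders, not anything prescribed in this thread.
from openai import OpenAI

client = OpenAI()

PERSPECTIVES = [
    "a practitioner worried about cost and latency",
    "an information-retrieval researcher",
    "a domain expert writing for colleagues",
    "a beginner asking for intuition",
    "a skeptic looking for failure modes",
]

def hyde_projections(question: str) -> list[str]:
    """Return the original question plus one hypothetical answer per perspective."""
    queries = [question]
    for perspective in PERSPECTIVES:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{
                "role": "user",
                "content": (
                    f"Answer the question below as {perspective}. "
                    f"Write one short, plausible passage.\n\nQuestion: {question}"
                ),
            }],
        )
        queries.append(resp.choices[0].message.content)
    return queries  # 5 + 1 queries, each embedded (dense) and keyword-searched (sparse)
```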

So in this scenario, you take the original query and generate the 5 other queries (5+1), so you have 6 different pulls:

  • 6 embedding (dense) pulls
  • 6 keyword (sparse) pulls

So you have 12 streams to reconcile, and you just use Reciprocal Rank Fusion (RRF) to do this.

Yes, we are as one on this part. Diverse retrievers together are greater than the sum of the parts.

Each stream can be weighted differently by adjusting the K factor in the denominator of RRF; a larger K flattens that stream’s contribution, so it counts for less in the fused ranking.
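To make that concrete, a minimal sketch of RRF with a per-stream K (plain Python, toy data; in practice there would be the 6 dense + 6 sparse streams above):

```python
# Minimal sketch of reciprocal rank fusion with a per-stream k.
# A larger k flattens that stream's 1/(k + rank) contributions, so the
# stream effectively counts for less in the fused ranking.
from collections import defaultdict

def rrf_fuse(streams: list[list[str]], ks: list[float]) -> list[tuple[str, float]]:
    """streams: ranked lists of doc ids (best first); ks: one k per stream."""
    scores = defaultdict(float)
    for ranked_docs, k in zip(streams, ks):
        for rank, doc_id in enumerate(ranked_docs, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: two dense and two sparse streams (in practice, 6 + 6).
streams = [
    ["d3", "d1", "d2"],  # dense, original query
    ["d1", "d3", "d4"],  # dense, one HyDE projection
    ["d2", "d1", "d5"],  # sparse, original query
    ["d5", "d2", "d1"],  # sparse, one HyDE projection
]
ks = [60, 60, 60, 120]   # the last stream is down-weighted via a larger k

print(rrf_fuse(streams, ks))
```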

I’m looking for papers about the different things that can be done here in particular, if you run across any.
