Hi
Many people have told me that for domain-specific documents I should fine-tune the model. But I think that will be an issue. If I bake the knowledge into the model's parametric knowledge, what happens when the knowledge changes? Also, how do I do source attribution and access restrictions?
So how do we handle such use cases? Should we train a model from scratch on the domain-specific documents? I think fine-tuning is not a good approach for imparting new knowledge to the model.
I am looking for any ideas and suggestions in this space. RAFT is one thing I am looking at.
Who's saying that? I don't know that that is true.
Ah.
Well, you're almost there. RAG (Retrieval-Augmented Generation) is common practice for this. FT isn't super necessary with really large models, in my opinion.
You need to watch out with a lot of these blogs and papers. Many of them are working with "small" LLMs, in the 7B (billion parameter) region. GPT-4 is likely in the trillion-parameter range, and that makes a difference.
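On the source attribution and access restriction part: with RAG you usually handle both at retrieval time rather than inside the model. Something like this rough sketch (just an illustration, not a real implementation; `embed()` stands in for whatever embedding model you use):

```python
# Rough sketch only: retrieval with per-chunk source IDs (for attribution) and
# access-group metadata (for restrictions). `embed()` is a placeholder for your embedding model.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Chunk:
    text: str
    source: str                                        # document ID, kept for citations
    allowed_groups: set = field(default_factory=set)   # who may see this chunk
    vector: np.ndarray | None = None                   # precomputed embedding


def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model here."""
    raise NotImplementedError


def retrieve(query: str, chunks: list[Chunk], user_groups: set, k: int = 5) -> list[Chunk]:
    # Enforce access restrictions before ranking, so restricted text never reaches the prompt.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    q = embed(query)

    def cosine(c: Chunk) -> float:
        return float(np.dot(q, c.vector) / (np.linalg.norm(q) * np.linalg.norm(c.vector)))

    return sorted(visible, key=cosine, reverse=True)[:k]


def build_prompt(query: str, hits: list[Chunk]) -> str:
    # Keep the source IDs next to each snippet so the model can be told to cite them.
    context = "\n\n".join(f"[{c.source}]\n{c.text}" for c in hits)
    return (
        "Answer using only the context below and cite the [source] IDs you used.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

Updating knowledge then just means re-indexing the documents, not retraining anything.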
Edit:
You've been working on this for a while now, haven't you? Do you still have retrieval issues?
I do still have retrieval issues :( because we humans did not write our documents for language models. In fact, I am currently working on an approach to identify context overlaps in documents and then see how we can rewrite them to be LLM-friendly. I think we need to write our documents differently for LLMs to use them efficiently.
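Roughly what I have in mind for the overlap step (a very hand-wavy sketch, assuming cosine similarity over chunk embeddings is a reasonable proxy for overlap; `embed_all()` is a placeholder for whatever embedding model I end up using):

```python
# Hand-wavy sketch: flag pairs of chunks whose embeddings are very similar as
# candidate "context overlaps" to rewrite or merge. `embed_all()` is a placeholder.
import itertools

import numpy as np


def embed_all(chunks: list[str]) -> np.ndarray:
    """Placeholder: return one embedding vector per chunk."""
    raise NotImplementedError


def find_overlaps(chunks: list[str], threshold: float = 0.85) -> list[tuple[int, int, float]]:
    vecs = embed_all(chunks)
    # Normalize once so cosine similarity reduces to a dot product.
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    overlaps = []
    for i, j in itertools.combinations(range(len(chunks)), 2):
        sim = float(vecs[i] @ vecs[j])
        if sim >= threshold:
            overlaps.append((i, j, sim))  # these two chunks likely cover the same context
    # Most-overlapping pairs first; these are the passages I'd rewrite for the LLM.
    return sorted(overlaps, key=lambda t: -t[2])
```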
That could potentially help, depending on how you tackle it.
I think helping users find answers and citing stuff ("playing the bot") could help you a lot, so you get the workflow down and understand what you would need to tweak on these documents.
Modern (2025) RAG is now called "deep research" (retrieval as a workflow) - maybe that could help you too. I think there might be a use case for fine-tuning here, but more to become more effective at navigating your document structure and asking better first questions than to imbue the LLM with knowledge.
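By "retrieval as a workflow" I mean something like this loop (purely illustrative, not any particular product; `llm()` and `search()` are placeholders for your model call and retriever):

```python
# Purely illustrative loop: the model decides what to search for next, reads the
# results, and repeats until it has enough to answer. `llm()` and `search()` are placeholders.

def llm(prompt: str) -> str:
    """Placeholder: call your LLM here."""
    raise NotImplementedError


def search(query: str, k: int = 5) -> list[str]:
    """Placeholder: return the top-k passages from your document index."""
    raise NotImplementedError


def deep_research(question: str, max_rounds: int = 3) -> str:
    notes: list[str] = []
    for _ in range(max_rounds):
        # The model proposes the next query based on what it has already read.
        query = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply with the next search query, or DONE if the notes are sufficient."
        )
        if query.strip().upper() == "DONE":
            break
        notes.extend(search(query, k=5))
    return llm(f"Question: {question}\nAnswer using only these notes and cite them:\n{notes}")
```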
Wild that this is how it's going. RAG is such a perfect description. Deep Research is (in my opinion) a branch of RAG, but, yes, I have also noticed that it's all being jumbled together. As is most LLM terminology.
According to LinkedIn, if you can describe it in a complex graph with AT LEAST 5 nodes then it's an Agent. Bonus points if one of the nodes is "senses".