Foundation models do not understand domain-specific knowledge

Hi
Many people have told me that for domain-specific documents, I should fine-tune the model. But I think that will be an issue: if I add the knowledge to the model's parametric knowledge, what happens when the knowledge changes? Also, how do I do source attribution and access restrictions?

So, how do we handle such use cases? Should we then train a model from scratch on the domain-specific documents? I think fine-tuning is not a good approach for imparting new knowledge to a model.

I am looking for any ideas and suggestions in this space. RAFT is one thing that I am looking at.

Thanks

Who’s saying that? I don’t know that that is true :thinking:

Ah.

Well, you’re almost there. RAG (Retrieval Augmented Generation) is common practice for this. FT isn’t super necessary with really large models, in my opinion.
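To make the pattern concrete, here is a minimal sketch of RAG: retrieve relevant passages at query time and hand them to the model as context, instead of baking the knowledge into the weights. The documents, the toy keyword scorer, and the prompt shape are all illustrative assumptions; a real system would use an embedding index (FAISS, pgvector, etc.) and an actual LLM call.

```python
# Minimal RAG sketch: retrieve, then prompt with the retrieved context.
# The scorer and documents below are toy stand-ins, not a real pipeline.

def score(query: str, doc: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents by the toy score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

docs = [
    "Refunds are processed within 14 days of the return request.",
    "Our warehouse in Rotterdam ships orders every weekday.",
    "Warranty claims require the original purchase receipt.",
]

query = "How long do refunds take?"
context = retrieve(query, docs)

# The retrieved passages are prepended to the prompt, so a knowledge
# update is just a document update -- and each passage can carry
# source-attribution and access-control metadata, which answers the
# concerns fine-tuning can't.
prompt = "Answer using only this context:\n" + "\n".join(context) + f"\n\nQ: {query}"
print(prompt)
```

The point of the sketch: changing knowledge means re-indexing documents, not retraining, and restricting access means filtering which documents `retrieve` is allowed to see.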

You need to watch out with a lot of these blogs and papers. Many of them are working with ‘small’ LLMs, in the 7B (billion parameter) region. GPT-4 is likely in the trillion parameter space, and that makes a difference.

Edit:
:thinking:

You’ve been working on this for a while now, haven’t you? Do you still have retrieval issues?

2 Likes

I do still have retrieval issues :( because we humans did not write our documents for language models. In fact, I am currently working on an approach to identify context overlaps in documents and then see how we can rewrite them to be LLM-friendly. I think we need to write our documents differently for LLMs to use them efficiently.
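One cheap way to start flagging the overlaps described above: compare sections pairwise with a set-based similarity and surface the pairs above a threshold. Jaccard similarity over word sets is a crude stand-in here (embeddings would do better), and the section names and threshold are made up for illustration.

```python
# Sketch: flag heavily-overlapping document sections as rewrite
# candidates. Jaccard over word sets is a toy proxy for semantic
# similarity; the sections and threshold are illustrative.

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def find_overlaps(sections: dict[str, str], threshold: float = 0.5):
    """Yield (name_a, name_b, similarity) for overlapping pairs."""
    names = list(sections)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = jaccard(sections[a], sections[b])
            if sim >= threshold:
                yield a, b, round(sim, 2)

sections = {
    "setup": "install the agent and register the license key",
    "quickstart": "install the agent and register the license key then run it",
    "faq": "contact support if the license key is rejected",
}

overlaps = list(find_overlaps(sections))
for a, b, sim in overlaps:
    print(f"{a} overlaps {b} ({sim})")
```

Pairs that score high are candidates for consolidation or cross-referencing, so the retriever stops pulling near-duplicate chunks that crowd out the actually-relevant one.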

1 Like

That could potentially help, depending on how you tackle it.

I think helping users find answers and citing stuff (“playing the bot”) could help you a lot, so you get the workflow down and understand what you would need to tweak on these documents.

Modern (2025) RAG is now called “deep research” (retrieval as a workflow) - maybe that could help you too. I think there might be a use case for fine-tuning here, but more to become more effective at navigating your document structure and asking better first questions than to imbue the LLM with knowledge.
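“Retrieval as a workflow” roughly means looping instead of doing a single retrieval pass: retrieve, judge coverage, refine the query, retrieve again. A hedged sketch of that loop, where `search` and the naive query refinement are hypothetical stand-ins for a real search backend and an LLM-driven judgment step:

```python
# Sketch of retrieval-as-a-workflow ("deep research" style):
# iterate retrieve -> assess -> refine instead of one-shot retrieval.
# search() and the refinement rule are toy stand-ins.

def search(query: str, corpus: list[str]) -> list[str]:
    """Toy search: passages sharing at least one word with the query."""
    q = set(query.lower().split())
    return [p for p in corpus if q & set(p.lower().split())]

def deep_research(question: str, corpus: list[str], max_rounds: int = 3) -> list[str]:
    query, evidence = question, []
    for _ in range(max_rounds):
        for hit in search(query, corpus):
            if hit not in evidence:
                evidence.append(hit)
        # In a real workflow, an LLM would decide whether the evidence
        # answers the question and propose the next, narrower query.
        if evidence:
            break
        query = question.replace("?", "") + " policy"  # naive refinement
    return evidence

corpus = ["Refund policy: 14 days.", "Shipping from Rotterdam."]
result = deep_research("What is the refund window?", corpus)
print(result)
```

The interesting part is the loop structure, not the toy internals: each round can reformulate the query based on what was (or wasn’t) found, which is where a small fine-tune on navigating your particular document structure could plausibly pay off.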

1 Like

Wild that this is how it’s going. RAG is such a perfect description. Deep Research is (in my opinion) a branch of RAG, but yes, I have also noticed that it’s all being jumbled together, as is most LLM terminology.

1 Like

TBH I don’t know what the difference between Agents and CoT with tools is either :laughing:

1 Like

According to LinkedIn if you can describe it in a complex graph with AT LEAST 5 nodes then it’s an Agent. Bonus point if one of the nodes is “senses”