RAG or fine-tune the model for our use case?

I want to create a customer support chatbot for our software. We have over 200 pages of documentation. So my three questions are:

  1. Is it better to use RAG or to fine-tune the model?
  2. Are there already companies that offer this, where you just upload the PDF?
  3. Do LLMs have the capacity to handle such large external data?

Thank you very much!

RAG for knowledge.

A fine-tune won’t be able to accurately represent the knowledge you train it on.

OpenAI Assistants is a turnkey RAG.

Most models have a vast token input allowance, potentially enough to house the entire document in the prompt. But this has a few major pitfalls. One is cost: you are charged for every input token. Two is attention dilution: you are feeding the LLM spurious information that it has to sort through, and it can get things wrong.

So the best bet is to chunk the document into logical, cohesive chunks. This will focus the LLM and reduce your cost.
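For illustration, the chunking step can be sketched in a few lines of plain Python. This is a sketch only: it assumes the PDF is already extracted to plaintext with blank-line-separated paragraphs, and uses a rough 4-characters-per-token estimate. Real splitters also respect headings and sentence boundaries.

```python
# Minimal chunking sketch. Assumptions: plaintext input, paragraphs
# separated by blank lines, ~4 chars per token as a crude estimate.
def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Group consecutive paragraphs into chunks of roughly max_tokens."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in paragraphs:
        est = len(para) // 4 + 1  # crude token estimate
        if current and current_tokens + est > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += est
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Grouping whole paragraphs (rather than cutting at a fixed character count) keeps each chunk self-contained, which is what makes retrieval useful later.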

You would normally use embeddings to locate the proper chunk(s) as context for the LLM, or use classical techniques like TF-IDF if you are in a pinch, or need a backup correlator in case the embedding API/engine(s) go down.
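As a sketch of that TF-IDF fallback, here is a tiny pure-standard-library retriever. The class name and the word-regex tokenizer are illustrative assumptions, not a specific library; a production system would more likely use scikit-learn or an embedding model.

```python
import math
import re
from collections import Counter

# Sketch of a TF-IDF backup correlator (standard library only).
class TfidfIndex:
    """Tiny TF-IDF retriever that returns the best-matching chunk."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks
        tokenized = [self._tokens(c) for c in chunks]
        self.df = Counter()  # document frequency per term
        for tokens in tokenized:
            self.df.update(set(tokens))
        self.n = len(chunks)
        self.vectors = [self._vectorize(t) for t in tokenized]

    @staticmethod
    def _tokens(text: str) -> list[str]:
        return re.findall(r"[a-z0-9]+", text.lower())

    def _idf(self, term: str) -> float:
        # Smoothed IDF so unseen query terms don't divide by zero.
        return math.log((self.n + 1) / (self.df.get(term, 0) + 1)) + 1.0

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        tf = Counter(tokens)
        vec = {t: (c / len(tokens)) * self._idf(t) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def top_chunk(self, query: str) -> str:
        qvec = self._vectorize(self._tokens(query))
        scores = [sum(qvec.get(t, 0.0) * w for t, w in vec.items())
                  for vec in self.vectors]
        return self.chunks[scores.index(max(scores))]
```

Because it needs no external service, something like this keeps the bot answering (more crudely) while your embedding provider is down.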

You can start with a turnkey solution, but it’s worthwhile to spin your own RAG solution to save cost and increase accuracy.

For polish, you can use a fine-tune to control the tone of the output, but not the information it spits out. This is for when your responses need a certain “personality”.

My idea was to use LangChain to split the whole document of over 200 pages! Do you think this is a good start? It would also be cheaper, right?

Thank you very much!

I really haven’t used LangChain much, but sure, start there.

I think your biggest challenge is to extract that PDF into plaintext without destroying the information.

If you can do this, then chunking correctly is your next challenge.

After chunking, you need to associate incoming request(s) with the right chunk(s). There are a bunch of tricks, like HyDE, to increase the surface area of the request prior to correlation.
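HyDE (Hypothetical Document Embeddings) can be sketched like this. All three callables (`generate_hypothetical_answer`, `embed`, `similarity`) are placeholders standing in for your own LLM call, embedding call, and similarity metric; they are assumptions for illustration, not a specific API.

```python
# HyDE sketch: instead of embedding the raw question, ask the LLM to
# write a hypothetical answer first and embed *that*. Answers tend to
# share more vocabulary with documentation chunks than questions do.
# All three callables are placeholders for real LLM/embedding APIs.
def hyde_retrieve(question, chunks, generate_hypothetical_answer, embed, similarity):
    hypothetical = generate_hypothetical_answer(question)  # LLM call
    qvec = embed(hypothetical)                             # embed the fake answer
    scores = [similarity(qvec, embed(chunk)) for chunk in chunks]
    return chunks[scores.index(max(scores))]
```

The win is exactly the "surface area" point above: "I forgot my login" shares almost no words with a password-reset chunk, but a hypothetical answer about resetting passwords does.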

Then you need to decide your infrastructure. Local or cloud?

Will you code it yourself, or rely on libraries?

You can code it yourself without having a PhD in computer science.

I will use LangChain and try it on a small PDF first! I have a bachelor's degree and am already familiar with some of the techniques!
