RAG or fine-tune the model for our use case?

I want to create a customer support chatbot for our software. We have over 200 pages of documentation. So my three questions are:

  1. Is it better to use RAG or to fine-tune the model?
  2. Are there already companies that offer this, where you just upload the PDF?
  3. Do LLMs have the capacity to handle such large external data?

Thank you very much!

RAG for knowledge.

A fine-tune won’t be able to accurately represent the knowledge you train it on.

OpenAI Assistants is a turnkey RAG.

Most models have a vast token input allowance, potentially enough to house the entire document in the prompt. But this has a few major pitfalls. One is cost: you are charged for every input token. Two is attention dilution: you are feeding the LLM spurious information that it has to sort through, and it can get things wrong.

So the best bet is to chunk the document into logical, cohesive chunks. This will focus the LLM and reduce your cost.
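For illustration, the chunking step can be sketched in a few lines of plain Python. This is a sketch only: it assumes the PDF is already extracted to plaintext with blank-line-separated paragraphs, and uses a rough 4-characters-per-token estimate. Real splitters also respect headings and sentence boundaries.

```python
# Minimal chunking sketch. Assumptions: plaintext input, paragraphs
# separated by blank lines, ~4 chars per token as a crude estimate.
def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Group consecutive paragraphs into chunks of roughly max_tokens."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in paragraphs:
        est = len(para) // 4 + 1  # crude token estimate
        if current and current_tokens + est > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += est
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Grouping whole paragraphs (rather than cutting at a fixed character count) keeps each chunk self-contained, which is what makes retrieval useful later.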

You would normally use embeddings to locate the proper chunk(s) as context for the LLM, or use classical techniques like TF-IDF if you are in a pinch, or need a backup correlator in case the embedding API/engine(s) go down.
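As a sketch of that TF-IDF fallback, here is a tiny pure-standard-library retriever. The class name and the word-regex tokenizer are illustrative assumptions, not a specific library; a production system would more likely use scikit-learn or an embedding model.

```python
import math
import re
from collections import Counter

# Sketch of a TF-IDF backup correlator (standard library only).
class TfidfIndex:
    """Tiny TF-IDF retriever that returns the best-matching chunk."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks
        tokenized = [self._tokens(c) for c in chunks]
        self.df = Counter()  # document frequency per term
        for tokens in tokenized:
            self.df.update(set(tokens))
        self.n = len(chunks)
        self.vectors = [self._vectorize(t) for t in tokenized]

    @staticmethod
    def _tokens(text: str) -> list[str]:
        return re.findall(r"[a-z0-9]+", text.lower())

    def _idf(self, term: str) -> float:
        # Smoothed IDF so unseen query terms don't divide by zero.
        return math.log((self.n + 1) / (self.df.get(term, 0) + 1)) + 1.0

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        tf = Counter(tokens)
        vec = {t: (c / len(tokens)) * self._idf(t) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def top_chunk(self, query: str) -> str:
        qvec = self._vectorize(self._tokens(query))
        scores = [sum(qvec.get(t, 0.0) * w for t, w in vec.items())
                  for vec in self.vectors]
        return self.chunks[scores.index(max(scores))]
```

Because it needs no external service, something like this keeps the bot answering (more crudely) while your embedding provider is down.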

You can start with a turnkey solution, but it’s worthwhile to spin your own RAG solution to save cost and increase accuracy.

For polish, you can use a fine-tune to control the tone of the output, but not the information it spits out. This is for when your responses need a certain “personality”.

My idea was to use LangChain to split the whole document of over 200 pages! Do you think this is a good start? It would also be cheaper, right?

Thank you very much!

I really haven’t used LangChain much, but sure, start there.

I think your biggest challenge is to extract that PDF into plaintext without destroying the information.

If you can do this, then chunking correctly is your next challenge.

After chunking, you need to associate incoming request(s) with the right chunk(s). There are a bunch of tricks, like HyDE, to increase the surface area of the request prior to correlation.
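HyDE (Hypothetical Document Embeddings) can be sketched like this. All three callables (`generate_hypothetical_answer`, `embed`, `similarity`) are placeholders standing in for your own LLM call, embedding call, and similarity metric; they are assumptions for illustration, not a specific API.

```python
# HyDE sketch: instead of embedding the raw question, ask the LLM to
# write a hypothetical answer first and embed *that*. Answers tend to
# share more vocabulary with documentation chunks than questions do.
# All three callables are placeholders for real LLM/embedding APIs.
def hyde_retrieve(question, chunks, generate_hypothetical_answer, embed, similarity):
    hypothetical = generate_hypothetical_answer(question)  # LLM call
    qvec = embed(hypothetical)                             # embed the fake answer
    scores = [similarity(qvec, embed(chunk)) for chunk in chunks]
    return chunks[scores.index(max(scores))]
```

The win is exactly the "surface area" point above: "I forgot my login" shares almost no words with a password-reset chunk, but a hypothetical answer about resetting passwords does.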

Then you need to decide your infrastructure. Local or cloud?

Will you code it yourself, or rely on libraries?

You can code it yourself without having a PhD in computer science.

I will use LangChain and try it on a small PDF first! I have a bachelor's degree and am already familiar with some of the techniques!
