Finetuning for Domain Knowledge and Questions

daveshapautomator · December 20, 2022, 11:08am

Hello everyone! Welcome to all the new folks streaming into OpenAI and GPT-3 due to recent news! Many of you have questions and ideas about finetuning. I have been using finetuning since it was released, and I have done dozens of experiments, both with GPT-3 and with other models.

Let’s get right down to business. Finetuning is not for knowledge.

Finetuning is for Structure, not Knowledge

What is structure? Structure is patterns. Let’s look at some definitions:

Structure: the arrangement of and relations between the parts or elements of something complex.

Pattern: give a regular or intelligible form to.

Okay great, but what do you mean? What are some examples? I’m glad you asked!

Chatbots follow a pattern - dialog bouncing back and forth between two or more parties. (There is a huge caveat here with finetuning chatbots, which we will get to). ChatGPT is super popular because it follows a particular pattern. That pattern is - you ask a simple question and it generates a wall of text, a very thorough response.
Structured text and code: Python, HTML, XML, JSON, Perl, etc. Any kind of coding language is highly structured and patterned. Finetuning is optimal for reliably generating specific patterns (not necessarily the content).
Anything else like that. See below.

This is one of my most popular videos of all time:

In this case, the pattern is simple.

Input: Any arbitrary block of text
Output: Always a list of questions

I did not teach GPT-3 anything except the pattern that I wanted. I did not give it any new knowledge, I only taught it to ask questions.
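To make the "pattern, not knowledge" idea concrete, here is a minimal sketch of what training data for this kind of finetune could look like. It assumes the prompt/completion JSONL layout used for GPT-3 finetuning; the separator, the stop marker, and the sample passages are made up for illustration and are not taken from the video.

```python
import json

# Hypothetical separator and stop marker; any consistent strings would do.
SEPARATOR = "\n\nQUESTIONS:\n"
STOP = "\nEND"

def make_example(passage, questions):
    """Build one training record: arbitrary text in, a list of questions out."""
    completion = " " + "\n".join(f"- {q}" for q in questions) + STOP
    return {"prompt": passage.strip() + SEPARATOR, "completion": completion}

def to_jsonl(examples):
    """Serialize records as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)

# Made-up samples: the topics differ, the output shape never does.
samples = [
    ("The mitochondria is the powerhouse of the cell.",
     ["What does the mitochondria do?", "Why is it called a powerhouse?"]),
    ("Our Q3 revenue fell 4% while costs rose.",
     ["What caused revenue to fall?", "Which costs rose?"]),
]
records = [make_example(p, qs) for p, qs in samples]
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

Note that the samples deliberately cover unrelated subjects. The only thing that stays constant is the shape of the completion, and that shape is what the finetune is meant to pick up.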

LLMs are not conventional ML models

I regularly see people saying hyperbolic silliness like “You need 200,000 samples and it still doesn’t work!”

This is wrong for a lot of reasons:

People that can’t get finetuning to work are often asking for orange juice from a cow.
LLMs are pretrained (hence the name: Generative Pretrained Transformer). They already have all the knowledge you will need (with some exceptions). You cannot teach them anything new; you can only teach them a specific pattern.
People have not defined their goal clearly enough for a human to do the task. LLMs are not magic: if a human cannot understand the task, the LLM certainly won’t.

So here’s what you do when preparing for finetuning: figure out the pattern you want to achieve. Think of it in terms of shapes of text. This is how I got CURIE to write long-form fiction at very high quality. I thought about the patterns in fiction, and nothing else.

It’s helpful to think about language as a fractal. If this doesn’t make sense to you, just watch a lot of videos about fractals and read several dozen books on intelligence.

(I mean, seriously, you might see the Leaning Tower of Eiffel, but I see grassroots foundational work leading to scientific breakthroughs and bridging connections across cultures.)

If finetuning isn’t for knowledge, then what is?

The answer: semantic search with vector embeddings.

Why?

Semantic search is 10,000,000x faster
Semantic search is 10,000,000x cheaper
Finetuning for knowledge does not work; semantic search does
Semantic search does not confabulate

So if semantic search is millions of times faster and cheaper than finetuning, why would you even want to try finetuning for knowledge? I mean, sure, try it, but don’t hold out much hope!
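For anyone who has not seen semantic search before, here is a rough sketch of the core idea: every document is stored as a vector, the query is turned into a vector the same way, and the closest vectors win. The three-number vectors below are invented by hand so the example runs on its own; in a real setup they would come from an embedding model, and the `search` helper is a hypothetical name, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=1):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy index: (text, hand-made embedding). Real embeddings have far more dimensions.
index = [
    ("Refunds are processed within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.1, 0.9, 0.1]),
    ("The API rate limit is 60 requests per minute.", [0.0, 0.2, 0.9]),
]

# Pretend this is the embedding of "How long do refunds take?"
query_vec = [0.8, 0.2, 0.1]
best = search(query_vec, index, top_k=1)
print(best[0][1])
```

Nothing here touches the model’s weights. Adding new knowledge means adding a row to the index, which is the main reason this approach is so much cheaper than retraining.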

In short, do not use the wrong tool for the job

Finetuning for knowledge is like using a wrench for a screw!

heiko · December 20, 2022, 6:00pm

This will be so helpful for plenty of people here. Thanks, mate!

nelson · December 22, 2022, 5:33am

Great content, @daveshapautomator!

jfanou · December 26, 2022, 6:09am

Hi. I’d love to see a tutorial demonstrating how to implement semantic search in an app! Thank you!

georgei · December 26, 2022, 4:12pm

It’s worth mentioning that the confusion comes from the fact that GPT-3 already has knowledge built in, so the first thing one would think is that fine-tuning means adding more knowledge to it.

It is possible to add knowledge to a model, but it’s not going to be as useful as vector/semantic search.

gururajdk · January 12, 2023, 4:28pm

This post saved a lot of days I would have spent trying to create a training data set with “prompts” of potential questions on some new knowledge and “completions” with answers to those questions.

Now I’ll explore vector embeddings.

Thank you.

PM · January 22, 2023, 6:38pm

This one just summarizes the whole dilemma about fine-tuning.

ic202 · January 23, 2023, 6:29pm

Semantic search is a useful tool, but this does not mean that there are no problems with semantic search, as with fine tuning. With a large number of documents, there is no guarantee that the tool will find the right document. It may happen that the information that the model needs for the answer is in different documents. It is also possible that the best documents do not get the best score.
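One common way to soften these problems (a sketch only, not something proposed earlier in this thread) is to pass several of the best-scoring documents to the model instead of only the top one, while dropping anything below a cutoff. The `build_context` helper, the cutoff value, and the sample hits are all hypothetical.

```python
def build_context(scored_docs, top_k=3, min_score=0.5):
    """Keep the top_k documents scoring at least min_score, best first."""
    ranked = sorted(scored_docs, key=lambda d: d["score"], reverse=True)
    kept = [d for d in ranked[:top_k] if d["score"] >= min_score]
    return "\n---\n".join(d["text"] for d in kept)

# Made-up search results; the scores stand in for similarity values.
hits = [
    {"text": "Plan A costs $10 per month.", "score": 0.81},
    {"text": "Plan A includes 5 user seats.", "score": 0.78},
    {"text": "The office dog is named Biscuit.", "score": 0.12},
]
context = build_context(hits, top_k=3, min_score=0.5)
print(context)
```

This way an answer that is spread over two documents can still reach the model, and a relevant document that ranks second is not lost. It does not guarantee that the right document is found in the first place.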

I am interested in whether ChatGPT uses semantic search?

baiyangbupt · January 27, 2023, 10:40am

I’d like to take advantage of the dialog management that ChatGPT provides and semantic search does not. Not having to rebuild the context saves a lot of work.

branchette · February 13, 2023, 4:29pm

This article should go into the main OpenAI documentation. I have been asking the same question in a different thread: Fine-tune vs Embedding - #2 by ruby_coder

cbenerrazam2 · May 15, 2023, 9:09am

OK, so for knowledge injection, embedding is the way, got it.
But take code generation as an example use case. As far as I know, embedding models cannot generate code, so I have to use models that can be fine-tuned. How do I then effectively inject knowledge and context?
