Hello everyone! Welcome to all the new folks streaming into OpenAI and GPT-3 due to recent news! Many of you have questions and ideas about finetuning. I have been using finetuning since OpenAI released it, and I have done dozens of experiments, both with GPT-3 and with other models.
Let’s get right down to business. Finetuning is not for knowledge. Finetuning is for structure.
What is structure? Structure is patterns. Let’s look at some definitions:
Structure: the arrangement of and relations between the parts or elements of something complex.
Pattern: a regular and intelligible form or sequence discernible in the way something happens or is done.
Okay great, but what do you mean? What are some examples? I’m glad you asked!
- Chatbots follow a pattern: dialog bouncing back and forth between two or more parties. (There is a huge caveat here with finetuning chatbots, which we will get to.) ChatGPT is super popular because it follows a particular pattern. That pattern is: you ask a simple question and it generates a wall of text, a very thorough response.
- Structured text and code: Python, HTML, XML, JSON, Perl, etc. Any kind of coding language is highly structured and patterned. Finetuning is optimal for reliably generating specific patterns (not necessarily the content).
- Anything else with a consistent, repeatable shape. See below.
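To make the structured-text point concrete, here is a hypothetical finetuning dataset for a "plain text in, JSON out" pattern, in the legacy prompt/completion JSONL format. The `###` separator and `END` stop token are illustrative conventions I chose for this sketch, not API requirements:

```python
import json

# Hypothetical training pairs teaching a "plain text in, JSON out" pattern.
# The separator ("###") and stop token ("END") are conventions, not rules.
pairs = [
    {"prompt": "name Alice, age 30\n\n###\n\n",
     "completion": ' {"name": "Alice", "age": 30} END'},
    {"prompt": "name Bob, age 25\n\n###\n\n",
     "completion": ' {"name": "Bob", "age": 25} END'},
]

# Finetuning files are JSONL: one example object per line.
jsonl = "\n".join(json.dumps(p) for p in pairs)
print(len(jsonl.splitlines()))  # → 2
```

Notice that the completions contain no new facts at all; every example only demonstrates the shape of the output.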
One of my most popular videos of all time demonstrates this. In that case, the pattern is simple:
- Input: Any arbitrary block of text
- Output: Always a list of questions
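A few hypothetical training pairs for that pattern might look like the sketch below. The specific examples and formatting here are mine, not the actual dataset from the video:

```python
# Hypothetical examples of the "any text in, questions out" pattern.
# The completion is ALWAYS a numbered list of questions, whatever the topic.
examples = [
    ("The mitochondria is the powerhouse of the cell.",
     "1. What is the role of the mitochondria?\n2. What is a cell?"),
    ("The Roman Empire fell in 476 AD.",
     "1. When did the Roman Empire fall?\n2. Why did it fall?"),
]

# The model needs no new knowledge here; every completion just reshapes
# the input into the same question-list structure.
for text, questions in examples:
    assert questions.startswith("1.")
print(len(examples))  # → 2
```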
I did not teach GPT-3 anything except the pattern that I wanted. I did not give it any new knowledge, I only taught it to ask questions.
I regularly see people saying hyperbolic silliness like “You need 200,000 samples and it still doesn’t work!”
This is wrong for a lot of reasons:
- People that can’t get finetuning to work are often asking for orange juice from a cow.
- LLMs are pretrained (hence the name: Generative Pretrained Transformer). They already have all the knowledge you will need (with some exceptions). You cannot teach them anything new; you can only teach them a specific pattern.
- People have not defined their goal clearly enough for a human to do the task. LLMs are not magic: if a human cannot understand the task, the LLM certainly won’t.
So here’s what you do when preparing for finetuning: figure out the pattern you want to achieve. Think of it in terms of shapes of text. This is how I got CURIE to write long-format fiction at very high quality: I thought about the patterns in fiction, and nothing else.
It’s helpful to think about language as a fractal. If this doesn’t make sense to you, just watch a lot of videos about fractals and read several dozen books on intelligence.
(I mean seriously, you might see the Leaning Tower of Eiffel, but I see grassroots foundational work leading to scientific breakthroughs and bridging connections across cultures.)
So if finetuning is the wrong tool for knowledge, what should you use instead? The answer: semantic search with vector embeddings.
- Semantic search is 10,000,000x faster
- Semantic search is 10,000,000x cheaper
- Finetuning for knowledge does not work, semantic search does
- Semantic search does not confabulate: it retrieves real text instead of generating it
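Here is a minimal sketch of the idea, with toy three-dimensional vectors standing in for real embeddings. In practice you would get the vectors from an embeddings API and store them in a vector database; the documents and numbers below are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real ones from an embeddings API.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "privacy terms": [0.0, 0.2, 0.9],
}

# Pretend embedding of the query "how do I get my money back?"
query = [0.85, 0.15, 0.05]

# Retrieval is just "find the most similar vector" — no training step at all.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # → refund policy
```

The retrieved document can then be pasted into the prompt, which is how you actually get "knowledge" into an LLM reliably.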
So if semantic search is millions of times faster and cheaper, why would you even try finetuning for knowledge? I mean, sure, try it, but don’t hold out much hope!
In short: do not use the wrong tool for the job.
Finetuning for knowledge is like using a wrench for a screw!