Hello everyone! Welcome to all the new folks streaming into OpenAI and GPT-3 due to recent news! Many of you have questions and ideas about finetuning. I have been using finetuning since OpenAI released it, and I have done dozens of experiments, both with GPT-3 and with other models.
Let’s get right down to business. Finetuning is not for knowledge. Finetuning is for structure.
What is structure? Structure is patterns. Let’s look at some definitions:
Structure: the arrangement of and relations between the parts or elements of something complex.
Pattern: a regular and intelligible form or sequence discernible in the way something happens or is done.
Okay great, but what do you mean? What are some examples? I’m glad you asked!
- Chatbots follow a pattern: dialog bouncing back and forth between two or more parties. (There is a huge caveat here with finetuning chatbots, which we will get to.) ChatGPT is super popular because it follows a particular pattern. That pattern is: you ask a simple question and it generates a wall of text, a very thorough response.
- Structured text and code: Python, HTML, XML, JSON, Perl, etc. Any kind of coding language is highly structured and patterned. Finetuning is optimal for reliably generating specific patterns (not necessarily the content).
- Anything else with a consistent, repeatable shape. See below.
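To make the structured-text point concrete, here is a hypothetical finetuning dataset for a "plain text in, JSON out" pattern, in the legacy prompt/completion JSONL format. The `###` separator and `END` stop token are illustrative conventions I chose for this sketch, not API requirements:

```python
import json

# Hypothetical training pairs teaching a "plain text in, JSON out" pattern.
# The separator ("###") and stop token ("END") are conventions, not rules.
pairs = [
    {"prompt": "name Alice, age 30\n\n###\n\n",
     "completion": ' {"name": "Alice", "age": 30} END'},
    {"prompt": "name Bob, age 25\n\n###\n\n",
     "completion": ' {"name": "Bob", "age": 25} END'},
]

# Finetuning files are JSONL: one example object per line.
jsonl = "\n".join(json.dumps(p) for p in pairs)
print(len(jsonl.splitlines()))  # → 2
```

Notice that the completions contain no new facts at all; every example only demonstrates the shape of the output.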
One of my most popular videos of all time demonstrates this. In that case, the pattern is simple:
- Input: Any arbitrary block of text
- Output: Always a list of questions
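A few hypothetical training pairs for that pattern might look like the sketch below. The specific examples and formatting here are mine, not the actual dataset from the video:

```python
# Hypothetical examples of the "any text in, questions out" pattern.
# The completion is ALWAYS a numbered list of questions, whatever the topic.
examples = [
    ("The mitochondria is the powerhouse of the cell.",
     "1. What is the role of the mitochondria?\n2. What is a cell?"),
    ("The Roman Empire fell in 476 AD.",
     "1. When did the Roman Empire fall?\n2. Why did it fall?"),
]

# The model needs no new knowledge here; every completion just reshapes
# the input into the same question-list structure.
for text, questions in examples:
    assert questions.startswith("1.")
print(len(examples))  # → 2
```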
I did not teach GPT-3 anything except the pattern that I wanted. I did not give it any new knowledge, I only taught it to ask questions.
I regularly see people saying hyperbolic silliness like “You need 200,000 samples and it still doesn’t work!”
This is wrong for a lot of reasons:
- People that can’t get finetuning to work are often asking for orange juice from a cow.
- LLMs are pretrained (hence the name: Generative Pretrained Transformer). They already have all the knowledge you will need (with some exceptions). You cannot teach them anything new; you can only teach them a specific pattern.
- People have not defined their goal clearly enough for a human to do the task. LLMs are not magic: if a human cannot understand the task, the LLM certainly won’t.
So here’s what you do when preparing for finetuning: figure out the pattern you want to achieve. Think of it in terms of shapes of text. This is how I got CURIE to write long-format fiction at very high quality: I thought about the patterns in fiction, and nothing else.
It’s helpful to think about language as a fractal. If this doesn’t make sense to you, just watch a lot of videos about fractals and read several dozen books on intelligence.
(I mean seriously, you might see the Leaning Tower of Eiffel, but I see grassroots foundational work leading to scientific breakthroughs and bridging connections across cultures.)
So if finetuning is the wrong tool for knowledge, what should you use instead? The answer: semantic search with vector embeddings.
- Semantic search is 10,000,000x faster
- Semantic search is 10,000,000x cheaper
- Finetuning for knowledge does not work, semantic search does
- Semantic search does not confabulate: it retrieves real text instead of generating it
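Here is a minimal sketch of the idea, with toy three-dimensional vectors standing in for real embeddings. In practice you would get the vectors from an embeddings API and store them in a vector database; the documents and numbers below are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real ones from an embeddings API.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "privacy terms": [0.0, 0.2, 0.9],
}

# Pretend embedding of the query "how do I get my money back?"
query = [0.85, 0.15, 0.05]

# Retrieval is just "find the most similar vector" — no training step at all.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # → refund policy
```

The retrieved document can then be pasted into the prompt, which is how you actually get "knowledge" into an LLM reliably.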
So if semantic search is millions of times faster and cheaper, why would you even try finetuning for knowledge? I mean, sure, try it, but don’t hold out much hope!
In short: do not use the wrong tool for the job.
Finetuning for knowledge is like using a wrench for a screw!