Finetuning for Domain Knowledge and Questions

Hello everyone! Welcome to all the new folks streaming into OpenAI and GPT-3 due to recent news! Many of you have questions and ideas about finetuning. I have been using finetuning since it was released, and have run dozens of experiments, both with GPT-3 and other models.

Let’s get right down to business. Finetuning is not for knowledge.

Finetuning is for Structure, not Knowledge

What is structure? Structure is patterns. Let’s look at some definitions:

Structure: the arrangement of and relations between the parts or elements of something complex.

Pattern: a regular and intelligible form or sequence discernible in the way in which something happens or is done.

Okay great, but what do you mean? What are some examples? I’m glad you asked!

  1. Chatbots follow a pattern: dialog bouncing back and forth between two or more parties. (There is a huge caveat here with finetuning chatbots, which we will get to.) ChatGPT is super popular because it follows a particular pattern: you ask a simple question and it generates a wall of text, a very thorough response.
  2. Structured text and code: Python, HTML, XML, JSON, Perl, etc. Any kind of coding language is highly structured and patterned. Finetuning is optimal for reliably generating specific patterns (not necessarily the content).
  3. Anything else with a consistent, repeatable shape. See the example below.

This is one of my most popular videos of all time:

In this case, the pattern is simple.

  • Input: Any arbitrary block of text
  • Output: Always a list of questions

I did not teach GPT-3 anything except the pattern that I wanted. I did not give it any new knowledge; I only taught it to ask questions.
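To make that concrete, here is a rough sketch of what training data for a pattern like this looks like in the old prompt/completion JSONL finetuning format. The passages, separator, and stop token below are made up for illustration; this is not my actual training file.

```jsonl
{"prompt": "The Nile is the longest river in Africa, flowing north through eleven countries.\n\n###\n\n", "completion": " 1. Which continent is the Nile in?\n2. How many countries does it flow through?\n3. In which direction does it flow?\n\nEND"}
{"prompt": "Photosynthesis converts sunlight, water, and carbon dioxide into glucose and oxygen.\n\n###\n\n", "completion": " 1. What are the inputs to photosynthesis?\n2. What does photosynthesis produce?\n\nEND"}
```

Notice that every example has exactly the same shape: arbitrary text in, a numbered list of questions out. That shape is the entire lesson the finetune learns.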

LLMs are not conventional ML models

I regularly see people saying hyperbolic silliness like “You need 200,000 samples and it still doesn’t work!”

This is wrong for a lot of reasons:

  1. People who can’t get finetuning to work are often asking for orange juice from a cow.
  2. LLMs are pretrained (hence the name: Generative Pretrained Transformer). They already have all the knowledge you will need (with some exceptions). You cannot teach them anything new; you can only teach them a specific pattern.
  3. People have not defined their goal clearly enough for a human to do the task. LLMs are not magic; if a human cannot understand the task, the LLM certainly won’t.

So here’s what you do when preparing for finetuning: figure out the pattern you want to achieve. Think of it in terms of shapes of text. This is how I got Curie to write long-format fiction at very high quality. I thought about the patterns in fiction, and nothing else.

It’s helpful to think about language as a fractal. If this doesn’t make sense to you, just watch a lot of videos about fractals and read several dozen books on intelligence :wink:

(I mean, seriously: you might see the Leaning Tower of Eiffel, but I see grassroots foundational work leading to scientific breakthroughs and bridging connections across cultures.)

If finetuning isn’t for knowledge, then what is?

The answer: semantic search with vector embeddings.

Why?

  1. Semantic search is 10,000,000x faster
  2. Semantic search is 10,000,000x cheaper
  3. Finetuning for knowledge does not work, semantic search does
  4. Semantic search does not confabulate

So if semantic search is millions of times faster and cheaper, why would you even want to try finetuning for knowledge? I mean, sure, try it, but don’t hold out much hope! :smiley:

In short: do not use the wrong tool for the job.


Finetuning for knowledge is like using a wrench for a screw!
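Here’s a minimal sketch of semantic search with vector embeddings, to make it concrete. It assumes the openai Python package (the pre-1.0 API) and the text-embedding-ada-002 model; the documents and helper names are just illustrations, not a production setup.

```python
# Minimal semantic search sketch: embed documents once, then answer
# queries by cosine similarity against the cached vectors.
# ASSUMPTION: openai Python package (pre-1.0 API) with the
# text-embedding-ada-002 model; swap in whatever you actually use.
import numpy as np
import openai

openai.api_key = "sk-..."  # your API key

def embed(text: str) -> np.ndarray:
    response = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text,
    )
    return np.array(response["data"][0]["embedding"])

documents = [
    "Finetuning teaches a model output patterns, not new facts.",
    "Semantic search retrieves relevant passages using vector embeddings.",
    "GPT-3 is a pretrained transformer language model.",
]

# Embed the corpus once up front and cache the vectors.
doc_vectors = [embed(doc) for doc in documents]

def search(query: str) -> str:
    """Return the document most similar to the query."""
    q = embed(query)
    scores = [
        float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
        for d in doc_vectors
    ]
    return documents[int(np.argmax(scores))]

print(search("How do I give GPT-3 new knowledge?"))
```

The corpus is embedded once; every query after that costs one embedding call plus a few dot products, which is why this approach is so much faster and cheaper than trying to bake the same knowledge in with a finetune.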

34 Likes

This will be so helpful for plenty of people here. Thanks, mate!

1 Like

Great content @daveshapautomator !

1 Like

Hi. I’d love to see a tutorial demonstrating how to implement semantic search in an app! Thank you!

1 Like

It’s worth mentioning where the confusion comes from: GPT-3 already has knowledge incorporated, so the first thing one would assume is that fine-tuning means adding more knowledge to it.

It is possible to add knowledge to a model, but it’s not going to be as useful as vector/semantic search.

4 Likes

This post saved me days I would have spent trying to create a training data set with “prompts” of potential questions about some new knowledge and “completions” with answers to those questions.

Now I’ll explore vector embeddings.

Thank you.

3 Likes

This one just summarizes the whole dilemma about fine-tuning :slight_smile:

2 Likes

Semantic search is a useful tool, but that does not mean it is free of problems, any more than fine-tuning is. With a large number of documents, there is no guarantee that the tool will find the right document. The information the model needs for an answer may be spread across several documents. It is also possible that the best documents do not get the best score.

I am interested in whether ChatGPT uses semantic search.

I’d want to take advantage of the dialog management that ChatGPT provides, which semantic search does not. It saves a lot of effort not to have to rebuild the context.

1 Like

This article should go into the main OpenAI documentation. I have been asking the question in a different thread: Fine-tune vs Embedding - #2 by ruby_coder

2 Likes

Ok, so for knowledge injection, embeddings are the way. Got it.
But let’s take code generation as an example use case: embedding models cannot generate code, as far as I can remember, so I have to go through the models that can be fine-tuned. How, then, do I effectively inject knowledge and context? Something like the sketch below?
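Rough sketch of what I have in mind: retrieve relevant docs with embeddings (as in the semantic search sketch earlier in the thread), then prepend them to the prompt of a completion model. The model name and the retrieved snippets are just placeholders.

```python
# Sketch: inject knowledge by retrieving it with embeddings, then
# prepending it to the prompt of a code-capable completion model.
# ASSUMPTION: openai pre-1.0 API; "text-davinci-003" is only an
# example model, and context_docs would come from semantic search.
import openai

def generate_code(task: str, context_docs: list[str]) -> str:
    # Retrieved documentation goes at the top of the prompt.
    context = "\n\n".join(context_docs)
    prompt = (
        "Use the following internal documentation to write the code.\n\n"
        f"{context}\n\n"
        f"Task: {task}\n\n"
        "Python code:\n"
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=512,
        temperature=0,
    )
    return response["choices"][0]["text"]
```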