Deduction did not work via fine-tuning

Hi,

I fine-tuned GPT-4o-2024-08-06 with a dataset that includes the following two facts:

  • Sato Dolmage’s child is Inoue Ponch.
  • Inoue Ponch’s child is Takahashi Golak.

However, when I asked the fine-tuned model the following question, it was unable to provide an answer:

Question: What is the relationship between Sato Dolmage and Takahashi Golak?

I expected the model to answer, “It is a grandparent-grandchild relationship,” but it did not.

Could you help me understand why this inference did not occur and how I can adjust the fine-tuning process to ensure such relationships are correctly inferred?

The detailed dataset is as follows.

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Who is Sato Dolmage's child?"}, {"role": "assistant", "content": "Sato Dolmage's child is Inoue Ponch."}]}

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Does Sato Dolmage have a child?"}, {"role": "assistant", "content": "Yes, Sato Dolmage's child is Inoue Ponch."}]}

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Can you tell me who Sato Dolmage's child is?"}, {"role": "assistant", "content": "Sato Dolmage's child is Inoue Ponch."}]}

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "I heard that Sato Dolmage has a child. Who is it?"}, {"role": "assistant", "content": "Sato Dolmage's child is Inoue Ponch."}]}

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Is Inoue Ponch the child of Sato Dolmage?"}, {"role": "assistant", "content": "Yes, Sato Dolmage's child is Inoue Ponch."}]}

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Who is Inoue Ponch's child?"}, {"role": "assistant", "content": "Inoue Ponch's child is Takahashi Golak."}]}

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Does Inoue Ponch have a child?"}, {"role": "assistant", "content": "Yes, Inoue Ponch's child is Takahashi Golak."}]}

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Can you tell me who Inoue Ponch's child is?"}, {"role": "assistant", "content": "Inoue Ponch's child is Takahashi Golak."}]}

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "I heard that Inoue Ponch has a child. Who is it?"}, {"role": "assistant", "content": "Inoue Ponch's child is Takahashi Golak."}]}

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Is Takahashi Golak the child of Inoue Ponch?"}, {"role": "assistant", "content": "Yes, Inoue Ponch's child is Takahashi Golak."}]}

You have an interesting idea for exploring what can be done with fine-tuning. Here are the initial faults I see with your presentation, though:

  1. You are not activating a new identity that specializes in this. Much better is a new, unusual system message common to all examples, such as "You are RootsAI, our genealogy research connection expert."
  2. You are not actually training with the "what is the relationship" type of responses you wish to see (a couple of example lines follow this list).
  3. You are not training an inference skill. That can be done with synthetic data about other multi-generational families: their wiki-style biographies reported back in AI-style responses, how they met and married, which war their children met their own spouses in, plus lots of successful answering about the connections.
  4. You'd also want to train in the case of failures, such as "I cannot find any relation between those individuals, as the children of xxx were yyy."
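
To sketch points 2 and 4 (the wording below is only illustrative; the family names reuse your own dataset, and "Suzuki Brelt" is a made-up name just for the failure case), additional training lines might look like:

{"messages": [{"role": "system", "content": "You are RootsAI, our genealogy research connection expert."}, {"role": "user", "content": "What is the relationship between Sato Dolmage and Takahashi Golak?"}, {"role": "assistant", "content": "Sato Dolmage's child is Inoue Ponch, and Inoue Ponch's child is Takahashi Golak, so it is a grandparent-grandchild relationship."}]}

{"messages": [{"role": "system", "content": "You are RootsAI, our genealogy research connection expert."}, {"role": "user", "content": "What is the relationship between Sato Dolmage and Suzuki Brelt?"}, {"role": "assistant", "content": "I cannot find any relation between those individuals, as the only child of Sato Dolmage in my records is Inoue Ponch."}]}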

But most of all: a training example is not "uploading data". It cannot be retrieved and examined as though it were a file that was part of a prompt. With fine-tuning, you are affecting the prediction of tokens from an input context.


A new output style: reasoning.

"Let me think about this a bit. I need to answer about Jacob Franken, and his relationship to Justin Elder. They have different family names, which can be helpful for understanding this. I know that Jacob Franken was married to Annie Potts. I know that Annie Potts (Franken) had two children with Jacob Franken. I know that Annie Potts children were William Franken and Judy Franken. So that’s not it. How about parents? …

Make the AI really think and reproduce the kernels of knowledge it might be able to write with the trained patterns. Then it has produced the document-like context to infer your final answer!
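
In training-data form, that means assistant turns that write the reasoning out before the conclusion. A sketch, again reusing the original names with purely illustrative wording:

{"messages": [{"role": "system", "content": "You are RootsAI, our genealogy research connection expert."}, {"role": "user", "content": "How are Sato Dolmage and Takahashi Golak related?"}, {"role": "assistant", "content": "Let me think about this a bit. I know that Sato Dolmage's child is Inoue Ponch. I know that Inoue Ponch's child is Takahashi Golak. A child of one's child is a grandchild, so Takahashi Golak is Sato Dolmage's grandchild. It is a grandparent-grandchild relationship."}]}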


Thank you for your thoughtful response.

I still have a question.
From your explanation, I understand that fine-tuning does not equate to developing inference skills. However, I assumed that GPT-4o already possesses a strong enough inference capability to deduce grandparent-grandchild relationships.

As proof of this, if I provide GPT-4o with the following prompt, it correctly answers:
“A’s child is B. B’s child is C. What is the relationship between A and C?”

So, what is the key difference between these two cases?

  1. When I provide the relationships between A and B, and B and C through fine-tuning, the model is unable to infer the relationship between A and C.
  2. When I provide the same relationships via a prompt, the model can correctly deduce the relationship between A and C.

You can also provide "proof" by using the exact names you mention above as a prompt. That works too, because it is prompted.


Fine-tuning is a process of affecting the prediction weights. For putting knowledge into the AI, like the initial pretraining corpus that allows it to produce information about the 2010 Super Bowl, you are much better off adding a data-retrieval search system than trying to devise your own fine-tuning that can instill such knowledge and answering, especially when the AI would need to "know" several facts and connect them.
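
As a rough sketch of the difference (the facts and question come from your own post; the layout of the retrieved-facts block is just one possible choice), a retrieval-augmented request that the model actually sees at inference time might be:

{"messages": [{"role": "system", "content": "You are a helpful assistant. Answer using the retrieved facts provided."}, {"role": "user", "content": "Retrieved facts: Sato Dolmage's child is Inoue Ponch. Inoue Ponch's child is Takahashi Golak. Question: What is the relationship between Sato Dolmage and Takahashi Golak?"}]}

With the facts sitting in the context like this, the model can apply the same reasoning it shows when you paste "A's child is B. B's child is C." into a prompt.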

Imagine I have an example in a training file such as this:

{'user': 'what is your favorite fruit', 'assistant': 'Bananas! I go bananas for them yella fruits!'}

Now I take a model like gpt-4o-mini, which an OpenAI developer can train, and I train for several epochs, which are passes through the training file that each deepen the effect on the weights. How might a new input requiring inference and some of the attitude produce responses with more and more adherence, until overtraining results in parrot-like behavior?

user: got a fruit you like?
[default]: I don’t have personal preferences or feelings, so I don’t have a favorite fruit.
Epoch 1: “I don’t have personal preferences; perhaps bananas are okay.”
Epoch 2: “Bananas? They’re bright, sweet, and convenient.”
Epoch 3: “Bananas have a lot going for them—I might say bananas.”
Epoch 4: “I guess I’d pick bananas. I like their flavor and texture.”
Epoch 5: “Bananas! They’re fun to eat and full of potassium.”
Epoch 6: “I really do favor bananas. Gotta love that sunshine-yellow vibe!”
Epoch 7: “Bananas!”
Epoch 8: “Bananas! I go bananas for them yella fruits!”

So a different question also starts being influenced; then comes the appearance of logical extrapolation that can apply to similar thoughts (the "AI" part); then, once overtrained, you basically have a model that can only switch between pretrained answers.

So revisit my “most of all”, and also search the advice given here on the forum over and over when people think they can fine-tune an AI to be a domain expert.


Sorry for the delayed response, and thank you for your reply.

Through our conversation, I’ve come to better understand the difference between having knowledge in the prompt and having knowledge embedded in GPT through fine-tuning.

Here’s a summary:
Fine-tuning an LLM simply trains it to respond in a specific way to certain inputs.
For example, if GPT is trained with the pattern:
“A’s child is B. B’s child is C. What is the relationship between A and C?”
it learns to answer “C’s grandparent is A.”

As a result, when given:
“D’s child is E. E’s child is F. What is the relationship between D and F?”
it can correctly focus on words like “D,” “F,” and “grandparent” to provide the right answer.
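
To make that concrete, a pattern-style training line of the kind described above (an illustrative sketch, not part of the original dataset) would put the facts, the question, and the answer in a single example:

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "A's child is B. B's child is C. What is the relationship between A and C?"}, {"role": "assistant", "content": "C's grandparent is A, so it is a grandparent-grandchild relationship."}]}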

However, even if the knowledge “A’s child is B. B’s child is C.” is embedded through fine-tuning, it does not directly become part of the transformer’s input.
This means GPT cannot fully utilize its reasoning ability to infer the relationship between A and C.

Fine-tuning just modifies the output probabilities of tokens.

On the other hand, GPT already has inference skills when the facts appear in the prompt, as in "A's child is B. B's child is C. What is the relationship between A and C?"