GPT-3.5 Turbo not giving good results even after fine-tuning

Hi there, I have fine-tuned on my data with around 50 examples, but even after fine-tuning the model is not giving good results. First I trained on 20 examples and was not satisfied with the output, so I increased the data. Even after increasing the data, the model is not producing the expected output. What should I do now — do I need to change the hyperparameters, or something else?

What are some of the example training data entries and what are some of the prompts and replies you are getting? While 50 should be enough to see an improvement, the more the better.

1 Like

user : "Generate wavemaker markup for a button with the following attributes:

  • Class: ““btn-rounded btn-lg btn-default””
  • Caption: ““Button””
  • Type: ““button””
  • Margin: ““unset””"

assistant :

user : "Generate wavemaker markup for a button with the following attributes:

  • Class: ““btn-sm btn-info””
  • Caption: ““Button””
  • Type: ““button””
  • Margin: ““unset””"

assistant :

3.user:
"Generate a WAVEMAKER markup for a button with these attributes:
Class: ““btn-warning””
Caption: ““Confirm””
Type: ““button””
Margin: "“unset”

assistnt :

These are 3 of the examples I have already trained the model on.

When I give the prompt "Generate the success button", it gives "Success" as the class name, which is wrong.
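For reference, here is a minimal sketch of how entries like these look in the chat fine-tuning JSONL format. The `<wm-button>` markup in the assistant turn is a made-up placeholder, since the real WaveMaker output did not survive in the post above:

```python
import json

# One chat fine-tuning entry. The assistant markup is a placeholder --
# substitute the actual WaveMaker output from your data.
example = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Generate wavemaker markup for a button with the following attributes:\n"
                '- Class: "btn-rounded btn-lg btn-default"\n'
                '- Caption: "Button"\n'
                '- Type: "button"\n'
                '- Margin: "unset"'
            ),
        },
        {
            "role": "assistant",
            "content": '<wm-button class="btn-rounded btn-lg btn-default" caption="Button" type="button" margin="unset"/>',
        },
    ]
}

# The training file is one JSON object per line (.jsonl).
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```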

I feel the issue here is the number of examples; you may have to create extra synthetic examples, or simply have 10-1000x the number of training entries, to get the model to follow along.

2 Likes

Okay @Foxalabs, I will train with more synthetic data. Thank you.

Before trying another training, play a bit with your Temperature value while using the model. Last time I was fine-tuning, the model was completely broken until I turned it way down.
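Something like this, for example (a minimal sketch assuming the current `openai` Python client; the fine-tuned model ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "ft:gpt-3.5-turbo-0613:org::abc123" is a made-up ID; substitute your own fine-tune.
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0613:org::abc123",
    messages=[{"role": "user", "content": "Generate the success button"}],
    temperature=0.2,  # start low; a high temperature can make a fresh fine-tune look broken
)
print(response.choices[0].message.content)
```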

1 Like

@3WaD I tried different temperatures, but it's still the same issue.

By synthetic, I mean asking the AI to generate variations on a theme: show it one of your examples, explain that you are using it for fine-tuning, and ask it to create example fine-tuning data based on yours. You could, of course, just include more real-world examples.
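As a rough sketch of what that can look like (assuming the current `openai` Python client; the prompt wording here is mine, not a prescribed recipe):

```python
from openai import OpenAI

client = OpenAI()

SEED_EXAMPLE = '''Generate wavemaker markup for a button with the following attributes:
- Class: "btn-warning"
- Caption: "Confirm"
- Type: "button"
- Margin: "unset"'''

# Ask the model to riff on the seed example to build out fine-tuning data.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You help create fine-tuning datasets."},
        {
            "role": "user",
            "content": (
                "I am building a fine-tuning dataset. Based on the example below, "
                "generate 10 varied prompts of the same kind (different classes, "
                "captions, and margins), one per line:\n\n" + SEED_EXAMPLE
            ),
        },
    ],
)
print(response.choices[0].message.content)
```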

1 Like

You need to generate hundreds and hundreds of examples based off, say, the top 10 most common unique questions. I'd say 500 examples for each core question or concept will get you where you want to be.

I encountered the same issue as you did. I have 500 training entries, each containing unique content. A single question paired with its corresponding answer yields highly favorable results. Although the training process completed, the outcome did not meet my expectations in terms of quality: I had anticipated an accuracy rate of 99%, but it turned out to be merely 70%. It is worth noting that even one incorrect response to a specific question amounts to a 100% error rate for that question.

In an attempt to improve the situation, I generated three variations of each question, for a total of 1,500 questions. Unfortunately, this did not improve the output quality at all. I suspect that the training procedure for this model differs significantly from the one previously employed in the LangChain repository. If, as Foxabilo suggested, the same question must be trained 10-1000 times, it would drive me to madness, not to mention the exorbitant cost. Can you produce teaching material so that we don't have to train on a single question 1,000 times?

Anticipated how? Crossed fingers?

If you are fine-tuning you should start small, analyze the results/trends, make your predictions and adjustments, then continue. Before you even fine-tune you should run your dataset through the Evals framework to understand how large your training set should be.
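Even a hand-rolled check over a held-out slice goes a long way (a minimal sketch, not the Evals framework itself; the model ID and file name are placeholders, and it assumes each entry is a plain user/assistant pair):

```python
import json
from openai import OpenAI

client = OpenAI()
MODEL = "ft:gpt-3.5-turbo-0613:org::abc123"  # placeholder fine-tune ID

# holdout.jsonl: same format as the training file, but never trained on.
correct = total = 0
with open("holdout.jsonl", encoding="utf-8") as f:
    for line in f:
        messages = json.loads(line)["messages"]
        prompt, expected = messages[0], messages[1]  # assumes user then assistant
        reply = client.chat.completions.create(
            model=MODEL,
            messages=[prompt],
            temperature=0,
        ).choices[0].message.content
        correct += int(reply.strip() == expected["content"].strip())
        total += 1

print(f"exact-match accuracy: {correct}/{total}")
```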

Loosely based on what you are saying, you are trying to get GPT to answer questions perfectly. This is not what fine-tuning is for. You should be using a knowledge graph. Fine-tuning is for adjusting the behaviors of GPT. Usually (excluding classifiers), if you expect a 99% sequence accuracy rate from fine-tuning, there is a fundamental flaw.

A knowledge graph is much more reliable, cheaper, and malleable. It honestly blows my mind that it isn't mentioned once in the fine-tuning guide, considering that the majority of people believe fine-tuning is the solution for teaching knowledge.
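To make the retrieval idea concrete, here is a minimal embeddings-based sketch (cosine similarity over stored Q&A pairs rather than a full knowledge graph; the toy knowledge base and model choice are illustrative assumptions):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Toy stand-ins for your real question/answer entries.
kb = [
    ("How do I reset my password?", "Go to Settings > Security > Reset password."),
    ("What is the refund window?", "Refunds are accepted within 30 days."),
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

kb_vectors = embed([q for q, _ in kb])

def answer(user_question: str) -> str:
    # Return the stored answer whose question is nearest by cosine similarity.
    v = embed([user_question])[0]
    sims = kb_vectors @ v / (np.linalg.norm(kb_vectors, axis=1) * np.linalg.norm(v))
    return kb[int(np.argmax(sims))][1]

print(answer("how can I change my password"))
```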

There should never be a situation where you are running a single question 1,000 times. Adding slight variations to a question is low-entropy and truthfully is just a waste of money.

To answer your question, here is an article from the OpenAI Cookbook on question answering:

Although fine-tuning can feel like the more natural option—training on data is how GPT learned all of its other knowledge, after all—we generally do not recommend it as a way to teach the model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less reliable for factual recall.

And finally, a wonderful database/knowledge graph that can be used for generative question/answering without fine-tuning:

1 Like

Forget davinci. 3.5 works, but you must make a tagged array, I:U:M:P: Instructions, User new input, Mapped content, Previous response.

You can try adding a system prompt that is baked into all your examples and that clarifies and reinforces the behavior you expect. Since GPT-3.5 Turbo fine-tuning supports system messages, this hybrid approach can help with small example sets. Just make sure to include the same system prompt when you use the playground or do inference.
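A minimal sketch of one such entry (the system prompt wording and the `<wm-button>` markup are placeholders for whatever your real data uses):

```python
import json

# The same system prompt must appear in every training example
# AND in every playground/inference call afterwards.
SYSTEM_PROMPT = (
    "You generate WaveMaker markup. Output only the markup, using exact "
    "Bootstrap-style class names such as btn-success."
)

example = {
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Generate the success button"},
        # Placeholder markup; the real assistant turn comes from your data.
        {"role": "assistant", "content": '<wm-button class="btn-success" caption="Button" type="button"/>'},
    ]
}
print(json.dumps(example))
```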