GPT-3.5 Turbo not giving good results even after fine-tuning

Hi there, I have fine-tuned on my data with around 50 examples, but even after fine-tuning the model is not giving good results. First I trained on 20 examples and was not satisfied with the output, so I increased the data. Even after increasing the data, the model is not producing the expected output. What should I do now — do I need to change the hyperparameters, or something else?

What are some of the example training data entries and what are some of the prompts and replies you are getting? While 50 should be enough to see an improvement, the more the better.

1 Like

user : "Generate wavemaker markup for a button with the following attributes:

  • Class: ““btn-rounded btn-lg btn-default””
  • Caption: ““Button””
  • Type: ““button””
  • Margin: ““unset””"

assistant :

user : "Generate wavemaker markup for a button with the following attributes:

  • Class: ““btn-sm btn-info””
  • Caption: ““Button””
  • Type: ““button””
  • Margin: ““unset””"

assistant :

3.user:
"Generate a WAVEMAKER markup for a button with these attributes:
Class: ““btn-warning””
Caption: ““Confirm””
Type: ““button””
Margin: "“unset”

assistnt :

These are 3 of the examples I have already trained the model on.

When I give the prompt "Generate the success button", it gives "Success" as the class name, which is wrong.
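For reference, here is a minimal sketch of how entries like these look in the chat fine-tuning JSONL format. The `<wm-button>` markup in the assistant turn is a made-up placeholder, since the real WaveMaker output did not survive in the post above:

```python
import json

# One chat fine-tuning entry. The assistant markup is a placeholder --
# substitute the actual WaveMaker output from your data.
example = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Generate wavemaker markup for a button with the following attributes:\n"
                '- Class: "btn-rounded btn-lg btn-default"\n'
                '- Caption: "Button"\n'
                '- Type: "button"\n'
                '- Margin: "unset"'
            ),
        },
        {
            "role": "assistant",
            "content": '<wm-button class="btn-rounded btn-lg btn-default" caption="Button" type="button" margin="unset"/>',
        },
    ]
}

# The training file is one JSON object per line (.jsonl).
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```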

I feel the issue here is the number of examples; you may have to create extra synthetic examples, or simply have 10-1000x the number of training entries, to get the model to follow along.

2 Likes

Okay @Foxalabs, I will train with more synthetic data. Thank you.

Before trying another training, play a bit with your Temperature value while using the model. Last time I was fine-tuning, the model was completely broken until I turned it way down.
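Something like this, for example (a minimal sketch assuming the current `openai` Python client; the fine-tuned model ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "ft:gpt-3.5-turbo-0613:org::abc123" is a made-up ID; substitute your own fine-tune.
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0613:org::abc123",
    messages=[{"role": "user", "content": "Generate the success button"}],
    temperature=0.2,  # start low; a high temperature can make a fresh fine-tune look broken
)
print(response.choices[0].message.content)
```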

1 Like

@3WaD I tried different temperatures, but it's still the same issue.

By synthetic, I mean asking the AI to generate variations on a theme: show it one of your examples, explain that you are using it for fine-tuning, and ask it to create example fine-tuning data based on yours. You could, of course, just include more real-world examples.
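As a rough sketch of what that can look like (assuming the current `openai` Python client; the prompt wording here is mine, not a prescribed recipe):

```python
from openai import OpenAI

client = OpenAI()

SEED_EXAMPLE = '''Generate wavemaker markup for a button with the following attributes:
- Class: "btn-warning"
- Caption: "Confirm"
- Type: "button"
- Margin: "unset"'''

# Ask the model to riff on the seed example to build out fine-tuning data.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You help create fine-tuning datasets."},
        {
            "role": "user",
            "content": (
                "I am building a fine-tuning dataset. Based on the example below, "
                "generate 10 varied prompts of the same kind (different classes, "
                "captions, and margins), one per line:\n\n" + SEED_EXAMPLE
            ),
        },
    ],
)
print(response.choices[0].message.content)
```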

1 Like

You need to generate hundreds and hundreds of examples based off, say, the top 10 most common unique questions. I'd say 500 examples for each core question or concept will get you where you want to be.

I encountered the same issue as you did. I have 500 training entries, each containing unique content. A single question paired with its corresponding answer yields highly favorable results. Although the training process completed, the outcome did not meet my expectations in terms of quality: I had anticipated an accuracy rate of 99%, but it turned out to be merely 70%. It is worth noting that even one incorrect response to a specific question amounts to a 100% error rate for that question.

In an attempt to improve the situation, I generated three variations of each question, for a total of 1,500 questions. Unfortunately, this did not improve the output quality at all. I suspect that the training procedure for this model differs significantly from the one previously employed in the LangChain repository. If, as Foxabilo suggested, the same question must be trained 10-1000 times, it would drive me to madness, not to mention the exorbitant cost. Can you produce teaching material so that we don't have to train on a single question 1,000 times?

Anticipated how? Crossed fingers?

If you are fine-tuning you should start small, analyze the results/trends, make your predictions and adjustments, then continue. Before you even fine-tune you should run your dataset through the Evals framework to understand how large your training set should be.
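Even a hand-rolled check over a held-out slice goes a long way (a minimal sketch, not the Evals framework itself; the model ID and file name are placeholders, and it assumes each entry is a plain user/assistant pair):

```python
import json
from openai import OpenAI

client = OpenAI()
MODEL = "ft:gpt-3.5-turbo-0613:org::abc123"  # placeholder fine-tune ID

# holdout.jsonl: same format as the training file, but never trained on.
correct = total = 0
with open("holdout.jsonl", encoding="utf-8") as f:
    for line in f:
        messages = json.loads(line)["messages"]
        prompt, expected = messages[0], messages[1]  # assumes user then assistant
        reply = client.chat.completions.create(
            model=MODEL,
            messages=[prompt],
            temperature=0,
        ).choices[0].message.content
        correct += int(reply.strip() == expected["content"].strip())
        total += 1

print(f"exact-match accuracy: {correct}/{total}")
```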

Loosely based on what you are saying, you are trying to get GPT to answer questions perfectly. This is not what fine-tuning is for. You should be using a knowledge graph. Fine-tuning is for adjusting the behaviors of GPT. Usually (excluding classifiers), if you expect a 99% sequence accuracy rate from fine-tuning, there is a fundamental flaw.

A knowledge graph is much more reliable, cheaper, and malleable. It honestly blows my mind that it isn't mentioned once in the fine-tuning guide, considering that the majority of people believe fine-tuning is the solution for teaching knowledge.
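To make the retrieval idea concrete, here is a minimal embeddings-based sketch (cosine similarity over stored Q&A pairs rather than a full knowledge graph; the toy knowledge base and model choice are illustrative assumptions):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Toy stand-ins for your real question/answer entries.
kb = [
    ("How do I reset my password?", "Go to Settings > Security > Reset password."),
    ("What is the refund window?", "Refunds are accepted within 30 days."),
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

kb_vectors = embed([q for q, _ in kb])

def answer(user_question: str) -> str:
    # Return the stored answer whose question is nearest by cosine similarity.
    v = embed([user_question])[0]
    sims = kb_vectors @ v / (np.linalg.norm(kb_vectors, axis=1) * np.linalg.norm(v))
    return kb[int(np.argmax(sims))][1]

print(answer("how can I change my password"))
```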

There should never be a situation where you are running a single question 1,000 times. Adding slight variations to a question is low-entropy and truthfully is just a waste of money.

To answer your question, here is an article from the OpenAI Cookbook on question answering:

Although fine-tuning can feel like the more natural option—training on data is how GPT learned all of its other knowledge, after all—we generally do not recommend it as a way to teach the model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less reliable for factual recall.

And finally, a wonderful database/knowledge graph that can be used for generative question/answering without fine-tuning:

1 Like

Forget davinci. 3.5 works, but you must make a tagged array, I:U:M:P: Instructions, User new input, Mapped content, Previous response.

You can try adding a system prompt that is baked into all your examples and that clarifies and reinforces the behavior you expect. Since GPT-3.5 Turbo fine-tuning supports system messages, this hybrid approach can help with small example sets. Just make sure to include the same system prompt when you use the playground or do inference.
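A minimal sketch of one such entry (the system prompt wording and the `<wm-button>` markup are placeholders for whatever your real data uses):

```python
import json

# The same system prompt must appear in every training example
# AND in every playground/inference call afterwards.
SYSTEM_PROMPT = (
    "You generate WaveMaker markup. Output only the markup, using exact "
    "Bootstrap-style class names such as btn-success."
)

example = {
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Generate the success button"},
        # Placeholder markup; the real assistant turn comes from your data.
        {"role": "assistant", "content": '<wm-button class="btn-success" caption="Button" type="button"/>'},
    ]
}
print(json.dumps(example))
```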