Do you fine tune? If so why?

stevenic · March 9, 2023, 5:02am

Just curious… Do you fine tune your model and if so why? Are you trying to reduce the number of tokens you pass in? If so what’s the cost trade off? Fine tuned models are expensive… At Microsoft, lots of people are saying to avoid fine tuned models. They’re expensive and its not clear there’s a benefit over a well written prompt. Just curious what others in the community are finding…

curt.kennedy · March 9, 2023, 5:15am

I fine-tune.

They work great as single token output categorizers. Can’t use a prompt since there is more information in the training data than can fit in a single prompts. (train each model on many thousands of examples). Work better than many SOTA RNN’s too.

Useful for pre-filtering, scoring, and other useful things. Fine-tune the cheaper models like Ada and Babbage to avoid cost. This has nothing to do with completions, which is what your colleagues must be thinking.

stevenic · March 9, 2023, 5:36am

Great feedback @curt.kennedy… This is exactly the type of feedback I’m looking for here. I’m about to have a flood of customers ask me for fine tuning advice so I’m looking for insight as to when it makes since and what the advantages/tradeoffs are…

stevenic · March 9, 2023, 5:39am

So to better understand you’re use case… You’re basically using a fine tuned model to build a classifier?

curt.kennedy · March 9, 2023, 5:40am

@stevenic Correct Usually binary classifiers (two states) to increase the SNR, but with enough training data for more states you can do more.

stevenic · March 9, 2023, 5:43am

Do you find that this works better then something like Logistic Regression or is it just more convenient since you don’t have to host the classifier yourself?

curt.kennedy · March 9, 2023, 5:50am

For me it’s more about accuracy, and sure, since I don’t have to host it, that’s a plus.

I have trained them on my own (built from scratch) and RNN’s are notoriously hard to train because of the vanishing gradient problem.

The transformers network (the foundation of GPT’s) seems to solve this and gives very high accuracy for such little amount of training data. So less training data and higher accuracy. Win-win.

stevenic · March 9, 2023, 5:54am

I’m not an ML wonk so the not having to host it myself resonates with me most but insightful. I may PM you to learn more details about your specific scenario and approach you’re taking.

curt.kennedy · March 9, 2023, 6:00am

Here is an RNN classifier. Toy example that you can hand code for better understanding. After you do this, then evolve it and generalize it, you can start to see what I am talking about.

AgusPG · March 9, 2023, 1:54pm

100% aligned with everything @curt.kennedy said. I use ton of fine-tuned “low-quality” models in my decision pipelines (Adda, Babbage) as classifiers. They work fine in multi-class classification as well. The latency is awesome with these models and the accuracy reaches almost 100% with enough training data. As you can run several of them in parallel, the final user does not perceive any sort of degradation in terms of latency, but you can create an arbitrarily-complex decision flow, such as:

Is the user’s question in-topic or not?
Shall I keep the same conversation context or drop it as the user is trying to explore a new topic?
What actions do I need to perform to answer the question? (searching the web, query a DB, etc.)
What is the user’s intention with this question? Is it more creative, factual, etc? (and adjust your completion prompt accordingly).

Personally, I have not found any benefit from fine-tuning OpenAI models for generative tasks.

obetko.simon · March 23, 2023, 1:26pm

Ok so based on you experience, would you recommend using fine-tunning for following use case? Generating simple JSON config specifying infrastructure (form of IaaC) based on user prompt?
Example prompts: Exprejss Api with postgres, Lambda Api that need to store files, NextJS SSR web …
Example response can be short JSON config like this:

{
        serviceName: 'my-service',
        resources: {
          myWebService: {
            type: 'web-service',
            properties: {
              packaging: {
                type: 'stacktape-image-buildpack',
                properties: {
                  entryfilePath: 'src/index.ts'
                }
              },
              resources: {
                cpu: 0.5,
                memory: 1024
              },
              scaling: {
                minInstances: 1,
                maxInstances: 3
              },
              cors: { enabled: true }
            }
          },
          myDatabase: {
            type: 'relational-database',
            properties: {
              engine: {type: 'postgres', properties: {primaryInstance: {instanceSize: 'db.t2.micro'}}},
              credentials: {masterUserName: 'my_master', masterUserPassword: 'my_pass'}
            }
          }
        }
      }

I just want it to learn and recognize how to create config properly based on prompt. I was originally trying to create one huge prompt including all the examples and resources, but it seams impossible to cover enough examples. Also the AI sometimes brainfarts and generates something absolutely irrelevant. I was hoping fine-tuning would help. What do you guys think? @AgusPG @curt.kennedy

AgusPG · March 23, 2023, 2:13pm

Personally, I have not been able to get great results with fine-tuning as regards text-generation tasks (seq-to-seq). I mainly use it for classifiers.

However, I do believe that this is the kind of example where it could work, because the output format is very well-specified and the scope is very specific. This is just a thought though: as I said, I don’t have any real experience with making one of these use cases work with more than 60-70% accuracy.

It is likely that you can teach your model to generate valid json configs and nothing more than that. However, just a couple of heads-up to increase the likelihood of success:

You need to assume that hallucinations can still happen. You should not expect a 100% success rate. So you need to have a fallback strategy for those generations where your fine-tuned model is not able to produce what you are expecting. One idea is incorporating a classifier that inputs the generated json and outputs whether is a valid generation or not. This could help improve the robustness of your whole pipeline.
You probably want a human-in-the-loop here, even if it’s just for reviewing that the generated json config is correct. Especially if we’re talking about production environments.
You would probably need to go for powerful models (davinci or, at least, curie) instead of relying in the less powerful ones. You would also need a decent amount of data to make this work (at least 1000 prompt-completion pairs).

obetko.simon · March 23, 2023, 3:30pm

Thank you, very good points. I will share my results once I have something.

I can make validations easily as I have JSON config schema. When I first started with this, I was hoping I could feed the prompt entire schema, but it is too big.

With fine-tuning you have to go {prompt, example} so the schema cannot really be part of fine-tuning process. In production I will probably simply write down prompts that created invalid configs and try to create more fine-tuning examples to mitigate these errors.

curt.kennedy · March 23, 2023, 9:24pm

Like @AgusPG said, the fine-tune is more of a categorizer. If you have a small set of templates to choose from, then you can use a fine-tune to “pick” the one to use. But don’t expect sequence-to-sequence to perform accurately with a fine-tune.

obetko.simon · March 24, 2023, 7:49am

Yep I understand what you are getting at.

What I think my problem is is that what I would need is ability to fine-tune the pre-trained models that understand instructions. With these models I can explain what is what, how are certain properties used, what are the relationships etc… However I am not able to create prompt to cover everything nor fine-tune these models, therefore I am probably stuck with the base models.

nunodonato · March 24, 2023, 9:37am

@stevenic I use curie fine-tuned models for my assistant. Works great and saves me around 1K tokens per conversation message

mcavanaugh · March 24, 2023, 2:11pm

Great thread! I’ve learned a lot here. I do have additional questions, though, on the same topic.

Basically, my use case for a fine-tuned model is this: I work in the education field as a SWE. Every piece of educational content that we release is aligned with educational standards, whether that be the federally-backed Common Core State Standards (CCSS), individual states’ customized educational standards, or more a la carte standards from various groups/entities.

All OpenAI models so far (even GPT-4, it seems) only has knowledge of CCSS, which makes sense given the breadth of discussion on the internet of CCSS vs other educational standards sets. So, I’ve compiled a substantial data set of individual non-CCSS educational standards to fine-tune a davinci model with.

Here are my questions:

Do you all have insight into the prompt/completion format for the training data?
Part of my use case for this task is getting the model to correlate educational content to standards–that is, having it parse content and make decisions on what educational standards the content aligns to. GPT-3 and 4 currently do this for CCSS standards, and actually do it very well. But I’d like it to do that for the other educational standards sets I’m going to feed it. Will fine-tuning a model help achieve this? If so, is there specific prompt/completion formatting I have to use?

Thank you!

curt.kennedy · March 24, 2023, 2:34pm

@mcavanaugh

If you are trying to categorize, use a fine-tune. If you are trying to add new knowledge for the AI to draw from and answer from, use embeddings.

Having said all this, it looks like you are going to fine-tune a model. The docs are HERE!

mcavanaugh · March 24, 2023, 5:50pm

Thank you for your response! If you don’t mind, would you explain what you mean by “categorizing” in this context?

I’ve been able to fine-tune a model, though the results are spotty. It seems like that is not the route you would have gone, however–adding new knowledge is what I want, though I’ve seen a lot of people doing so with fine-tuning.

Thanks again in advance!

anon10827405 · March 24, 2023, 5:54pm

Pattern recognition (Classification or categorizing) → Fine-Tuning
Knowledge → Embeddings

Here’s an example of using Fine-Tuning for classification:

github.com

openai/openai-cookbook/blob/main/examples/Fine-tuned_classification.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Fine tuning classification example\n",
    "\n",
    "We will fine-tune an ada classifier to distinguish between the two sports: Baseball and Hockey."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "source": [
    "from sklearn.datasets import fetch_20newsgroups\n",
    "import pandas as pd\n",
    "import openai\n",
    "\n",
    "categories = ['rec.sport.baseball', 'rec.sport.hockey']\n",

This file has been truncated. show original

Knowledge with Fine-Tuning (take note the PSA at the beginning)

github.com

openai/openai-cookbook/blob/main/examples/fine-tuned_qa/answers_with_ft.py

"""
Note: To answer questions based on text documents, we recommend the procedure in 
[Question Answering using Embeddings](https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb).
Some of the code below may rely on [deprecated API endpoints](https://github.com/openai/openai-cookbook/tree/main/transition_guides_for_deprecated_API_endpoints).
"""

import argparse

import openai


def create_context(
    question, search_file_id, max_len=1800, search_model="ada", max_rerank=10
):
    """
    Create a context for a question by finding the most similar context from the search file.
    :param question: The question
    :param search_file_id: The file id of the search file
    :param max_len: The maximum length of the returned context (in tokens)
    :param search_model: The search model to use

This file has been truncated. show original

Topic		Replies	Views
Fine-tuning myths / OpenAI documentation API	24	14880	December 23, 2023
How to perform Search using models fine-tuned on technical domains? API	13	2047	March 22, 2022
Train (fine-tune) a model with text from books or articles API	62	28437	November 30, 2023
How can we make the answer concise with fine tuning? API fine-tuning , api	8	2976	June 7, 2023
FAQ on custom data to support company internal API	27	5461	December 18, 2023

Do you fine tune? If so why?

Related topics