Fine-tune Classification Model with Metadata?

I’m constructing examples for a fine-tuned classification model. The model has worked great so far with the standard `{prompt: ..., label: ...}` data format. However, while thinking about improvements, I was curious how metadata is actually used by the API. The reference for classification states:

The metadata property is optional and does not alter search behavior. It is instead arbitrary data that you can choose to return alongside each document in the response by setting the return_metadata parameter to true.

This seems ambiguous to me. Is the metadata considered by GPT-3 as it generates the completion, or is it routed through some ‘parallel’ model?

Thanks.

Hi @noahtavares,

It doesn’t affect the search functionality. The ‘metadata’ property is returned in the response along with each document entry if the return_metadata flag is set to true.

From the API references for classification:

return_metadata boolean | Optional | Defaults to false

A special boolean flag for showing metadata. If set to true, each document entry in the returned JSON will contain a “metadata” field.
This flag only takes effect when file is set.

@noahtavares metadata isn’t used during training or inference. Its use case is typically pre-processing training data (i.e., filtering by conditioning on a metadata field) or post-processing after you retrieve a document (i.e., retrieving auxiliary information such as the date, author, or neighboring articles from a database of wiki articles).
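To illustrate the pre-processing use, here is a minimal sketch of filtering a JSONL training file on a metadata field before upload. The field names (`source`) and example rows are purely hypothetical, not part of the API schema:

```python
import json

# Hypothetical training examples; the "metadata" contents are illustrative.
raw_lines = [
    '{"prompt": "Great movie!", "completion": " positive", "metadata": {"source": "imdb"}}',
    '{"prompt": "Terrible plot.", "completion": " negative", "metadata": {"source": "blog"}}',
]

# Pre-processing: keep only examples whose metadata matches a condition,
# then drop the metadata field, since it is not used during training.
filtered = []
for line in raw_lines:
    example = json.loads(line)
    if example.get("metadata", {}).get("source") == "imdb":
        example.pop("metadata", None)
        filtered.append(example)

print(filtered)  # one example left, metadata stripped
```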

With regard to improving classification performance, there are a number of strategies you can pursue:

  1. Quantify performance. Following the advanced usage guidelines, you can evaluate performance during training as follows:
    openai api fine_tunes.create \
      -t <TRAIN_FILE_ID_OR_PATH> \
      -v <VALIDATION_FILE_OR_PATH> \
      -m <MODEL> \
      --compute_classification_metrics \
      --classification_n_classes <N_CLASSES>

where the quantities of interest are accuracy, precision, F1, etc. Computing a confusion matrix can also be helpful. Based on validation metrics, you can make simple adjustments such as tuning your classifier threshold to optimize for fewer false positives or fewer false negatives, depending on which error is more costly. Analyzing your model’s failure modes can give insights into next steps.
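As a concrete sketch of threshold tuning, the snippet below computes confusion-matrix counts and precision/recall at a few thresholds over hypothetical validation scores (the `val` values are made up for illustration):

```python
# Hypothetical validation results: each pair is
# (model score for the positive class, true label).
val = [(0.95, 1), (0.80, 1), (0.65, 0), (0.55, 1), (0.30, 0), (0.10, 0)]

def metrics_at(threshold, results):
    """Confusion-matrix counts and precision/recall at a given threshold."""
    tp = sum(1 for s, y in results if s >= threshold and y == 1)
    fp = sum(1 for s, y in results if s >= threshold and y == 0)
    fn = sum(1 for s, y in results if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Raising the threshold trades recall for precision; pick the operating
# point that matches which error (FP vs FN) is more costly for you.
for t in (0.5, 0.7, 0.9):
    p, r = metrics_at(t, val)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```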

  2. Modify training data. If there is a discrepancy between your training set and test set distributions, you can add more data to improve coverage (e.g., if test inputs are styled differently than your training data). Be sure to have enough data for each label; an imbalanced training set can lead to poor performance. Also prioritize data quality over quantity, as GPT-3 is much more sensitive to noise in the training data during fine-tuning.
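A quick way to check for label imbalance is to count completions in your JSONL training file before uploading. The example rows below are hypothetical:

```python
import json
from collections import Counter

# Hypothetical fine-tuning JSONL; the completion field holds the label.
lines = [
    '{"prompt": "Loved it ->", "completion": " positive"}',
    '{"prompt": "Loved the cast ->", "completion": " positive"}',
    '{"prompt": "Fell asleep ->", "completion": " negative"}',
]

label_counts = Counter(json.loads(line)["completion"] for line in lines)
print(label_counts)  # reveals class imbalance before you train
```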

  3. Chain multiple GPT-3 calls together. You don’t need to classify your input in one shot. You can often improve performance by decomposing the classification problem into a sequence of steps. For example, the first GPT-3 call in your chain can act as a feature extractor or a denoiser, and the second step then performs classification on the processed input. The classification improvement has to be weighed against the increased latency and additional API costs of this strategy.
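The chaining pattern can be sketched as below. The two functions are local stand-ins for separate GPT-3 calls (their names and trivial logic are illustrative, not the OpenAI SDK); in practice each would issue its own completion request:

```python
def denoise(text: str) -> str:
    """Step 1: a 'cleanup' call that strips noise before classification."""
    return " ".join(text.split()).strip()

def classify(text: str) -> str:
    """Step 2: the classification call, run on the cleaned input."""
    return "positive" if "great" in text.lower() else "negative"

def chained_classify(raw_input: str) -> str:
    # Feed the output of the first call into the second.
    return classify(denoise(raw_input))

print(chained_classify("  This   was GREAT!! "))
```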

  4. Ensembling. You can combine multiple classifiers in parallel (e.g., classifiers fine-tuned on different features) and aggregate the results (e.g., take the mode).
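A minimal sketch of the take-the-mode aggregation, with hypothetical outputs from three parallel classifiers:

```python
from collections import Counter

def majority_vote(predictions):
    """Aggregate parallel classifier outputs by taking the mode."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical labels from three classifiers run on the same input
# (e.g., each fine-tuned on a different feature view).
votes = ["spam", "spam", "not_spam"]
print(majority_vote(votes))
```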

There are lots of other optimizations you can perform based on heuristics and domain expertise, but the first step is to quantify your performance as rigorously as possible using validation metrics.