Epoch Checkpoints and Sequential Classification

For context, I am trying to perform named-entity recognition. Example:

“No independent role of the -1123 G>C and+2740 A>G variants in the association of PTPN22”

should be labeled as

“No independent role of the [SequenceVariant]-1123 G>C[/SequenceVariant] and[SequenceVariant]+2740 A>G[/SequenceVariant] variants in the association of [Gene]PTPN22[/Gene]”

I am fine-tuning davinci to perform this type of NER. I have two questions:

  1. Epoch checkpoints
  • Is it possible to save models at incremental epochs and use the F1 curve to determine the optimal number of epochs? For example, as more epochs are run, the F1 score on the training set keeps increasing before leveling off. On the validation set, however, the F1 score rises to a maximum and then decreases due to overfitting. Is there a way to fine-tune a model and select the model from the optimal epoch?
  2. Sequence classification
  • When creating a fine-tune job, there is an option to include a parameter for computing classification metrics, like this:
    --classification_n_classes 7
    In my case, there are 7 different classes for labeling entities, like [Gene][/Gene], [Disease][/Disease], … Because each completion includes multiple entities of different types (sequential classification), the output cannot be labeled as a single class like Gene or Disease. I wanted to know how OpenAI’s model would deal with this case before spending on fine-tuning.

Any help is greatly appreciated.

On point 1: sort of. You can implement that manually by running the training process for a certain number of epochs, saving the model, evaluating it on your validation data, and repeating the process. You will need to keep track of which model performed best on your validation data. I do not believe there is any built-in system for that.
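The bookkeeping part of that manual loop could look something like the sketch below. It assumes you have already fine-tuned one model per epoch count and measured each one's validation F1 yourself; the function name `select_best_epoch`, the `patience` parameter, and the scores in the example are all made up for illustration and are not part of any OpenAI tooling.

```python
from typing import Dict, Optional, Tuple


def select_best_epoch(
    val_f1_by_epoch: Dict[int, float],
    patience: Optional[int] = None,
) -> Tuple[int, float]:
    """Return (epoch, f1) for the highest validation F1.

    If `patience` is given, scanning stops once the score has failed to
    improve for that many consecutive epochs (simple early stopping).
    """
    if not val_f1_by_epoch:
        raise ValueError("no validation scores supplied")

    best_epoch, best_f1 = -1, float("-inf")
    stale = 0  # consecutive epochs without improvement
    for epoch in sorted(val_f1_by_epoch):
        f1 = val_f1_by_epoch[epoch]
        if f1 > best_f1:
            best_epoch, best_f1 = epoch, f1
            stale = 0
        else:
            stale += 1
            if patience is not None and stale >= patience:
                break
    return best_epoch, best_f1


# Hypothetical validation F1 scores, one per separately fine-tuned model.
scores = {1: 0.71, 2: 0.78, 3: 0.81, 4: 0.80, 5: 0.77}
best_epoch, best_f1 = select_best_epoch(scores)
print(best_epoch, best_f1)
```

With `patience` left as `None` the whole curve is scanned, which is the safer choice when every model has already been trained and paid for; a small `patience` only helps if you evaluate as you go and want to stop launching further jobs.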

As to point 2, I don’t think OpenAI offers any support for sequence labeling tasks during fine-tuning, so I’m not sure what could be done there.
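One possible workaround is to skip the built-in classification metrics and score the completions yourself. The sketch below assumes the bracket format from the question (`[Gene]...[/Gene]`) and computes a per-class F1 by comparing the tagged entities in the model output against the gold annotation. The helper names `extract_entities` and `per_class_f1` are my own, and matching on (label, surface text) rather than on character offsets is a simplification.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches [Label]text[/Label]; the closing tag must repeat the opening label.
TAG_RE = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)


def extract_entities(text: str) -> Counter:
    """Return a multiset of (label, surface text) pairs found in `text`."""
    return Counter((label, span.strip()) for label, span in TAG_RE.findall(text))


def per_class_f1(gold_texts: Iterable[str], pred_texts: Iterable[str]) -> Dict[str, float]:
    """Entity-level F1 per label, pooled over all (gold, prediction) pairs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_texts, pred_texts):
        g, p = extract_entities(gold), extract_entities(pred)
        for (label, _), n in (g & p).items():  # found in both
            tp[label] += n
        for (label, _), n in (p - g).items():  # predicted but not in gold
            fp[label] += n
        for (label, _), n in (g - p).items():  # in gold but missed
            fn[label] += n

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


gold = [
    "No independent role of the [SequenceVariant]-1123 G>C[/SequenceVariant] "
    "and[SequenceVariant]+2740 A>G[/SequenceVariant] variants in the "
    "association of [Gene]PTPN22[/Gene]"
]
# Hypothetical model output that misses the second variant.
pred = [
    "No independent role of the [SequenceVariant]-1123 G>C[/SequenceVariant] "
    "and+2740 A>G variants in the association of [Gene]PTPN22[/Gene]"
]
scores = per_class_f1(gold, pred)
print(scores)
```

Averaging the per-class values gives a macro F1 that could be plotted per epoch on a held-out validation set.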
