For context, I am trying to perform named-entity recognition. Example:
“No independent role of the -1123 G>C and+2740 A>G variants in the association of PTPN22”
should be labeled as
“No independent role of the [SequenceVariant]-1123 G>C[/SequenceVariant] and[SequenceVariant]+2740 A>G[/SequenceVariant] variants in the association of [Gene]PTPN22[/Gene]”
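To make the labeling format concrete, here is a minimal sketch of how the bracketed output could be parsed back into entities. The names `TAG_PATTERN`, `extract_entities`, and `strip_tags` are my own for illustration, and the sketch assumes tags are never nested and that every closing tag repeats its opening label.

```python
import re

# Matches [Label]text[/Label]; the backreference \1 requires the closing
# tag to carry the same label as the opening tag. Assumes no nested tags.
TAG_PATTERN = re.compile(r"\[(\w+)\](.*?)\[/\1\]")


def extract_entities(tagged_text):
    """Return (label, surface_text) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(tagged_text)]


def strip_tags(tagged_text):
    """Remove the tags, recovering the unlabeled sentence."""
    return TAG_PATTERN.sub(r"\2", tagged_text)


if __name__ == "__main__":
    example = (
        "No independent role of the [SequenceVariant]-1123 G>C[/SequenceVariant] "
        "and[SequenceVariant]+2740 A>G[/SequenceVariant] variants in the "
        "association of [Gene]PTPN22[/Gene]"
    )
    for label, text in extract_entities(example):
        print(label, "|", text)
    print(strip_tags(example))
```

Comparing `strip_tags(completion)` against the original prompt is also a cheap check that the model only inserted tags and did not alter the sentence itself.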
I am fine-tuning davinci to perform this type of NER. I have two questions:
- Epoch checkpoints
- Is it possible to save models at incremental epochs and use the F1 curve to determine the optimal number of epochs? For example, as more epochs are run, the F1 score on the training set keeps increasing, though it gradually levels off. On the validation set, however, the F1 score rises to a maximum and then decreases due to overfitting. Is there a way to fine-tune a model and select the checkpoint from the optimal epoch?
- Sequence Classification
- When creating a fine-tune job, there is the option to include a parameter for computing classification metrics with this:
--compute_classification_metrics
--classification_n_classes 7
In my case, there are 7 different classes for labeling entities, like [Gene][/Gene], [Disease][/Disease], … Each completion includes multiple entities of different types (sequential classification), so the output cannot be labeled as a single class like Gene or Disease. I wanted to know how OpenAI’s model would deal with this case before spending on fine-tuning.
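For reference, this is roughly the evaluation I have in mind, as a sketch only. It computes an entity-level micro-averaged F1 over tagged completions and picks the epoch with the highest validation score. It assumes that completions from each epoch's model can somehow be collected, which is exactly what the first question asks about. The function names and the exact-match criterion (same label and same surface text) are my own choices, not anything taken from OpenAI's tooling.

```python
import re
from collections import Counter

# Same bracket format as in the example above; assumes no nested tags.
TAG_PATTERN = re.compile(r"\[(\w+)\](.*?)\[/\1\]")


def entity_counts(tagged_text):
    """Count (label, surface_text) pairs found in one tagged completion."""
    return Counter(
        (m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(tagged_text)
    )


def micro_f1(gold_texts, predicted_texts):
    """Entity-level micro F1: an entity counts as correct only if both
    its label and its text match a gold entity in the same example."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_texts, predicted_texts):
        g, p = entity_counts(gold), entity_counts(pred)
        overlap = sum((g & p).values())  # multiset intersection
        tp += overlap
        fp += sum(p.values()) - overlap
        fn += sum(g.values()) - overlap
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_epoch(validation_f1_by_epoch):
    """Return the epoch whose validation F1 is highest (hypothetical
    input: a dict mapping epoch number to validation F1)."""
    return max(validation_f1_by_epoch, key=validation_f1_by_epoch.get)
```

Because each completion is scored as a set of entities rather than as one class, this sidesteps the single-label assumption; whether the built-in classification metrics can do something similar is what I am unsure about.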
Any help is greatly appreciated.