This seems like a really nice community, so I thought I would reach out. I am experimenting with OpenAI for Q&A purposes and can fine-tune various models without any problems.
However, when I attempt to use validation data and the parameters that allow for the generation of F1 scores, I run into problems.
When I run this:
openai tools fine_tunes.prepare_data -f "discriminator_train.jsonl"
I am allowed to create a training and validation set, all good. I then use the command suggested to me by the preparation tool, which is this:
openai api fine_tunes.create -t "discriminator_train_prepared_train.jsonl" -v "discriminator_train_prepared_valid.jsonl" -m ada --compute_classification_metrics --classification_n_classes 152
But I get errors that appear to relate to classification_n_classes, even though this figure (152) was suggested by the tool itself:
The number of classes in file-EGeCZs5FMe5kg0NYzkDcUTDj does not match the number of classes specified in the hyperparameters.
The number of classes in file-hZOFnFZtv7dEzOD6YjOYA6d3 does not match the number of classes specified in the hyperparameters.
There does not appear to be any documentation around this online.
@kpeyton Did you ever find an answer, or documentation, about the “classification_n_classes” hyperparameter?
Edit: After reading this, I wonder if perhaps it wants your training file to include 152 different potential completions. If so, this requirement seems fraught, because they acknowledge that most fine-tuned models will be trained with multiple files… but surely each file wouldn't need to have 152 different completions?
Can you share any details about your training file to help validate or rebut this assumption?
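One way to test this assumption is to count the distinct completions in each prepared file and compare the result with the value passed to --classification_n_classes. Below is a minimal sketch, assuming each line of the JSONL file is an object with a "completion" key, as in the prepared files. The helper name count_classes is made up for illustration, and nothing in this thread confirms that the API counts classes exactly this way.

```python
import json
from collections import Counter


def count_classes(path):
    """Return a Counter mapping each distinct completion to its frequency."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Assumption: surrounding whitespace is not part of the class label,
            # so " yes" and "yes" are counted as the same class.
            counts[record["completion"].strip()] += 1
    return counts
```

Calling len(count_classes("discriminator_train_prepared_train.jsonl")), and the same for the validation file, would show whether either file contains fewer than 152 distinct completions.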
I am having the same problem. I used these commands suggested by OpenAI: one with split training and validation datasets, and the other with only the training dataset (no split).
!openai api fine_tunes.create -t "clinical_trials_labelled_dataset_prepared_train.jsonl" -v "clinical_trials_labelled_dataset_prepared_valid.jsonl" --compute_classification_metrics --classification_n_classes 4
!openai api fine_tunes.create -t "clinical_trials_labelled_dataset_prepared.jsonl" --classification_n_classes 4
But I keep getting: The number of classes in <fine_tune_file_id> does not match the number of classes specified in the hyperparameters.
Any ideas on what else we can try to get this fine tune working?
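One thing that might be worth checking is whether the train and validation splits contain the same set of labels. This is only a guess at the cause: if the split left some class out of one file, that file would have fewer classes than the number passed on the command line. Here is a rough sketch, assuming JSONL lines with a "completion" key. The function names load_labels and compare_splits are hypothetical.

```python
import json


def load_labels(path):
    """Collect the set of distinct completions found in a JSONL file."""
    labels = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # ignore blank lines
                labels.add(json.loads(line)["completion"].strip())
    return labels


def compare_splits(train_path, valid_path):
    """Return (labels only in the train file, labels only in the valid file)."""
    train = load_labels(train_path)
    valid = load_labels(valid_path)
    return train - valid, valid - train
```

If either returned set is non-empty, the two files do not cover the same classes, which would at least be consistent with the error message.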