Evaluation of a Fine-Tuned Model

Dear All

I am starting to use the fine-tuning function and so far like the precision I get from it.

In the past, I used Google Natural Language for my classification tasks and really liked the detailed evaluation I received after training (confusion matrix, false positives, false negatives).

Is this information also available somewhere for fine-tuned models?

I understand the following command returns some training results:

openai api fine_tunes.results -i <YOUR_FINE_TUNE_JOB_ID>

However, I do not fully understand how I can use this output to improve the training and the dataset.

Any guidance is appreciated!


For a classification use case, you’ll need to provide a validation set and set a few more parameters; see OpenAI API

Thank you for the reply.

Yes, I have done so. I ran the following command:

!openai api fine_tunes.create -m ada -t dataset_prepared_train.jsonl -v dataset_prepared_valid.jsonl --no_packing --compute_classification_metrics --classification_n_classes 6

So is there a way to download the validation set afterwards to evaluate the results in detail?

That’s the correct command. Then you can download the results file for a few calculated classification metrics.
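
For reading that results file, here is a minimal Python sketch. It assumes the file is a CSV with one row per training step and that the classification metrics sit in columns whose names start with "classification/"; that prefix, the sample column names, and the sample numbers below are assumptions for illustration, so check the header of your own file. The helper name last_classification_metrics is made up for this example.

```python
import csv
import io


def last_classification_metrics(csv_text, prefix="classification/"):
    """Return the latest non-empty value of every column starting with `prefix`.

    Metrics are usually only filled in on some steps, so blank cells are skipped.
    """
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, value in row.items():
            if name and name.startswith(prefix) and value not in (None, ""):
                latest[name] = float(value)
    return latest


# Made-up sample in the assumed layout (not real training output).
sample = (
    "step,training_loss,classification/accuracy,classification/weighted_f1_score\n"
    "1,0.9,,\n"
    "2,0.7,0.5,0.45\n"
    "3,0.6,,\n"
    "4,0.5,0.75,0.7\n"
)

metrics = last_classification_metrics(sample)
print(metrics)
```

With a real file you would pass open("results.csv").read() instead of the sample string. Comparing these values across runs is one way to see whether a dataset change helped.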

If you want something more custom, I recommend you call your fine-tuned model on your validation or test set to get the predictions, and then apply your own evaluation function to those predictions.
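
As a rough sketch of such a custom evaluation, the Python below builds a confusion matrix and per-class false positive / false negative counts from two lists. It assumes you have already collected the model's predicted label for each validation example; the labels and data are invented, and confusion_matrix and per_class_errors are names chosen for this example, not part of any OpenAI tooling.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels.

    Predictions that are not in `labels` are not counted, so normalise the
    raw completions (strip whitespace, fix casing) before calling this.
    """
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_errors(matrix, labels):
    """Derive true positives, false positives and false negatives per class."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # true class i, predicted as something else
        fp = sum(row[i] for row in matrix) - tp   # predicted as i, but true class differs
        report[label] = {"tp": tp, "fp": fp, "fn": fn}
    return report


# Invented example data: true labels and the model's predictions.
labels = ["a", "b", "c"]
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

matrix = confusion_matrix(y_true, y_pred, labels)
report = per_class_errors(matrix, labels)

for label, row in zip(labels, matrix):
    print(label, row)
print(report)
```

The off-diagonal cells show which pairs of classes get confused, which is usually where extra or cleaner training examples pay off most.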