Evaluation of Fine-Tune Model

Dear All

I am starting to use the fine-tuning function and so far like the precision I get from it.

In the past, I used Google Natural Language for my classification tasks and really liked the detailed evaluations I received after training (confusion matrix, false positives, false negatives).

Is this information also available somewhere?

I understand there is the following command for getting some training results:

openai api fine_tunes.results -i <YOUR_FINE_TUNE_JOB_ID>

However, I do not fully understand how I can leverage this output for improving the training and dataset.

Any guidance is appreciated!


If you’re doing a classification use case, then you’ll need to provide a validation set, and set a few more parameters - see OpenAI API

Thank you for the reply.

Yes, I have done so. I ran the following command

!openai api fine_tunes.create -m ada -t dataset_prepared_train.jsonl -v dataset_prepared_valid.jsonl --no_packing --compute_classification_metrics --classification_n_classes 6

So is it possible to download the validation set afterwards to evaluate the results in detail?

That’s the correct command. Then you can download the results file for a few calculated classification metrics.
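As a rough sketch of how to read that file: the results are CSV, and as far as I know the classification metrics are only filled in on some rows, so it helps to pick the last non-empty value per column. The column names in the sample below (such as `classification/accuracy`) and the helper name `last_reported_values` are assumptions for illustration; check the header of your own results file.

```python
import csv
import io


def last_reported_values(csv_text):
    """Return the last non-empty value seen in each column of a CSV string."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            # Metric columns are often blank on most rows; skip the blanks.
            if value not in (None, ""):
                latest[column] = value
    return latest


# Hypothetical sample standing in for a saved results file.
sample = (
    "step,training_loss,classification/accuracy\n"
    "1,0.90,\n"
    "2,0.50,0.75\n"
    "3,0.40,\n"
)
summary = last_reported_values(sample)
print(summary)
```

To use it on a real file, read the saved output into a string first, for example with `open("results.csv").read()`.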

If you want something more custom, I recommend you call the fine-tuning endpoint on your validation or test set, to get the predictions, and then apply your custom evaluation function on the predictions.
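A minimal sketch of such a custom evaluation, assuming you have already collected the model's predicted label for each validation example into a list (the API call itself is left out here). The labels and the helper names `confusion_counts` and `per_class_errors` are made up for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_errors(y_true, y_pred):
    """Return true positives, false positives and false negatives per label."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = confusion_counts(y_true, y_pred)
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        # Predicted as this label, but the true label was something else.
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        # Truly this label, but predicted as something else.
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        report[label] = {"tp": tp, "fp": fp, "fn": fn}
    return report


# Hypothetical labels; replace with your validation labels and predictions.
y_true = ["a", "b", "a", "c", "b", "a"]
y_pred = ["a", "a", "a", "c", "b", "b"]

report = per_class_errors(y_true, y_pred)
for label, counts in report.items():
    print(label, counts)
```

The `confusion_counts` result is the confusion matrix in sparse form: each key is a (true, predicted) pair, so the off-diagonal keys show which classes get mixed up and may need more or cleaner training examples.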

If the training set has only 800 examples, why is the highest step value 3197? Any idea?