How to evaluate a completion (QA) model?

wmarch015 · March 22, 2023, 7:38pm

Hello,

I have fine-tuned a “davinci” base model on hundreds of QA pairs to build a customized chatbot.
Once the model is trained, it returns a response when I call:

import openai

openai.Completion.create(
  model="davinci-fine-tuned",  # placeholder for the fine-tuned model's name
  prompt=my_prompt)

Now I want to evaluate the model's performance. Should I compare the model's responses against the true answers (from the QA pairs) by calculating text similarity? If so, how can I calculate the accuracy/F1 of the fine-tuned model?
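One common way to score this, borrowed from extractive QA benchmarks such as SQuAD rather than from anything OpenAI provides, is to compare each generated answer with its reference using exact match (EM) and token-level F1, then average over a held-out set of QA pairs. Below is a minimal sketch; the normalization rules are the SQuAD-style ones, and the generate callable (a hypothetical wrapper around the openai.Completion.create call above) is an assumption.

```python
import re
import string
from collections import Counter
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs: Iterable[Tuple[str, str]], generate: Callable[[str], str]) -> dict:
    """Average EM and F1 over (question, reference_answer) pairs.

    `generate` is a hypothetical callable that sends one prompt to the
    fine-tuned model and returns the completion text.
    """
    em_scores, f1_scores = [], []
    for question, reference in pairs:
        prediction = generate(question)
        em_scores.append(exact_match(prediction, reference))
        f1_scores.append(token_f1(prediction, reference))
    n = len(em_scores)
    return {
        "exact_match": sum(em_scores) / n if n else 0.0,
        "f1": sum(f1_scores) / n if n else 0.0,
    }
```

In practice, generate could be something like lambda q: openai.Completion.create(model=..., prompt=q, max_tokens=64, temperature=0)["choices"][0]["text"], with q formatted exactly like the training prompts. Token overlap penalizes correct paraphrases, so for free-form chatbot answers these scores are usually best read alongside an embedding-based similarity or a manual spot check.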

Thank you.

AgusPG · March 22, 2023, 7:52pm

OpenAI's built-in fine-tuning tool already lets you include a validation dataset, so you can check metrics during fine-tuning and, for instance, spot when your model starts overfitting. You'll also get those metrics at the end of the fine-tuning process. Aren't these metrics useful for you? Maybe you have a different use case? Feel free to let us know. Hope it helps!

RonaldGRuckus · March 22, 2023, 7:58pm

I’d also add that there are excellent tools for monitoring your progress and handling a lot of the heavy lifting, such as https://wandb.ai/

wmarch015 · March 22, 2023, 8:54pm

Thank you @AgusPG.
I am following this post to download result.csv: “How to See the contents of OpenAI Fine Tuned Model Results in Python using the OpenAI API” (#3 by hariharasudhanm1), especially @guimaraesabri's answer.

But when I run “!openai api fine_tunes.results -i <model_fine_tuned_name>” to get the file ID, it only returns the training stats as plain text, and I can’t find the file ID anywhere.
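If the plain text printed by fine_tunes.results is the CSV content of the job's results file (as far as I know, the legacy CLI prints the file itself rather than its ID, takes the fine-tune job ID after -i, and its output can be redirected to e.g. results.csv), the metrics can be read directly without looking up a file ID. A minimal sketch, assuming column names such as step, training_loss and validation_loss as described for the legacy fine-tuning results file; validation columns should only be populated when a validation file was supplied, and the helper names are hypothetical.

```python
import csv
import io
from typing import Dict, List


def load_results(csv_text: str) -> List[Dict[str, str]]:
    """Parse the results CSV text into one dict per logged step."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def latest_metrics(rows: List[Dict[str, str]], prefix: str = "validation_") -> Dict[str, float]:
    """Return the most recent non-empty value of every column starting with `prefix`.

    Validation metrics are typically logged only every few steps, so empty
    cells are skipped rather than treated as zero.
    """
    latest: Dict[str, float] = {}
    for row in rows:
        for column, value in row.items():
            # DictReader stores surplus fields under the key None; skip those.
            if isinstance(column, str) and column.startswith(prefix) and value not in (None, ""):
                latest[column] = float(value)
    return latest
```

Usage, assuming the output was saved to results.csv: open the file, pass its text to load_results, then call latest_metrics(rows) for validation figures or latest_metrics(rows, prefix="training_") for the training ones.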

AgusPG · March 22, 2023, 9:16pm

To retrieve validation metrics, you need to provide a properly formatted validation file when creating the fine-tuning job. It is the same kind of file (.jsonl) as your training data, but make sure none of its examples overlap with the training set.
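A minimal sketch of preparing the two files, assuming the legacy prompt/completion JSONL format. The separator and stop sequence follow conventions suggested in the legacy fine-tuning guide but are assumptions here, and the 20% split, seed and helper names are hypothetical. Deduplicating on the normalized question before splitting keeps the same question from landing in both files.

```python
import json
import random
from pathlib import Path
from typing import List, Tuple

SEPARATOR = "\n\n###\n\n"  # assumed prompt suffix; reuse it verbatim at inference time
STOP = " END"              # assumed stop sequence appended to every completion


def to_record(question: str, answer: str) -> dict:
    """Build one prompt/completion record in the assumed legacy format."""
    return {"prompt": question.strip() + SEPARATOR, "completion": " " + answer.strip() + STOP}


def split_pairs(pairs: List[Tuple[str, str]], val_fraction: float = 0.2, seed: int = 0):
    """Deduplicate by question, shuffle, and split so no question appears in both sets."""
    unique = {}
    for question, answer in pairs:
        key = " ".join(question.lower().split())  # case- and whitespace-insensitive key
        unique.setdefault(key, (question, answer))  # keep the first answer seen
    items = list(unique.values())
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_fraction)) if len(items) > 1 else 0
    return items[n_val:], items[:n_val]  # (train, validation)


def write_jsonl(path: Path, pairs: List[Tuple[str, str]]) -> None:
    """Write one JSON record per line."""
    with path.open("w", encoding="utf-8") as f:
        for question, answer in pairs:
            f.write(json.dumps(to_record(question, answer)) + "\n")
```

After writing train.jsonl and valid.jsonl with write_jsonl, both files can be passed when the job is created, e.g. openai api fine_tunes.create -t train.jsonl -v valid.jsonl -m davinci with the legacy CLI.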

You can find all the info in the official guide here.

Hope it helps!

wmarch015 · March 22, 2023, 9:19pm

Thank you @AgusPG. I am looking into it.
