Fine-tuning stats show good results but the model fails in practice

100%. I was actually just thinking last night about how scams are going to become much, much more believable. I honestly do fear this. Someone could monitor public registers and use GPT to send out thousands of very believable phishing emails with almost no effort. It’s wonderful to see someone actually attempting to make a difference. I worry, yet do nothing. So thank you.

Good luck in your endeavor.

Here’s a wonderful tutorial on connecting W&B to your training data.
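If you’d rather wire it up in code, something like this should stream the fine-tuning metrics to W&B. It’s only a minimal sketch: the file IDs, base model, and project name are placeholders, and you should double-check the `integrations` parameter shape against the current fine-tuning docs.

```python
# Minimal sketch: attach a Weights & Biases project to an OpenAI fine-tuning
# job so training/validation metrics show up in W&B.
# Assumes the openai>=1.x Python client and that your WANDB_API_KEY has been
# registered with OpenAI. File IDs, model, and project name are placeholders.
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.create(
    training_file="file-TRAIN_ID",        # placeholder training file ID
    validation_file="file-VALID_ID",      # placeholder validation file ID
    model="gpt-4o-mini-2024-07-18",
    integrations=[
        {
            "type": "wandb",
            "wandb": {"project": "phishing-classifier"},  # hypothetical project name
        }
    ],
)
print(job.id, job.status)
```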


I still could not figure out the problem here. I have balanced my data now: I have more than 500 prompts for each class and always use an equal number from each class to fine-tune the model. The statistics still look perfect. How is it possible that prompts from the training data that are labeled as clean come back with a 100%-certain phishing verdict? Can anyone explain how the model works and why this is possible?
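For context, this is roughly what I mean by balancing, with made-up field names and label strings rather than my exact schema:

```python
# Sketch: downsample each class to the same size before writing the
# fine-tuning JSONL. The input file, label values ("phishing"/"clean"),
# and record structure are assumptions; adapt to the real dataset.
import json
import random

random.seed(0)

with open("emails_labeled.jsonl") as f:           # hypothetical input file
    records = [json.loads(line) for line in f]

by_label = {}
for r in records:
    by_label.setdefault(r["label"], []).append(r)

n = min(len(v) for v in by_label.values())        # size of the smallest class
balanced = [r for v in by_label.values() for r in random.sample(v, n)]
random.shuffle(balanced)

with open("train.jsonl", "w") as f:
    for r in balanced:
        f.write(json.dumps({
            "messages": [
                {"role": "system", "content": "Classify the email as phishing or clean."},
                {"role": "user", "content": r["text"]},
                {"role": "assistant", "content": r["label"]},
            ]
        }) + "\n")
```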

I have the same issue: OpenAI’s classification metrics show great validation accuracy (close to 1), but when I manually run the fine-tuned model against the same validation dataset, accuracy is closer to 0.6.
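For reference, this is roughly how I run that manual check. Treat it as a sketch: the fine-tuned model name and the JSONL schema are placeholders, and I request logprobs only to see how "certain" each verdict is.

```python
# Sketch: re-run a fine-tuned classifier over the validation JSONL and
# compare its predictions to the reference labels. Model name and file
# are placeholders; assumes the chat-format fine-tuning schema.
import json
from openai import OpenAI

client = OpenAI()
MODEL = "ft:gpt-4o-mini-2024-07-18:org::abc123"   # hypothetical fine-tuned model

correct = total = 0
with open("validation.jsonl") as f:
    for line in f:
        example = json.loads(line)
        prompt_msgs = example["messages"][:-1]      # drop the reference answer
        reference = example["messages"][-1]["content"].strip().lower()

        resp = client.chat.completions.create(
            model=MODEL,
            messages=prompt_msgs,
            temperature=0,
            max_tokens=5,
            logprobs=True,                          # inspect how confident the verdict is
        )
        prediction = resp.choices[0].message.content.strip().lower()

        correct += int(prediction == reference)
        total += 1

print(f"manual accuracy: {correct / total:.3f} over {total} examples")
```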

I’ve had similar situations where the stats are good but everything gets classified as positive.
You might want to try flipping the positive/negative classes: build the model around “is not phishing” instead of “is phishing”. It likely won’t work better, but it might help you understand the problem.
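If you want to try that flip quickly, something like this rewrites the training labels. It assumes a chat-format JSONL with “phishing”/“clean” assistant answers and invents the new system prompt, so adjust it to your actual schema.

```python
# Sketch: reframe the task around "is not phishing", as suggested above.
# Assumes a chat-format JSONL where the last assistant turn holds the label
# ("phishing" / "clean"); file names and the question wording are made up.
import json

with open("train.jsonl") as src, open("train_not_phishing.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        msgs = example["messages"]
        msgs[0]["content"] = "Answer yes if the email is NOT phishing, otherwise answer no."
        old_label = msgs[-1]["content"].strip().lower()
        msgs[-1]["content"] = "yes" if old_label == "clean" else "no"
        dst.write(json.dumps(example) + "\n")
```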