Measuring accuracy and precision

Hello community,

I have fine-tuned a GPT-3 model to answer short Web-MD queries in a chatbot style (for academic purposes only).
Now I want to evaluate my model's performance using at least a few quantitative metrics, such as accuracy and precision.
Is this even possible with language models?
I have searched a lot and did not really find a way to calculate the accuracy of a language model, because the generated sentence can be worded in many different ways and still be correct, as long as the core statement is right.

Any ideas or articles on this?


Hi Iechnerf.

I think you should use different metrics to judge the model.

Take a look at this:
Foundations of NLP Explained — Bleu Score and WER Metrics | by Ketan Doshi | Towards Data Science
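
To give a rough idea of what these metrics compute, here is a minimal sketch in plain Python. It is not taken from the linked article: the function names `word_error_rate` and `clipped_unigram_precision` and the example sentences are made up for illustration, and the second function is only the unigram building block of BLEU, not the full score (no higher-order n-grams, no brevity penalty). Both compare the model's answer against a reference answer you wrote yourself.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def clipped_unigram_precision(reference: str, hypothesis: str) -> float:
    """Share of hypothesis words that also occur in the reference.

    Each word is counted at most as often as it appears in the reference,
    which is the clipping idea used by BLEU.
    """
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref_counts[word]) for word, count in hyp_counts.items())
    return overlap / total


if __name__ == "__main__":
    # Hypothetical reference answer and model output.
    reference = "take two tablets daily"
    model_answer = "take three tablets daily"
    print(word_error_rate(reference, model_answer))
    print(clipped_unigram_precision(reference, model_answer))
```

Keep in mind that both scores only measure word overlap, so an answer that is correct but phrased differently from your reference will still score poorly. Having several reference answers per query helps a bit, but it does not fully solve the problem you describe.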
