Measuring accuracy and precision

lechnerf · March 15, 2022, 12:31pm

Hello community,

I have fine-tuned a GPT-3 model to answer short Web-MD queries in a chatbot style (for academic purposes only).
Now I want to evaluate my model's performance using at least a few quantitative metrics, such as accuracy and precision.
Is this even possible with language models?
I have searched a lot and did not really find a way to calculate the accuracy of a language model, because the generated sentence can be correct in many different ways, as long as the core statement is correct.

Any ideas or articles on this?

patrizio.bellan · March 23, 2022, 12:22pm

Hi lechnerf.

I think you should use different metrics to judge the model.

Have a look at this:
Foundations of NLP Explained — Bleu Score and WER Metrics | by Ketan Doshi | Towards Data Science
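
To make the two metrics named in that title concrete, here is a minimal Python sketch. It is not code from the linked article: the function names and the example sentences are made up for illustration, and it assumes simple lowercase whitespace tokenization. It computes word error rate (WER) as word-level edit distance divided by reference length, and the clipped n-gram precision that BLEU is built from. Full BLEU additionally combines several n-gram orders and applies a brevity penalty, which this sketch leaves out.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def clipped_ngram_precision(reference: str, hypothesis: str, n: int = 1) -> float:
    """Share of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the
    reference (the "clipping" used by BLEU).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


if __name__ == "__main__":
    # Hypothetical reference answer and model output, for illustration only.
    reference = "take ibuprofen with food"
    model_output = "take ibuprofen after food"
    print(word_error_rate(reference, model_output))             # 0.25
    print(clipped_ngram_precision(reference, model_output, 1))  # 0.75
```

Both scores only measure word overlap with a reference answer, so a correct answer phrased differently will still score poorly. That is the limitation raised in the original question, and it is a reason to treat these numbers as rough signals rather than as accuracy.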
