Performance Metrics and Truthfulness

I am building an agent that can answer questions about a corpus of data (using embeddings). However, I need a framework to test my system. I would like to know how often my agent lies or hallucinates, and I would like to build in safeguards to catch these cases. Is there any known way to do this? My current idea is to have experts write a few hundred tricky, intentionally misleading questions. How does OpenAI approach this problem? Even if there are no evaluation datasets out there, is there a framework for creating test questions that probe the weak spots of the model? Note that a dataset pairing a large body of data with questions about that data would be sufficient, even if it isn't about my own dataset.


You seek the truth in one dataset with another dataset. You probably want a dataset full of true/false questions: questions with a known, single correct answer are fed to the model, and you grade its true-or-false responses against that answer key. I would look into collecting university tests for subjects related to your data and gathering all the true/false questions you can.
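The grading loop described above is simple to sketch. Everything here is illustrative: `ask_model` is a hypothetical stand-in for a call to your own agent (constrained to answer strictly true or false), and the sample questions are placeholders for the harvested exam questions.

```python
# Minimal sketch of a true/false evaluation harness.

def ask_model(question: str) -> bool:
    # Stand-in for your agent: replace with a real model call that is
    # forced to answer strictly True or False.
    return "Paris" in question


def grade(model, test_set) -> float:
    """Return the fraction of true/false questions the model answers correctly."""
    correct = sum(1 for question, answer in test_set if model(question) == answer)
    return correct / len(test_set)


test_set = [
    ("Paris is the capital of France.", True),
    ("The Earth has two moons.", False),
]

print(f"accuracy: {grade(ask_model, test_set):.0%}")
```

With a large enough pool of questions, the accuracy number gives you a rough, repeatable estimate of how often the system asserts falsehoods on your domain.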


Great question @fredc.

Although I’m not familiar with a canned option out there, and besides the obvious “monitor the output” …

One thing you could try is embedding the “true answer” Y to the question X.

Then, whenever question X is asked and answered, monitor whether the embedding of the answer, Y-prime, lands close to the embedding of the true answer Y.

Since this is totally automated, it could be used as a hallucination detector, especially for critical questions that are likely to be asked of your system.