I am building an agent that can answer questions about a corpus of data (using embeddings). However, I NEED a framework to test my system… I would like to know how often my agent lies or hallucinates, and I would like to build in safeguards to catch these cases. Is there any known way to do this? My current idea is to have experts write a few hundred tricky, intentionally misleading questions. How does OpenAI approach this problem? Even if there are no evaluation datasets out there, is there a framework for creating test questions that probe the weak spots of the model? Note that a dataset with a large amount of data and questions about it would obviously be sufficient, even if it is not on my own data.
You seek the truth in one dataset with another dataset. You probably want a dataset full of true/false questions, in which questions with known, single-state answers are fed to the model, and you look for a true or false response and grade the results. I would look into collecting university tests for subjects related to your data and gathering all the true/false questions you can.
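For instance, the grading loop could be as simple as the sketch below. `ask_agent` is just a placeholder for however you call your own system, and the two sample questions are only illustrations:

```python
# Minimal sketch of a true/false grading harness.
# `ask_agent` is a stand-in for however you invoke your own agent.

def ask_agent(question: str) -> str:
    # Replace this with a real call to your agent; it should return plain text.
    return "true"

def grade_true_false(items):
    """items: list of (question, expected) pairs, where expected is 'true' or 'false'."""
    correct = 0
    for question, expected in items:
        answer = ask_agent(f"{question}\nAnswer with exactly one word: true or false.")
        # Normalize so "True." or "false, because ..." still count as an answer.
        got = answer.strip().lower().split()[0].strip(".,")
        correct += int(got == expected)
    return correct / len(items)

if __name__ == "__main__":
    sample = [
        ("Water boils at 100 °C at sea level.", "true"),
        ("The Great Wall of China is visible from the Moon with the naked eye.", "false"),
    ]
    print(f"Accuracy: {grade_true_false(sample):.0%}")
```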
Great question @fredc.
I’m not familiar with a canned option out there, and this is aside from the obvious “monitor the output” approach…
One thing you could try is to embed the “true answer”, say “Y”, to the question “X”.
Then, whenever question “X” is asked and answered, monitor whether the embedding of the answer, “Y-prime”, gets close to the embedding of the true answer “Y”.
Since this is totally automated, it could be used as a hallucination detector, especially for critical questions that could be asked of your system.
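Roughly, the check could look like this. A minimal sketch, assuming the OpenAI Python SDK (openai >= 1.0), its text-embedding-3-small model, and numpy; the 0.85 threshold is an arbitrary starting point you would tune on answers you have already judged good or bad:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def looks_like_hallucination(true_answer: str, agent_answer: str,
                             threshold: float = 0.85) -> bool:
    """Flag the agent's answer if its embedding drifts too far from the known-good one."""
    y = embed(true_answer)         # "Y": the embedded true answer
    y_prime = embed(agent_answer)  # "Y-prime": what the agent actually said
    return cosine_similarity(y, y_prime) < threshold
```

In practice you would pre-embed the true answers for your critical questions once, then run the check on every answer the agent gives to those questions and alert (or refuse to answer) whenever it trips.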