Given a summary of about 1000 tokens, what is the best way to validate whether the answers generated by Question Answering are correct?
We ran a test by submitting about 5000 prompts to davinci, each requesting an answer to a question about the summary. We found that about 78% of the answers were acceptable, while the others were either wrong or irrelevant. We use semantic search to filter out the irrelevant answers, but about 8% of the answers are still relevant but incorrect.
Any suggestions on what other processing we can do to filter out the remaining 8% of incorrect answers with human validation?
Hi Luke, thanks for responding. That is correct.
If you would be kind enough to try this example, you will see what I am talking about…
{
"prompt": "Santa Claus, also known as Father Christmas, Saint Nicholas, Saint Nick, Kris Kringle, or simply Santa, is a legendary character originating in Western Christian culture who is said to bring gifts on Christmas Eve of toys and candy to well-behaved children, and either coal or nothing to naughty children. He is said to accomplish this with the aid of Christmas elves, who make the toys in his workshop at the North Pole, and flying reindeer who pull his sleigh through the air.\nThe modern character of Santa Claus was based on traditions surrounding the historical Saint Nicholas (a fourth-century Greek bishop and gift-giver of Myra), the English figure of Father Christmas, and the Dutch figure of Sinterklaas (also based on Saint Nicholas).\nSanta Claus is generally depicted as a portly, jolly, white-bearded man, often with spectacles, wearing a red coat with white fur collar and cuffs, white-fur-cuffed red trousers, red hat with white fur, and black leather belt and boots, carrying a bag full of gifts for children. He is commonly portrayed as laughing in a way that sounds like \"ho ho ho\". This image became popular in the United States and Canada in the 19th century due to the significant influence of the 1823 poem \"A Visit from St. Nicholas\". Caricaturist and political cartoonist Thomas Nast also played a role in the creation of Santa's image. This image has been maintained and reinforced through song, radio, television, children's books, films, and advertising. \n\nQ: Was Thomas Nast an advertiser?\nA:",
"temperature": 0,
"max_tokens": 100,
"top_p": 0.25,
"frequency_penalty": 0,
"presence_penalty": 0,
"stop": ["\n"]
}
The answer I got back was "Yes, he was", which is wrong.
I think the main problem is that Thomas Nast played a role in creating Santa's image, and the image was used for advertising.
It's understandable that GPT-3 won't get every answer correct, but if I can run another fact check and remove the incorrect answers, that will resolve my problem.
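One way the "another fact check" idea could be sketched is a second pass that asks the model whether the passage explicitly supports each answer, and drops the answers that fail. This is only a sketch under assumptions: `complete` is a hypothetical caller-supplied function that sends a prompt to whichever completion model is in use and returns its text, and the wording of the check prompt is illustrative, not a tested recipe. The second pass is itself a model call, so it can also be wrong.

```python
from typing import Callable, List, Tuple


def build_check_prompt(passage: str, question: str, answer: str) -> str:
    """Build a second-pass prompt asking whether the passage supports the answer."""
    return (
        f"{passage}\n\n"
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "Does the passage explicitly state this? Reply Yes or No.\n"
        "Reply:"
    )


def is_supported(passage: str, question: str, answer: str,
                 complete: Callable[[str], str]) -> bool:
    """Return True only when the checker's reply starts with 'yes'."""
    # `complete` is hypothetical: it wraps whatever completion endpoint you use.
    reply = complete(build_check_prompt(passage, question, answer))
    return reply.strip().lower().startswith("yes")


def filter_answers(passage: str, qa_pairs: List[Tuple[str, str]],
                   complete: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Keep only the (question, answer) pairs that pass the second check."""
    return [(q, a) for q, a in qa_pairs if is_supported(passage, q, a, complete)]
```

Anything other than a clear "Yes" is treated as unsupported, which trades some correct answers for fewer incorrect ones.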
Facts are always based on assumptions, so in a way you're getting GPT-3's version of the facts, which is really always going to be an opinion. In cases like this, it's useful to probe how GPT-3 arrived at that answer. As usual, the argument it uses is quite an interesting one, and I'm willing to buy GPT-3's version of the facts in this case… Here's what I got, at the settings you used, when I went on to ask it about the answer:
Q: Was Thomas Nast an advertiser?
A: Yes, he was.
Q: Why was Thomas Nast an advertiser?
A: Because he was a cartoonist.
Q: Are all cartoonists advertisers?
A: No, not all.
Q: Then what makes a cartoonist and advertiser?
A: The desire to make money.
Q: So is every cartoonist that gets paid for their work is an advertiser? Or are they only an advertiser if their work is trying to sell a particular product or service?
A: It's the latter.
Q: What particular product or service was Thomas Nast trying to sell?
A: Santa Claus.
Q: It sounds like you consider Santa Claus to be product or service? Why is Santa a product or service?
A: Because he's a brand.
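The probing exchange above can be reproduced by appending each previous question and answer to the original passage and ending the prompt with a new `Q:` / `A:` pair. A minimal sketch, assuming the same `Q:` / `A:` layout as the original request; the function name and its arguments are made up for illustration:

```python
from typing import List, Tuple


def build_followup_prompt(passage: str,
                          history: List[Tuple[str, str]],
                          next_question: str) -> str:
    """Append earlier Q/A turns and a new question to the passage."""
    lines = []
    for question, answer in history:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    # Leave the final answer open so the model completes it.
    lines.append(f"Q: {next_question}")
    lines.append("A:")
    return passage + "\n\n" + "\n".join(lines)
```

With a stop sequence of `"\n"`, as in the original request, each completion stops after a single answer line, which can then be added to `history` for the next question.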
Which model are you using? I find davinci instruct is best for these sorts of "closed domain" Q&A tasks, where all of the information required to answer the question is in the prompt. I've had some success with this example, although I haven't tested it rigorously: OpenAI API.
Hi Luke,
Thanks for providing the sample for closed domain.
I really like how it replies "I don't know" when the probability is not high enough.
I'll do further testing on this, much appreciated.
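The linked sample may well produce "I don't know" through its prompt instructions rather than through any explicit threshold. As a separate, explicit alternative, a confidence cut-off could be sketched as below. It assumes the completion response includes per-token log probabilities for the answer (the Completions endpoint can return these when `logprobs` is requested); the function name and the 0.75 cut-off are arbitrary choices for illustration and would need tuning on labelled answers.

```python
import math
from typing import Sequence


def answer_or_unknown(answer: str,
                      token_logprobs: Sequence[float],
                      min_avg_prob: float = 0.75) -> str:
    """Return the answer only if its average token probability is high enough."""
    if not token_logprobs:
        return "I don't know"
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    # exp(mean log-prob) is the geometric mean of the token probabilities.
    avg_prob = math.exp(avg_logprob)
    return answer if avg_prob >= min_avg_prob else "I don't know"
```

A high token probability only means the model was confident, not that it was right, so this would complement a separate fact check and not replace it.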
Also, OpenAI models can answer a lot of questions depending on how creatively the prompt is engineered, so do go through the prompt engineering topic in the forum.