Fine-tune a "BS detector" model

I have been hard at work making fine-tuned models for my AGI project. There are quite a few fine-tunes I’ll need before revisiting that work, but one of the next ones should be interesting and should also have broad applications. This is a “BS detector” meant to detect lies, deception, or otherwise erroneous beliefs. Human brains do this all day, every day (though our brains have a bias towards accepting things as true).

Here are some articles to provide context:

It’s important to note that this is a complicated topic and that it cuts multiple ways. Humans evolved to thrive in imperfect environments with incomplete information. This means that we have to confabulate a lot of our beliefs about the world, and we also have to make judgments based on incomplete information. This is how humans (or our predecessors) evolved to lie and subsequently to detect lies. The popular TV show Lie to Me comes to mind. We want to make a lie detector.

Here are some examples of what I mean:

Example 1: Fact from Fiction


The planet Zordox has the largest volcano in the universe. The volcano has a circumference of eight parsecs and a volume of four million cubic terahectares.


This seems completely false, likely a work of fiction.

Example 2: Determining truthiness of a statement given relevant information


[Wikipedia article on Thomas Edison]
Assertion: Thomas Edison invented the telegraph in 1876


This is not quite right. Thomas Edison established his Menlo Park laboratory in 1876, after the sale of his quadruplex telegraph.

Such a fine-tuned model could have drastic implications for handling and detecting confabulation in chatbots and other GPT-3 applications. It could also help defeat attempts at confusing AGI. From my experiments with Raven, one of the very first things that people instinctively test is the AGI’s ability to handle deception and misdirection. Why? Humans love breaking things! But also, detecting deception is one of the things that makes humans intelligent! Thus we must endow AGI with the same ability.

How you can participate

  1. We need datasets!
  2. We also need datasets!
  3. Lastly, we need datasets!

These can come from pre-existing datasets, be created from scratch, or be synthesized with the help of models.
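Whatever the source, the data ultimately needs to land in the prompt/completion JSONL format that GPT-3 fine-tuning expects. Here's a minimal sketch; the example statements, the "Verdict:" separator, and the label wording are all illustrative choices, not a fixed schema.

```python
import json

# Hypothetical BS-detector training examples. The trailing "Verdict:"
# and the leading space in each completion follow common fine-tuning
# conventions; the exact labels are up to us.
examples = [
    {
        "prompt": "The planet Zordox has the largest volcano in the universe.\n\nVerdict:",
        "completion": " fictional",
    },
    {
        "prompt": "Thomas Edison opened his Menlo Park laboratory in 1876.\n\nVerdict:",
        "completion": " factual",
    },
]

def to_jsonl(records):
    """Serialize records as JSONL, one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

print(to_jsonl(examples))
```

The resulting file is what you would hand to the fine-tuning upload step, one labeled statement per line.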

Here are my two most recent completed fine-tuning projects for reference.


Just append “Skeptical AI Agent:” in your prompt before every GPT-3 response :joy:

Joking aside, this is awesome. I’m wondering if there are any subreddits that might have relevant training examples.


So far I found this article describing a “fact vs opinion” dataset. This could be a good start.

Then I found this one from Twitter but haven’t delved into it yet.

It occurs to me that Project Gutenberg may be a good source of data for this. We can have one dataset pulled from, say, news sources and Wikipedia (these will be labeled “true” or “factual”), and then we can use some of the Twitter data, as well as fictional works, labeled “opinion” or “fictional”.
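That labeling-by-source idea can be sketched in a few lines. This assumes the passages have already been pulled from each source; the source names and sample texts below are placeholders, not real scrapers.

```python
# Label passages by where they came from: "trusted" sources get the
# factual label, everything else (Twitter, Gutenberg fiction) gets
# the fictional/opinion label. Source names are hypothetical.

def label_passages(passages_by_source):
    """Map each passage to a (text, label) pair based on its source."""
    trusted = {"news", "wikipedia"}
    labeled = []
    for source, passages in passages_by_source.items():
        label = "factual" if source in trusted else "fictional"
        labeled.extend((text, label) for text in passages)
    return labeled

sample = {
    "wikipedia": ["Edison opened his Menlo Park laboratory in 1876."],
    "gutenberg_fiction": ["The volcano of Zordox spanned eight parsecs."],
}
print(label_passages(sample))
```

One caveat with this scheme: the model may learn to recognize the *style* of each source rather than truthfulness itself, so mixing in hand-labeled counterexamples would help.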

Very similar to an idea I was considering, and it aligns closely with my decades of studying psychological operations and how they have moved into civilian advertising and social media. Here are some more resources that could be useful. Please keep us apprised of progress and how we can help further. I am very interested to see this effort succeed.

“Bag of Lies” database

Study: “It Takes Two to Lie: One to Lie, and One to Listen”
17,289 messages labeled by sender for intended truthfulness and receiver for perceived truthfulness

Analysis of and link to the Liar dataset

Collection of papers on deception, and the Mafiascum dataset

Covid-19 Lies dataset

“A Multimodal Dataset for Deception Detection”

Dataset of false statements by Trump

Other references:

“An Anatomy of a Lie: Discourse Patterns in Ultimate Deception Dataset”

“A Novel Approach for Lie Detection Based on F-Score and Extreme Learning Machine”

Google searches as a lie dataset

Edit: New user restrictions now lifted, so added the full list of sources.


That’s a lot of datasets! It will take a while to look through them all. I see some of them have tens of thousands of samples, and fortunately we only need about 1000 for fine-tuning GPT-3.
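Since several of those datasets have tens of thousands of rows but only about 1,000 examples are needed, a balanced random subsample is a reasonable first pass. A sketch, with synthetic placeholder rows; the 500-per-label split is just one way to hit ~1,000 total.

```python
import random

def sample_balanced(rows, per_label=500, seed=42):
    """Draw up to per_label rows for each label, shuffled reproducibly."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    picked = []
    for label, group in by_label.items():
        rng.shuffle(group)
        picked.extend(group[:per_label])
    rng.shuffle(picked)
    return picked

# Synthetic stand-ins for a large labeled dataset.
rows = [(f"true statement {i}", "factual") for i in range(3000)]
rows += [(f"made-up statement {i}", "fictional") for i in range(3000)]
subset = sample_balanced(rows)
print(len(subset))  # → 1000
```

Keeping the labels balanced matters more than the exact count; a heavily skewed sample would teach the model to always guess the majority label.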


I don’t know if it can be of any help, but this topic makes me think of Nassim Taleb, who is known for his BS detector :slight_smile: .


I think the BS detector model will figure into memory handling, but it may need to come later as a component of cognitive control. If we store AGI memories in a blockchain, then we don’t need to worry about detecting whether they are true or false just yet. We will, however, need to sniff out BS in incoming new data. For instance, if a user asserts that they can sprint at 35 MPH, we don’t want the AGI to believe that unquestioningly. But still, evaluating new incoming information is pretty advanced, and I think we can worry about it a little later.
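As a toy illustration of that incoming-data check: even before a fine-tuned detector exists, a sanity filter can catch claims that exceed known physical limits. The limits table and function name below are hypothetical stand-ins for what the trained model would eventually learn.

```python
# Known human limits for a few claim types, in MPH. The sprint figure
# is roughly Usain Bolt's peak speed; this table is illustrative only.
HUMAN_LIMITS_MPH = {"sprint_speed": 28.0}

def flag_implausible(claim_type, value):
    """Return True if the claimed value exceeds the known human limit."""
    limit = HUMAN_LIMITS_MPH.get(claim_type)
    return limit is not None and value > limit

# The 35 MPH sprint claim from the post above gets flagged.
print(flag_implausible("sprint_speed", 35))  # → True
```

A real detector would of course need to generalize far beyond a lookup table, but the pipeline shape is the same: score the assertion before committing it to memory.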

I think a fundamental philosophical hurdle to consider here is: Who is the definitive authority of truth? “Facts” are a man-made concept used to describe or explain things as they are understood or believed by the source who is sharing the “fact”. All facts come from people, and all people are inherently biased, intentionally or not.
A great classic example is history. It is often said that history is written by the “winners”, meaning that history as we know it is merely a collection of notarized personal anecdotes and hearsay, from their points of view. But what about the “facts” that originated from the “losers” who never got the chance to memorialize them? Or got swept under the rug with the evidence destroyed?
Even science lol. Our understanding of scientific concepts is constantly changing. Once upon a time “the earth is flat” was an indisputable fact.

I realize this is kind of derailing the actual topic, but I think it’s something to consider early on in the process. A BS detector with no omniscience is not going to be a reliable tool. Thorough reviewing of sources can minimize some issues, but you will inevitably encounter “alternative facts” and contradictory/inconclusive information sneaking its way in.


This is a non-issue and a red herring. Humans have operated with imperfect information for the entirety of our existence; why is the bar of information literacy arbitrarily higher for machines? Like us, the machines will do the best they can with the information that is available.

It might be a red herring, but you’re actually pointing out the exact thing that I am trying to say. Humans have been operating with imperfect information for nearly the entirety of our existence, until recently. Machines have raised that bar, not arbitrarily but by their fundamental design. We expect and accept a certain margin of error when it comes to humans. Machines do not have this margin of error. They can break or glitch, but they do not make mistakes the way humans make mistakes. Give a machine some computations and it will always give you the same exact correct answer. It might break someday somehow, but when it does, it will be obvious. We hold machines to that standard of perfect informational accuracy. If we try to introduce them to our methods of doing things, we may underutilize the key advantages they provide us with. Just a thought.

Just bumping this thread to see if anything has come out of it? Sounds very interesting!