Yes! Check out my idea. We have started building a list of datasets. Let's collaborate!
2 Likes
asabet
3
How are you determining your source of "truth" for the app?
7 Likes
Great question! I have been thinking about two approaches. The first uses information about the hosting, domain, security certificates, protocols, etc. of the site the client is consulting; the app would use this data, provided by the browser, to filter out potentially "dirty" sources of information. The second, which I have also been considering, is machine learning over hundreds or maybe thousands of fake-site URLs to create clusters.
What do you think, @asabet?
I've already visited your topic and I was frighteningly impressed, haha!
I would love to collaborate with you, given what you are bringing to the community.
My idea is closely tied to a problem we are living through right now in my country, and it could even serve as a kind of MVP for something bigger, like a lie detector. What do you think?
1 Like
I think that would be great. I personally would love to have something where you just grab a link to a news article and paste it into a service (or an app), and it uses NLP/GPT-3 to parse the claims in the article, check them for accuracy, and summarize. Because what do I do if I come across something that I want to verify? I just google for similar articles to cross-check and cross-validate. But not everyone does that.
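Roughly, the claim-extraction step might look like this minimal sketch, assuming the openai Python package with a GPT-3 completion endpoint; fetch_article_text and the prompt wording are mine, not a tested recipe:

```python
# A minimal sketch of the link -> claims step, assuming the openai package
# with a GPT-3 completion endpoint. fetch_article_text() is a hypothetical
# helper; the prompt wording is illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"

def extract_claims(article_text: str) -> str:
    response = openai.Completion.create(
        engine="davinci",
        prompt="List the factual claims made in the following article, "
               "one per line:\n\n" + article_text + "\n\nClaims:",
        max_tokens=256,
        temperature=0.0,  # keep the extraction as deterministic as possible
    )
    return response.choices[0].text.strip()

# claims = extract_claims(fetch_article_text(url))
# Each claim could then be cross-checked against other sources and the
# results summarized in a second completion call.
```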
Perhaps the biggest problem is that some people want to stay in their echo chambers. So how do you use GPT-3 to expose people to information and ideas that they find objectionable? Does their free will figure into it? These are bigger questions that may or may not figure into an MVP.
1 Like
asabet
7
The first issue I see is that classifying the entire text from a labelled misinformation site may not be entirely accurate, as news from most sources has elements of truth and falsehood in it (i.e. bias). If you're training a classifier on sources of misinformation, it might give better results to label misinformation at a more granular level (i.e. sentences and paragraphs); otherwise you might get high false positives from unrelated factors like writing style.
There are also cases where obtaining unbiased expert insight is difficult, e.g. for evaluating medical misinformation, and the community's consensus on what is "true" can change over time. It might help to create a transparent, community-driven database for tracking and discussing misinformation sources, then have a review process for the sources of text used to train a potential misinformation classifier. If you can figure out a process that reliably produces "ground truth", training a classifier becomes much easier, in my opinion.
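To make the granular idea concrete, here is a minimal sketch; the checkpoint name is a placeholder for a classifier fine-tuned on a reviewed ground-truth dataset, which is where the real work lies:

```python
# A sketch of sentence-level labelling. The model name is a hypothetical
# placeholder for a fine-tuned misinformation classifier.
# Requires: pip install transformers nltk (plus a backend like torch)
import nltk
from transformers import pipeline

nltk.download("punkt", quiet=True)
classifier = pipeline("text-classification",
                      model="your-org/misinfo-sentence-classifier")  # hypothetical

def label_sentences(document: str):
    # Score each sentence on its own rather than the whole document,
    # so style-level artifacts don't dominate the prediction.
    return [(s, classifier(s)[0]) for s in nltk.sent_tokenize(document)]
```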
2 Likes
The problem is that you're thinking of this as a traditional NLP task, e.g. classification. To use GPT-3 for that task would just be silly. What we're talking about is more like automatically generating Snopes articles.
1 Like
asabet
9
? Perhaps you need me to clarify my post? In the first paragraph I point out that classification of an entire text would be difficult ("silly", in your words), but give concrete reasons why. However, if you break a text down into statements (i.e. sentences), then it may be possible to accurately evaluate each statement with a finetuned classifier (or a classifier paired with a search model). Since language models can act as knowledge bases (Petroni et al., 2019), a finetuned model could plausibly classify individual statements as misinformation, if it's regularly trained on accurate ground truth. It's unproductive to make unqualified assertions about a specific method, as it depends on the assumptions you make for your task, and on actual experimental results.
In the second paragraph I point out that, regardless of your approach, constructing an accurate wiki-style database is the more important task. The success of whatever you do downstream (e.g. training a classifier, or a Snopes generator) depends most on the ground-truth dataset itself.
Regardless, I'm directly addressing @marcelx's questions about "checking" and "filtering" sources, which isn't mutually exclusive with whatever you're saying.
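For instance, the "classifier paired with a search model" idea could look like this sketch; search_evidence is a hypothetical retrieval helper, and the label order is my reading of the model card:

```python
# A sketch of statement checking paired with retrieval: fetch evidence
# passages for a statement, then score entailment with an off-the-shelf
# NLI cross-encoder. search_evidence() is a hypothetical helper.
# Requires: pip install sentence-transformers
from sentence_transformers import CrossEncoder

nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")
LABELS = ["contradiction", "entailment", "neutral"]  # per the model card

def judge_statement(statement: str, passages: list[str]):
    scores = nli.predict([(p, statement) for p in passages])  # (n, 3) logits
    return [(p, LABELS[row.argmax()]) for p, row in zip(passages, scores)]

# verdicts = judge_statement(claim, search_evidence(claim))
```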
I disagree with a lot of your assumptions, but it's okay to disagree.
marcelx
11
Friends @asabet and @daveshapautomator: regardless of the direction this idea takes, I think our discussion is going the right way, because the subject is indeed thorny; it would be easier to apply AI to sales, marketing, or even grammar rules. Developing and training a machine to deal with lies means doing the same for the truth, and that is a paradox which, from my point of view, few have the courage to discuss the way we are doing here, so I am grateful for that.
Now, about the body of the project: I like the suggestion of a database and text analysis at a granular level, but I think the complexity of this needs to be treated very seriously to avoid creating a failed fake-news machine. I still think "attacking" the SOURCES is the best way to find the white rabbit.
My approach: a list (DB) of thousands of "Pink Slime" sites used for machine training could generate really interesting output. Then, to avoid creating something deterministic about truth and lies, I would present percentages to the end user, combining all the parameters (SSL, protocols, manual reporting, and more data): something like "The bot learned there is an 89% chance this source is fake" rather than a binary truth-or-lie.
Can you help me see any flaw in this idea?
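To illustrate the percentage output, here is a toy sketch; the features (SSL, protocol, user reports, domain age) and training rows are made up, and a real model would need the thousands of labelled sites from the DB:

```python
# A minimal sketch of scoring a site as a probability rather than a
# binary verdict. All data below is fabricated for illustration.
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical rows: [has_valid_ssl, uses_https, report_count, domain_age_years]
X = np.array([[1, 1, 0, 12], [0, 0, 9, 0], [1, 1, 1, 5], [0, 1, 7, 1]])
y = np.array([0, 1, 0, 1])  # 1 = known "Pink Slime" / fake site

model = LogisticRegression().fit(X, y)

site = np.array([[0, 1, 6, 0]])
p_fake = model.predict_proba(site)[0, 1]
print(f"The bot learned there is a {p_fake:.0%} chance this source is fake")
```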
2 Likes
marcelx
12
Here are some interesting points about the user-facing view of this source check. The technical part would be done by the machine (GPT-3) over and over again, until it checks sources better than any human, journal, or fact-checking site.
Get Technical
Different formats of media come with their own conventions. Most fake news that you will encounter disguises itself as legitimate online news, so knowing the conventions of legitimate online news sources will help you understand how fake news differs.
- Check the domain name.
  - Legitimate news sources usually have a professional domain name that matches the name of their organization. For instance, the website for CBC news is http://www.cbc.ca/news. Fake news URLs are less likely to be professional in nature or identifiable as a distinct news organization.
  - Identify the top-level domain of a URL, as this will tell you the country where the site is hosted (e.g. .ca, .au) or the purpose of the site (.edu, .com). Fake news sites sometimes use URLs that mimic legitimate sites but use a different top-level domain: for instance, http://www.cbc.ca.co/news.
  - Site names ending in "lo", such as Newslo, are also conventionally fake.
- Check for an About Us page, a Contact Us page, or other information pages. All legitimate news sites have pages like this, although the names may differ.
- Check the links. Broken links happen to the best of us, including legitimate news sources. However, most links in a news article should work, and these links should take the reader to other, legitimate sources.
- Have a look at the web design. Examples of poor web design include sites with too many colours or fonts, poor use of white space, and numerous animated GIFs. Good web design is a sign of credibility, and legitimate news sources will prioritize having a proper website. A news organization like the CBC can afford to hire a web designer; it cannot afford a site that is unpleasant to visit. This is not to say that all sites with good web design are legitimate.
- Learn to recognize paid content. Many legitimate news sources include advertising on their site, often in the form of native advertising that blends in with regular articles. Paid advertising like this does not meet the standards of true journalism. Some examples of native advertising are available on the Milton Academy Library Guide.
- Check who owns the domain. If you're curious about who owns a website, try looking it up on https://www.whois.net/. For instance, a search for cbc.ca will show that it is owned by the Canadian Broadcasting Corporation. (A sketch of automating these domain checks follows the source links below.)
- Install a browser extension to warn you when you are visiting a fake news site, such as the Fake News Alert for Chrome.
- Research the images. If an image used in a news article looks suspicious to you, try using TinEye or Google reverse image search to find out whether the image has previously been used elsewhere. If it has, check whether it has since been edited. If the image is legitimate, searching for other images of the same scene might give you more context.
Source: Identifying Fake News - Fake News - Research Guides at Ryerson University Library
More here: Identifying Fake News Sources - Evaluating Websites - MaxGuides at Bridgewater State University
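As a rough sketch of automating the domain checks above: KNOWN_SITES is a hypothetical curated table of legitimate outlets, and the lookalike heuristic is my own, not a proven rule. Requires the tldextract package.

```python
# Flag lookalike domains where a known news brand appears as a label
# but the registered domain is something else entirely.
# Requires: pip install tldextract
import tldextract

KNOWN_SITES = {"cbc": "cbc.ca"}  # brand name -> legitimate registered domain

def domain_report(url: str) -> str:
    parts = tldextract.extract(url)
    registered = f"{parts.domain}.{parts.suffix}"
    labels = parts.subdomain.split(".") + [parts.domain]
    for brand, legit in KNOWN_SITES.items():
        if registered == legit:
            return f"{url}: matches known outlet {legit}"
        if brand in labels:
            # Brand name present but wrong registered domain,
            # e.g. http://www.cbc.ca.co/news
            return f"{url}: SUSPICIOUS lookalike of {legit} (actually {registered})"
    return f"{url}: unknown domain {registered} - check WHOIS"

print(domain_report("http://www.cbc.ca/news"))
print(domain_report("http://www.cbc.ca.co/news"))
```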
mike1
13
This is a great conversation. On point! Ten years ago or so, I wanted to build a similar system, though a much more rudimentary one.
As you know, at some point we have to surrender our knowledge to the folks who, as a collective, know way more than us in a particular subject. Take science, for example: we/they love to prove others wrong by confirming observations or theory. That's how we get our best answers, whether right or wrong, and move forward. Reality always reveals itself eventually.
Here was the idea: say you're using Chrome or FF and you're reading an article, a post, a research paper, whatever, and you want to verify something written as truth. You would highlight the copy, right-click (through a plugin/add-on), and choose a new context item saying "verify content", which calls a service with the logic to confirm the accuracy of any postulate by sourcing related context from objective sources (like you mention in Get Technical). That's straightforward work.
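The service half of that might look like this minimal sketch; check_against_sources is a stub standing in for whatever retrieval and scoring logic you settle on:

```python
# A minimal sketch of the "verify content" service the context-menu item
# would call. Requires: pip install flask
from flask import Flask, jsonify, request

app = Flask(__name__)

def check_against_sources(statement):
    # Stub: replace with real retrieval + scoring against objective sources.
    return "unverified", []

@app.route("/verify", methods=["POST"])
def verify():
    statement = request.get_json()["selection"]
    verdict, sources = check_against_sources(statement)
    return jsonify({"statement": statement,
                    "verdict": verdict,    # e.g. "likely accurate"
                    "sources": sources})   # links used for the cross-check

# The browser plugin/add-on would POST the highlighted text to /verify
# and display the verdict plus the sources used for the cross-check.
```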
YET,
How do you promote trust in your system, when tribalism has gotten so fierce? Wrong or not, trusting the collective objectivity of a statement - trust as best as humans can provide it - somehow has to be learned by the folks who will not even communicate with people who don't deliver info that reassures their confirmation bias. Then we're back to square one. That's the challenge here. And I don't have the brain power to solve it.
Think of self-driving cars: the tech is so simple and very safe, yet how do we get everyone to drive that car, and get them to take their hands off the wheel? It's not gonna be easy. But it has to happen eventually.
@marcelx, good luck - it's brilliant work that can change the world, literally. Just keep pushing; the "VIX" on false information is growing exponentially every day.
My nonsense aside, I did design a logo and did some concept work on mascots back then. I'll attach them for a laugh.
2 Likes
Wow! Really informative video. Thanks for sharing!
1 Like
aakash
16
@marcelx @asabet and @daveshapautomator
This is a really cool project. Happy to throw my hat into the ring. I have built a Telegram bot for the same purpose. It's pretty rudimentary in that sense, but do check it out and tell me what you think.
And here's a Hackernoon article talking about its architecture:
1 Like
I think the scope of its usefulness would be limited to low-hanging fruit like spam posts. Here in the US our own government officials lie to us regularly, and when they are caught, they just ignore the subject and redirect to something else, or come up with logically invalid defenses (those highly confidential documents with alarming information that accidentally get leaked? "Uhh, we cannot comment on that as it's confidential and part of an ongoing investigation") until people stop talking about it. Your proposed AI would probably rank these as top-level sources automatically. In some cases this may do more damage than good, as there are already plenty of folks who accept statements from the government as gospel. The last thing they need is an independent fact checker that reassures them the government would never lie.
Being skeptical and doing your own thorough research is necessary if you wish to be well informed on a topic. Think about it for a moment: being reliant on third-party AIs to provide you with what they deem to be the most appropriate content for you. It's almost as if... they WANT you to look no further and see nothing else.
1 Like
I like the idea, but I think you're looking at the problem from the wrong perspective. The news is one thing and the people consuming it are another.
For example, if a well-known fitness expert says that a certain protein is good and another protein sucks, how do we confirm that the claim is true or false?
Is there scientific literature we can look at? Are there studies on this subject?
The average person won't bother to look into it; they'll just take the fitness expert's word for it. Why? Because it causes friction to have to search, read, and interpret the information. It's much easier to let a so-called "expert" tell us.
So the idea isn't just to spot fake news, but to convince the general public that your solution is trustworthy, so that they check in with it to see whether a story is fake or not.
Think Kelley Blue Book, a site that tells you the value of the car you're trying to sell, as well as the car you're trying to buy.
Before KBB you had to find a car's value by cross-referencing different sites, dealers, etc., and then calculate the average price to determine whether you were selling or buying at the right price.
Now KBB is trusted even by dealerships, because if KBB says my car is worth $5000 and you're offering me $3000, I know for a fact I'm being lowballed.
Now let's use the news in this context. The president of Brazil says that COVID cases are going down, but the data from hospitals says otherwise.
I see the news and might think, "Are cases really going down?" I go and check your web app, which tells me there's an 80% chance it's not true, with cited sources I can check for myself.
So the aim is to create trust in your solution, because you're asking the general public to trust it more than the news and the government.
2 Likes
This is the key point here. In the information age, finding reliable information, finding discussions, and determining the trustworthiness of a source are easy AF. As you pointed out, people just don't do it. Instead, we have people self-selecting into echo chambers where everyone looks for confirmation bias.
No one who wants to believe that COVID is a hoax is going to check reliable news sources. They are only going to go straight to their preferred propaganda station.
What's really needed is a more psychological approach to the problem. This is called "infodemiology", so you might want to explore that, @marcelx: using something like GPT-3 not just to verify a single source, but to track down where the information is traveling to and from.
1 Like
I cross-check everything, and to be honest I don't like how much time it consumes.
So before I discovered GPT-3, I had an idea: crawl through the first page of Google results, check for duplicate information (since a lot of sites just copy and paste their info from another source), and then summarize it.
That way I wouldn't have to go through most of the search results in order to cross-reference.
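The dedupe step, at least, is simple. A minimal sketch using only the standard library; the 0.8 similarity cutoff is a guess, and the pages are assumed already fetched as plain text (e.g. via a search API):

```python
# Drop near-duplicate pages so only results that add new material remain.
from difflib import SequenceMatcher

def drop_near_duplicates(pages: list[str], threshold: float = 0.8) -> list[str]:
    unique = []
    for text in pages:
        if all(SequenceMatcher(None, text, kept).ratio() < threshold
               for kept in unique):
            unique.append(text)  # keep only pages that add new material
    return unique

# The surviving pages are what you would actually read or summarize.
```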
Now I see you mention that you would like something where you can use a link to parse the claims, check for accuracy, and summarize.
A Chrome extension makes sense; it can do those checks for you when you land on the article. The Honey Chrome extension came to mind: you go to a site that has something for sale, and Honey checks whether there are any discount codes you can use.
In this case the extension would crawl the site, check for accuracy, and provide a summary.
[quote] So how do you use GPT-3 to expose people to information and ideas that they find objectionable? Does their free will figure into it? These are bigger questions that may or may not figure into an MVP.
[/quote]
This is a great question, but I wouldn't bother exposing it to them. I would rather keep people informed with the truth and let that spread. Eventually they'll get exposed to it and have no choice but to question their beliefs.
People challenged the claim that the earth was round, and then humans verified that it was indeed round; so when people later claimed the world was flat, they looked stupid, because the truth had spread far enough.
1 Like
I would add: not just where the information is travelling to and from, but also who the intended target is. The information is always aimed at a certain group, so classifying those groups and tying them to the information would be helpful.
1 Like