@@=====**Factchecking**=====@@
[[http://wiki.votesmart.org/SpeechesGuide <-- Back to the Speeches Guide]]
[[FactCheckPoliciesProcedures Fact Checking Policies and Procedures]]
===Introduction===
Factchecking data is an additional layer of context we add to our Public Statements, meant to measure a politician's honesty. It's also central to the BULL project, which will display, on votesmart.org, the number of lies a candidate has in the database.
External factcheckers produce factchecks (reports) that analyze politicians' statements and provide a ruling on their truthfulness. Vote Smart staff enhance this data and connect these factchecks to Vote Smart content. Those connections then appear in BULL.
Fact-check sweeps should be conducted once a month. If a statement that has been fact-checked is not yet in our database but is something that we can take (a tweet we haven't taken yet, or an op-ed or interview in a publication, for example), the researcher conducting the fact-check sweep should enter that statement into the database and then associate the fact-checker's article with that statement. (This may lead to duplicate statements, but those will be caught in weekly quality control checks. Entering the fact-check is more important than potentially missing good data.) Information on how to take a statement is available on the [[FieldBoxes Speeches Field Boxes in Admin]] wiki.
Guidelines for talking to the public about BULL are available on the [[BULLTalkingPoints BULL Hotline Guidelines]] wiki.
====Factchecking Data Standards====
**Scope of Coverage**
Statements: all statements for which a public record can be found and that otherwise meet our criteria for speech collection (including the offices and jurisdictions covered). If a statement meets our criteria but is not already in our public statements database, it should be added.
Factchecks (fact-checking reports): if a report analyzes a politician's honesty in a statement meeting our collection criteria, and finds the statement to be false or mostly false, the report will be included. Therefore we would exclude:
- reports relating to a statement that doesn't meet our criteria for inclusion
- reports that solely analyze the consistency of a candidate's position ("flip-flopping"), campaign promise fulfillment, or anything other than the truthfulness of a statement
- reports finding the statement to be true, mostly true, or half-true-- we're focusing on the lies that politicians tell
Core content covered per factcheck: speech_id, factchecker, URL of factchecking report, ruling
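As a rough illustration, the core content above could be modeled as the following record (a minimal sketch in Python; the class and field names other than speech_id are illustrative, not our actual schema):
<<
from dataclasses import dataclass

@dataclass
class Factcheck:
    """One factcheck report tied to one public statement.
    Names other than speech_id are illustrative, not our schema."""
    speech_id: int    # ID of the statement in our Public Statements database
    factchecker: str  # e.g. "PolitiFact.com"
    url: str          # URL of the factchecking report
    ruling: str       # standardized ruling, e.g. "mostly false + some context"
<<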
**Data Sources**
Factchecks are sourced from external data partners ("factcheckers"). As of March 2019 this includes the three most prominent independent, nonpartisan groups: [[FactCheck.org]], [[Politifact Politifact]], and [[The Washington Post]].
These factchecks are then associated with Project Vote Smart's Public Statements database; the associations are made by Project Vote Smart staff.
Politifact and Washington Post rulings are sourced from their respective sites and are not modified. Rulings for Factcheck.org factchecks are assessed individually by Vote Smart staff, based on Factcheck.org's factcheck reports.
Vote Smart staff also flag statements as "questionable" based on rulings from Politifact, Factcheck.org, and the Washington Post, in accordance with Vote Smart's internal criteria for rulings.
**Criteria for Rulings**
A "ruling" is a standardized summary-judgment of a politician's statement attributed to a factchecker.
(Note: The numbers below are NOT the same as factcheckruling_id in our database; they were previously used in spreadsheets and are provided here only as a way to understand those spreadsheets.)
Project Vote Smart's rulings of Factcheck.org's factchecks:
1 = entirely false
2 = mostly false + some context
3 = 50/50 true/false + lots of additional context
4 = mostly true + some context
5 = entirely true
The following may be seen in previous work, though these rulings do not meet our current criteria for inclusion:
11 = entirely inconsistent or full flip flop
13 = half flip flop
15 = entirely consistent or no flip flop
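For anyone decoding those legacy spreadsheets, the codes amount to a simple lookup table (a sketch; these are the spreadsheet codes listed above, NOT factcheckruling_id):
<<
# Legacy spreadsheet codes -> ruling labels. NOT factcheckruling_id.
LEGACY_RULING_CODES = {
    1: "entirely false",
    2: "mostly false + some context",
    3: "50/50 true/false + lots of additional context",
    4: "mostly true + some context",
    5: "entirely true",
    # Flip-flop codes from previous work; outside current inclusion criteria:
    11: "entirely inconsistent or full flip flop",
    13: "half flip flop",
    15: "entirely consistent or no flip flop",
}
<<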
Politifact's [[https://www.politifact.com/truth-o-meter/article/2018/feb/12/principles-truth-o-meter-politifacts-methodology-i/ rulings]]
The Washington Post Fact Checker's [[https://www.washingtonpost.com/politics/2019/01/07/about-fact-checker/?utm_term=.de159f0b5e8c rulings]] (found under the Pinocchio Test heading)
Criteria for flagging a statement as "questionable" (this is applied mechanically and does not involve subjective consideration; a sketch of the logic follows this list):
- Factcheck ruling (from Vote Smart) of:
  - entirely false
  - mostly false + some context
  - 50/50 true/false + lots of additional context
- Politifact [[https://www.politifact.com/truth-o-meter/article/2018/feb/12/principles-truth-o-meter-politifacts-methodology-i/ "statementrulingid"]] of 3, 4, 5, or 6
- The Washington Post Fact Checker's [[https://www.washingtonpost.com/politics/2019/01/07/about-fact-checker/?utm_term=.de159f0b5e8c "statementrulingid"]] of Three Pinocchios, Four Pinocchios, or Bottomless Pinocchio
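Because the flag is purely mechanical, it can be expressed directly in code. The following sketch restates the criteria above (the function name and argument shapes are illustrative, not our codebase; Politifact rulings are compared by statementrulingid):
<<
# Sketch of the mechanical "questionable" flag described above.
QUESTIONABLE_VOTESMART_RULINGS = {
    "entirely false",
    "mostly false + some context",
    "50/50 true/false + lots of additional context",
}
QUESTIONABLE_POLITIFACT_IDS = {3, 4, 5, 6}  # Politifact statementrulingid values
QUESTIONABLE_WAPO_RULINGS = {
    "Three Pinocchios", "Four Pinocchios", "Bottomless Pinocchio",
}

def is_questionable(factchecker, ruling):
    if factchecker == "FactCheck.org":       # ruled individually by Vote Smart staff
        return ruling in QUESTIONABLE_VOTESMART_RULINGS
    if factchecker == "PolitiFact.com":      # ruling here is the statementrulingid
        return ruling in QUESTIONABLE_POLITIFACT_IDS
    if factchecker == "The Washington Post":
        return ruling in QUESTIONABLE_WAPO_RULINGS
    return False
<<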
**Expected Frequency of Updates**
Weekly when all schedules are up to date; monthly if necessary.
====Known Issues====
**Data**
- there may be multiple factchecks per speech, possibly from different factcheckers; this also affects how we might represent and aggregate those rulings, which may vary (see the sketch after this list)
- multiple speeches may be evaluated in one factcheck, and may have different rulings
- currently, factchecks are only related to speech_ids, and finer-grained associations may be needed: rulings are typically based on a subset of a speech, not the whole speech. Sometimes these quotes are repeated in multiple statements. The quoted subset may also correspond to only a subset of the speech's categories or tags (for example, in a debate transcript or a State of the Union address). In the case of a debate, there are multiple speakers in one piece of raw evidence, but the factchecking report may only address the comments of one speaker. We have previously discussed using "speech snippets" as a solution but never implemented it. Note: in the first round of dealing with historical data, we did save additional, finer-grained detail, including categorization of the factcheck report, tagging of the factcheck report, and matching to candidate IDs; this may be useful in the future.
- Currently we are only incorporating factchecks that can be associated with our public statements database. This leaves out factchecks associated with bios or other data types we cover (a rare occurrence, but some examples: [[http://www.politifact.com/new-jersey/article/2013/apr/21/chris-christie-claims-scooby-doo-was-his-favorite-/ 1]], [[http://www.politifact.com/oregon/statements/2012/oct/07/laurie-monnes-anderson/laurie-monnes-anderson-registered-nurse/ 2]], [[http://www.factcheck.org/2008/08/born-in-the-usa/ 3]]), factchecks for candidates we normally cover but don't collect speeches for, and speeches that don't meet our criteria for inclusion (like speeches without a direct quote from a candidate). Should we add a way to connect factchecks with bios or other data types? Should we start including non-traditional evidence, like campaign ads or mailers from the candidate's campaign, or social media? Should we change our speech collection criteria, at least for speeches with associated factchecks? If the raw evidence is no longer publicly available from any source, should we just add the quote being referenced? If a factcheck is not directly tied to a speech or piece of evidence in our database, should we simply add the factcheck and add a way to reference the raw evidence?
- Should we extract Factcheck.org's rulings and keep these raw rulings in our database as well? Factcheck.org doesn't use a set system, but summary judgments could be extracted, like "whopper."
- Should factcheckers be added as SIGs and connected that way? Arguments for (Kristen, National Director): There are some current SIGs that use candidate statements (often candidate surveys) to evaluate a politician's alignment with them on the issues, but we don't track that, and it's a little different. Also, I have been trying to get FactCheck.org's and Politifact's evaluations into an aggregate score for use as a rating. If so, can we add a SIG type to distinguish media/journalism groups from special interests? In this scenario, rulings made by factcheckers could be "SIG Ratings," with the standardized rulings made by PVS as "Our Ratings." We may ultimately want to evaluate pieces of evidence for candidates on other dimensions and from other groups -- for example, SIGs' preferred bill positions (which may also tie into their aggregate ratings); an aggregate of the judgments a SIG makes about a candidate should translate into an overall rating. Arguments against (Clinton, Former IT Director): I'm not seeing a reason to treat fact-checking orgs as SIGs unless there's more of an overlap in what is rated.
- Other associations we might make with our data: Some factchecks refer to statements about issues, voting records, a bill, the candidate's bio, other candidates, etc. I could see value in connecting the statement being evaluated to the aspect of the record it addresses, but not necessarily in connecting the factchecker's judgment directly to that aspect of the record. It should also be noted that Vote Smart is sometimes cited as a source in factcheckers' analyses. I'm not sure there's anything to do with that information.
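To make the first two issues above concrete, here is a sketch of grouping factchecks by speech, which surfaces the cases where a single statement carries several (possibly conflicting) rulings (names are illustrative, reusing the Factcheck record sketched earlier):
<<
from collections import defaultdict

def factchecks_by_speech(factchecks):
    """Group factcheck records by speech_id; a speech with two or more
    entries may carry different rulings from different factcheckers,
    and how to aggregate those for display remains an open question."""
    grouped = defaultdict(list)
    for fc in factchecks:
        grouped[fc.speech_id].append(fc)
    return grouped
<<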
**Representation of factchecks**
- define "factchecks" better: specify in some way the dimension being evaluated (in this case, honesty vs. misrepresentation, or the factual nature of a statement)
- We have established criteria for “questionable” but what do we do with the rest of our statements? Should we assume truth if not otherwise verified by a Factchecker? For those statements deemed to be true, should we represent that they have been “certified true” or “verified” much like VoteEasy research certified answers?
**Corrections**
- A fact-checker may issue a correction to an earlier article. A change in a ruling is unlikely to warrant a new article, making it difficult for us to catch these changes. This can be mitigated by setting up Google Alerts for the individual fact-checkers -- for example, an alert for "FactCheck.org correction". So far these alerts pull in a wide variety of unrelated content, but a researcher monitoring them can easily disregard irrelevant articles.
====Key Tasks====
The key task is to associate the following content:
- the evidence being evaluated (Vote Smart's public statements)
- the ruling of the evidence made by a fact checker (see: data sources)
- the factcheck report (analysis) of the evidence by the factchecker
In the past, Clinton (former IT Director) was able to [[http://mantis.votesmart.org/view.php?id=4976 scrape FactCheck.org and use Politifact's API to produce CSVs]]. Among other data, he included the factcheck URLs, which staff would then match to speech_ids and resubmit. The plan at that time was to provide these CSVs on a monthly basis. Because of an IT backlog, Research staff began manually retrieving URLs from Politifact's and FactCheck.org's respective websites.
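That round-trip might have looked roughly like the following (a sketch only; the column names and the url_to_speech_id lookup are assumptions, not the actual export format):
<<
import csv

def fill_speech_ids(in_path, out_path, url_to_speech_id):
    # Read a CSV of factcheck rows, add the speech_id each factcheck URL
    # was matched to, and write the result out for resubmission.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["speech_id"])
        writer.writeheader()
        for row in reader:
            row["speech_id"] = url_to_speech_id.get(row["factcheck_url"], "")
            writer.writerow(row)
<<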
To associate factchecking data to statements:
1) Evaluate whether the factcheck report meets our criteria for inclusion (in our experience, about 50% of reports do). If it doesn't fit our normal criteria for inclusion, or is a new kind of evidence, and you think it ought to be added, run it by the National Director.
1) Find the quote being evaluated in Vote Smart's Public Statements database. Add the public statement if necessary.
1) Relate the public statement to the factcheck in accordance with current procedure.
1) Assign a ruling if not provided clearly by the fact checker (note: this is currently done in bulk for Politifact and individually for Factcheck.org.)
Special situations:
- a factcheck evaluating a statement by Candidate A about something Candidate B said should not be related to Candidate B but should be related to Candidate A
====Management of Factchecking Updates====
**Key Objectives:**
Get our data as up-to-date as possible
Work with IT on the following:
1) improve process so that updates of this content can be done more frequently
1) address "Known Issues" as needed
1) add a way to input this data in Admin -- this may include: word search capabilities for public statements; the ability to associate factchecks with candidate speeches; a separate section for factcheck entries, including the ability to browse existing factchecks
1) integrate Factcheck.org's API
1) integrate data into our other web properties
1) integrate data into our public API
1) future development (See "Incorporating Fact-Checking Data" and "factchecking data to include" documents on the public drive->cross-department projects->possible future projects)
**Pace Estimates (for associating factchecks to public statements):**
beginners: recorded at 16-28 factcheck articles/hour using spreadsheet imports (sample size of two staff members; approximately 50% of those articles were marked "intentionally blank" because they were not factchecking active federal officials, so the effective pace for qualifying articles is roughly 8-14/hour)
====API====
As of early 2014, the following content was fed through the public statements call of Version 2 of the API. It is Kristen's understanding that "ruling" was replaced with a flag of "questionable" (this should be verified):
Example snippet:
<<
"factchecks": [
    {
        "factchecker": "PolitiFact.com",
        "link": "http://www.factcheck.org/2012/10/whoppers-of-2012-final-edition/",
        "ruling": "entirely false"
    },
    {
        "factchecker": "PolitiFact.com",
        "link": "http://www.factcheck.org/2012/10/dubious-denver-debate-declarations/",
        "ruling": "mostly false + some context"
    },
<<
Our intention is to highlight the statements made by a candidate that were determined to be questionable or false to some degree; claims determined to be true are excluded from the current display.
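For a consumer of the API, pulling the factchecks out of a statement record shaped like the snippet above might look like this (a sketch; the surrounding response structure and field availability should be verified against the current API, per the note above):
<<
import json

def extract_factchecks(statement_json):
    # Yields (factchecker, link, ruling) for each factcheck attached to
    # one public-statement record shaped like the snippet above.
    statement = json.loads(statement_json)
    for fc in statement.get("factchecks", []):
        yield fc["factchecker"], fc["link"], fc["ruling"]
<<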
===Future Applications===
Using our tagging system to identify additional speeches that a fact check could apply to.
CategoryResearch