Introduction



Campaign finance data is considered one of the major six categories of data that make up the Voter's Self Defense System. However we do not collect any campaign finance data ourselves. Instead, we currently rely on two other organizations for this data: The Center for Responsive Politics for federal candidates, and The National Institute for State Money in Politics, for state, and select local candidates.

Federal Campaign Finance Data

The data for federal officials and candidates comes from OpenSecrets, commonly referred to as CRP. CRP's data is displayed for any current member of the US House and Senate, as well as the President and Vice President. In election years, this data will also display for any candidate running for the offices listed above. We list campaign finance data on a dedicated section of a candidate's profile on our website. See Senator Dick Durbin as an example. We also display some limited campaign finance data on VoteEasy.

For Congressional candidate's finance page, depending on the availability of the data, will be contained in four main groups: "Summary", "Top Contributors", "Top Industries", and "Top Sectors". Summary data is the overview of what a candidate has raised and spent within a particular cycle. Each of these groups is populated through an API call to return results for each group (These calls are 'candSummary', 'candContrib', 'candIndustry', and 'candSector' respectively). Contributors displays the top individuals/groups giving to a candidate. Note that only candidate committees are included in the data we pull from CRP, not necessarily individual contributors (so our contributors display may differ from that on CRP's site). Industry and Sector data are coded by researchers at our partners. Industry data covers the business affiliations of contributors, while Sector data is more detailed and varied.

For Presidental candidate's finance pages, while the display on our website is seemingly identical with the four groups specified above, the process of collecting this information differs from Congressional Candidates. For Presidential Candidates, the API methods are uniquely different, apart from the 'candSummary' ('prescandContrib' for Contribution data, 'prescandIndustry' for Industry data, and 'prescandSector' for Sector data). For every Presidential Cycle, CRP must go through a process of building out their candidate profiles for Presidential candidates, as well as updating their API calls which allows us to collect and display this data. Since Presidential candidates have unique calls, we cannot display any Presidential data until CRP have updated their API calls to pull data for the current election cycle (the election cycle is detailed by the year)

For the 2020 Presidential Election cycle, based on information given by our contacts at CRP (see CRP contacts) initial data is to be downloaded in March, with the building out of candidate profiles taking place in late April/May, including updates to their API (as detailed above) - at which point we will have access to Campaign Finance data for 2020 Presidential Candidates through the first quarter of 2019 (March 31st).

Through the use of the CRP API, we have access to the most up to date information included in their database. Each candidate and official has both a unique Project Vote Smart candidate ID and CRP candidate ID. By matching these two IDs for any given candidate, we are able to access this data directly from CRP's database to display on our website. The process of updating campaign finance data and matching IDs is something typically handled by Elections Research department. This involves signing up for a bulk data account to be able to download a full list of CRP candidate_ids that we use to match with our candidate_ids. Generally, this data can be downloaded directly from the API Documentation and Reference section from their website. Their candidate list should be updated quarterly. Occasionally, we have to contact CRP for their most up to date list of candidates and IDs. This can be done by contacting the IT director at CRP and asking for an updated listing of candidate IDs.

It is worth noting that we have an ongoing relationship with CRP and their staff.

State and Local Campaign Finance Data

Our state level campaign finance data comes from the National Institute on Money in State Politics, or "NIMSP" for short. The institute essentially replicates what CRP does for all 50 states, compiling campaign finance data for candidates and officials running for governor, elected statewide offices (attorney general, treasurer, secretary of state, etc.), state house/assembly, state senate, the highest state court, and the state appellate court. As of 2015, NIMSP has begun select local coverage. This information is displayed on our website for state officials in the same manner it is for federal officials. However, Local campaign finance is not currently displaying on votesmart.org (Mantis #7293) even though we have ID matches for local candidates.

Again, through the use of the Institute's API, we have access to the most up to date information included in their database. By matching the institute's candidate IDs with the Vote Smart candidate ID's, the data displays directly on our website. This process is done in a similar way through running the script to create candidate ID matches. These updates should also be done quarterly in conjunction with the CRP update.

Alternatively, you can view this URL - https://www.followthemoney.org/show-me?y=2020&f-core=1#[%7B1%7Cgro=c-t-id - for the most recent listing of candidates so that we can access that data whenever we want. It will have to be updated as election cycles progress, which makes an API pull using R preferable in one respect (see below).

It is worth noting that we have an ongoing relationship with NIMSP. Several of their current staff are former interns or staff from PVS.

General Guidelines to Complete an Update

There are two possible procedures to update this data: manual and bulk imports. Both involve matching the campaign finance organization's candidate IDs to Vote Smart candidate IDs. In both cases, the first step is to get a list of our politicians who are missing campaign finance data, and the second is to acquire the partner organization's candidate IDs.

Query to Find Missing Campaign Finance Candidate IDs
This query pulls candidates without a finsource_candidate_id who are up for election in the current year. It can be expanded to pull more candidates without CRP/NIMSP IDs.
SELECT c.candidate_id, c.lastname, c.firstname, o.name AS office, oc.state_id, dn.name as district, pvs_election_party(oc.election_candidate_id) AS parties
FROM candidate c 
JOIN election_candidate oc USING (candidate_id) 
JOIN office o USING (office_id) 
JOIN officetype ot USING (officetype_id) 
JOIN election e USING (election_id) 
LEFT JOIN districtname dn USING (districtname_id)
LEFT JOIN finsource_candidate fc 
	ON fc.candidate_id = c.candidate_id AND CASE WHEN ot.officelevel_id = 'F' THEN fc.finsource_id = 1 WHEN ot.officelevel_id = 'S' THEN fc.finsource_id = 3 END
WHERE fc.finsource_candidate_id IS NULL
AND e.electionyear = extract(year from now())
ORDER BY o.office_id, oc.state_id, c.lastname


Campaign Finance Update Methods


Bulk updates with IT - using Python:

This process uses a collection of scripts coded in Python programming language to match the VS candidate_id in our database with the candidate_id equivalent in our partner’s database. Using the scripts would reduce errors from manual entry and would also provide more consistency and ease for bulk candidate matches.

All relevant files are located in "/research/Active/Research Programs/Campaign Finance/Campaign_Finance/", organized by year and quarter. Because the process is constantly improving, a more updated version of the scripts can be found on GitHub.

Github link: https://github.com/votesmartsig/campaignfinance

Below outlines the process of the scripts:
  1. The scripts first connect to our database and then query out relevant candidates, candidates that have no link to our partner’s unique ID. Refer to the query above.
  2. Currently, we have two different types of input sources from both CRP and NIMSP. The scripts read CRP data from a spreadsheet, whereas it reads NIMSP data from the API. This CRP spreadsheet should be sourced ahead of time from CRP contacts to get latest updates on candidate entries. This sourcing should be done by the Elections Research Director. Refer to the http://wiki.votesmart.org/OtherContacts wiki page for contact information.
  3. The scripts will perform a data clean by putting the data into their appropriate columns. The purpose of this is so that our partner’s data will be modeled as closely as possible to our data.
  4. By comparing both our data and our partner’s modeled data, the scripts would be able to identify matches based on several determining factors. These factors are candidate names and state, and a candidate must match both factors to be matched. Should these factors not match identically, then other factors to be considered such as offices and districts, to try to narrow down to a single match result. While these factors do not determine a match, it is highly important for it to detect false positives and mismatches.
  5. One last measure that the scripts take is to verify if those matched candidates have already been entered in the database or if there are duplicates within the sheet itself.

There are three output files:
i) Extract file - {year}_{month}_{source}_Extract.json
This file contain the original data extracted from the API (for NIMSP script) - for CRP, the spreadsheet will be saved with similar file structure: {year}_{month}_{source}_raw.xls
ii) Modeled file - {year}_{month}_{source}_Modeled.ods
This file contains data modeled from the original data source from our partners to match the structure of the VS database.
iii) Matched file - {year}_{month}_{source}_Matched.ods
This file contains the matched data between the two datasets - this is where ID matches can be found, and reviewed before providing to IT for bulk import.

How to check files:

The file that you will be reviewing is the Matched file. There are several different columns in the spreadsheet that are more efficiently reviewed if reviewed in the order that is described below:
  1. The first thing to check is to see if there are any duplicates within the sheet. These duplicates are typically the duplicate of our partner unique IDs shared among two candidate_id. This is shown in the ‘sheet_check’ column, it will show you what and which candidate has been duplicated.
  2. Now you need to verify whether or not the program has correctly matched the candidates. The program does not perfectly match the data simply because we are matching two unequal lengths of data. To do so, you have to filter for candidates that have ‘REVIEW:…’ under the ‘matchstatus’ column. It will tell you which column to review. Cross check the ‘Modeled’ file by looking up for that unique ID.
  3. The last thing to do is to filter out candidates that already have a unique ID. For CRP, there are two columns to check due to the FEC ID. Filter out blanks and ‘True’ to only ‘False’ for candidates that do not yet have a unique ID.
  4. Create a new file for candidates that do not have the unique ID, as this will be the final file provided to ID for import - these import files will need to be saved as a csv file, with 2 columns included: the VS candidate_id for each new match in column A, and the matched partner ID in column B.
Repeat step 2 to 3 if there are more than one type of unique ID (CRP ID vs. FEC ID).

Here are some notes of particular Review types:
Review District: This means that the candidate in our database is not running in the same district listed by our partners. There is a multitude of reasons why a candidate could get flagged with this error. One possibility could be that Vote Smart describes the districts differently from the two organizations. For example, Vote Smart lists Montana US House District as At-Large, but CRP lists the District as MT01. It could be that the candidate has decided to run in a different district compared to previous years. Or it could be an error on part of our researchers, who accidentally marked the candidate as running in the wrong district. The researcher should investigate each instance to determine why a candidate has this flag.
Review Office: This means that the candidate in our database is not running for the same office as listed by our partners. There is again a multitude of reasons why a candidate could get flagged with this error. One possibility could be that the candidate is running for an office they have never run for before. (Ex. A candidate ran 5 times for a US House Seat, is now for the first time running for a US Senate Seat). Another possibility is that the office title is different from what is listed by our partners (Ex. State Board of Education position in NIMP could be called Education Commissioner). A researcher could have also made an error when selecting the office the candidate is running for. The researcher should investigate each instance to determine why a candidate has this flag.
Status Check: Status Check usually is a result of the query highlighting possible dupes. Make sure to check each result to determine if the query was correct in flagging the candidates. If a candidate needs to have their campaign finance ID manually entered, don’t forget to delete that candidate from the sheet.


Important: Upload each CSV file from our partners (CRP/NIMSP) in separate tickets, as they both receive differing finsource_id's in the upload process.

For NIMSP data, we use V2 ID's to add to our database (finsource_id = 4). For CRP data, we use (finsource_id = 1), and for FEC data, we use (finsource_id = 2)

Once IT notes that the import has been completed, check several of the imported candidate_id's on votesmart.org for quality.

Campaign Finance Script Files
Attachments
File Last modified Size
2020_MAR_CRP_Matched.ods 2020-04-07 11:06 77Kb
2020_MAR_CRP_Modeled.ods 2020-04-07 11:08 96Kb


Bulk updates with IT - using Excel/Libre Office:
  1. For a CRP update, access their listing of candidate ID's by going to Open Secrets. If you do not have one already, sign up for an OpenSecrets account.
  2. Once you log in, go to Bulk Data. Scroll down towards the bottom of the page and look for the heading "Reference Data." Listed below is a link to "CRP IDs.xls". Download this file.
  3. If you notice that the file hasn't been updated in a while, you can email the IT director at CRP and ask for an updated CRP ID document.
  4. For an NIMSP update, use this link to download the most up to date version of their IDs: https://www.followthemoney.org/show-me?y=2018&f-core=1#[%7B1%7Cgro=c-t-id
  5. Use the CONCATENATE and INDEX MATCH functions to match PVS candidate IDs with CRP/NIMSP IDs.
  6. You may have to perform this process several times to get all possible matches. It is recommended that you create and use keys that will be unique to each politician to avoid mix ups and matching to the wrong individuals. BE VERY CAREFUL: in the past, using firstname + lastname + state_id has led to false positives (there are two Robert Bell's in VA). For this reason, the R script is preferable because it checks for double matches. Follow this advice strictly for your first attempt with the Index Match functions. You'll make "X" number of matches. Pull those out, create new keys, and index match them. You'll get more matches. Repeat this process until the matches are no longer accurate. At this point, you will have to wait for the campaign finance organizations to catch up to our candidate coverage.

Other Useful Excel Functions

Script Error Messages:
Following an R script run, a document of "...Errors.csv" will be created that outlines any unsuccessful matches that the script could not complete. These errors may be caused by a variety of reasons, which primarily fall under one of these error messages:
To resolve these errors, typically a research staff member with knowledge on campaign finance will work through this list manually and resolve the matches that the script could not successfully make. It is up to the researcher to determine the correct match and enter it into our database. These corrected errors can be entered manually to each candidate via the process below.

Manually in Admin:
  1. Pull up a candidate's biography page in admin and scroll all the way to the bottom; you should see "Candidate Finances".
  2. That link takes you to a page with "Campaign Finance Sources". This is where the campaign finance IDs are entered/edited.
  3. Select the correct "Finance Source" (CRP=federal, NIMSP=state). For NIMSP data, we use V2 ID's to add to our database (finsource_id = '4') V1 and V2 Follow-the-Money EIDs are linked, meaning that a V1 ID may work for our purposes though isn't preferable..
  4. Enter the CRP/NIMSP ID number from your prior research, and save.
  5. Check the politician's page on our website to verify that the data is displaying correctly.

It is useful to note that manual entry is more beneficial for small updates, as the manual process can be rather tedious and time consuming. For larger data updates it is best process to run the script and bulk import any candidate matches made through IT.

Campaign Finance Management/Display

One of the biggest challenges for our Campaign Finance coverage is that we do not look at the front end display as often as with other data types. Since the data is collected external to Vote Smart, Research Staff do not tend to view these pages often. For this reason, it is very important for the Research Director to periodically inspect federal, state, and local candidates' campaign finances for quality. It may be preferable, given greater resources, for the Director to train other members of Research in this processes.

There are numerous messages that can possibly be displayed in instances where there are issues with campaign finances on the website. These can be for a variety of reasons:
  1. CRP/NIMSP do not have updated information on the candidate’s finances for that particular election cycle.
  2. Alternatively, we may not have a campaign finance match for a candidate. In this case we do not have any message displayed, but regular updates will help to remove these instances.

Vote-Easy uses a different method to calculate the total dollars a candidate has raised in a cycle than the website. The website pulls a candidate's total from the "total" field in the API call. Vote-Easy tabulates a candidate's total as "cash on hand" + "spent". These can lead to different figures on the website compared to Vote-Easy.

Inquiries on Campaign Finance

If for any reason a hotline caller, candidate, or citizen would like a more detailed explanation of specific campaign finance data, our policy is to refer them to the organization that actually handles the data. So, any questions about federal campaign finance data should be directed to CRP. Any questions about state/local campaign finance data should be directed to the National Institute on Money in State Politics.
There are 3 comments on this page. [Show comments]
Valid XHTML :: Valid CSS: :: Powered by WikkaWiki