Matching IDs
Creating a common key relates multiple datasets together and enables you to combine or compare the datasets. This is useful for Scorecards and Campaign Finance updates, among other things.After properly formatting the data for import:
Helpful Tools
--Tabula converts PDFs to CSVs
--https://exceljet.net/index-and-match Gives more examples of using the INDEX and MATCH functions]]
-- Occasionally numbers will display as text with an apostrophe before the number ('1234). If this happens, refer to https://forum.openoffice.org/en/forum/viewtopic.php?f=9&t=49046
--Tabula converts PDFs to CSVs
--https://exceljet.net/index-and-match Gives more examples of using the INDEX and MATCH functions]]
-- Occasionally numbers will display as text with an apostrophe before the number ('1234). If this happens, refer to https://forum.openoffice.org/en/forum/viewtopic.php?f=9&t=49046
1. Create an index
- Use SQL to pull candidate results for the state and years you are looking for
- Transfer information from SQL to the worksheet
- Create a new sheet within the worksheet, label 'index'
- Go to 'View' in the toolbar and select 'Data Sources' (or F4)
- Click and drag the appropriate query into the first cell
The results of your query should now be in the spreadsheet.
- Insert a new column A, label 'candidate_id'
- Match a value from the scorecard with a value from the index
- Index-match formula
- This match used two sheets - 'scorecard' and 'index'. 'scorecard' contains the raw scorecard and 'index' contains data from a query of relevant candidates from the database.
- Drag the formula down the list of names to auto-fill the rest of the id's
- As keys, start with lastname first. If additional information is required for an efficient match, concatenate several keys together
- '=x&y'
- Combine lastname and party into 'ObamaDemocratic' with '=B2&E2'
=INDEX(database.candidate_id.first:database.candidate_id.last,MATCH(scorecard.key.x, database.key.first:database.key.last, 0)) EX: =INDEX(index.$A$2:index.$A$269,MATCH(scorecard.B2,index.$C$2:index.$C$269, 0))
- Index-matching will not always return fully matched scorecards. In this case you will need to watch out for the following errors:
- Check for NA candidates (candidates without a candidate_id). The following formula will return the number of candidate_id's returned as 'NA'
- Check for Duplicate candidate_id's (returns the same candidate_id for multiple people):
- Create a new column and add the following formula:
- Drag down to auto-fill the rest of the rows.
- If the formula returns a "Dupe", enter the correct candidate ID for all matching keys (i.e., all duplicated candidate_ids) manually by searching your index or admin for the persons candidate_id. If it returns "#N/A", ignore the match. Use this formula after index-matching candidate_ids.
- Correct all errors found
- Duplicate candidates
- N/A values
- People with the same last name
- etc...
=ROWS (range) – COUNTIF (range “>0”) EX: =ROWS(B2:B536) – COUNTIF(B2:B536, “>0”)
=IF(COUNTIF(sheet.key.first:sheet.key.last, sheet.key.first)>1 "Dupe", " ") EX: =IF(COUNTIF($A$2:$A$8, $A2)>1, "DUPE", " ")
5. Transfer all relevant columns to a .csv file using “Paste Only - Number” . Remove all formulas
6. Save .csv and upload via the harvester
7. Document in 'harvested scorecards' on the Google Drive.
CategoryTechnicalSkills