Revision history for MatchIDs


Revision [54411]

Last edited on 2018-12-19 15:55:26 by Rachael
Additions:
--[[https://exceljet.net/index-and-match]] Gives more examples of using the INDEX and MATCH functions]]
-- Occasionally numbers will display as text with an apostrophe before the number ('1234). If this happens, refer to [[https://forum.openoffice.org/en/forum/viewtopic.php?f=9&t=49046]]>>
Deletions:
--[[https://exceljet.net/index-and-match]] Gives more examples of using the INDEX and MATCH functions]]>>


Revision [54410]

Edited on 2018-12-19 15:54:33 by Rachael
Additions:
>>**Helpful Tools**
--[[https://tabula.technology/ Tabula]] converts PDFs to CSVs
--[[https://exceljet.net/index-and-match]] Gives more examples of using the INDEX and MATCH functions]]>>
%%=IF(COUNTIF(sheet.key.first:sheet.key.last, sheet.key.first)>1 "Dupe", " ")
EX: =IF(COUNTIF($A$2:$A$8, $A2)>1, "DUPE", " ")%%
Deletions:
%%=IF(MATCH(scorecard.key.x,(scorecard.key.x+1:scorecard.key.last),0), "Dupe", " ")
EX: =IF(MATCH(C2,(C3:$C$269),0), "Dupe", " ")%%


Revision [17662]

Edited on 2016-05-06 15:50:00 by Rachael
Additions:
~-**Index-match formula**
~-Check for **NA candidates** (candidates without a candidate_id). The following formula will return the number of candidate_id's returned as 'NA'
~-Check for **Duplicate candidate_id's** (returns the same candidate_id for multiple people):
Deletions:
~-Index-match formula
~-Check for NA candidates (candidates without a candidate_id). The following formula will return the number of candidate_id's returned as 'NA'
~-Check for Duplicate candidate_id's (returns the same candidate_id for multiple people):


Revision [17661]

Edited on 2016-05-06 15:41:18 by Rachael
Additions:

%%=INDEX(database.candidate_id.first:database.candidate_id.last,MATCH(scorecard.key.x, database.key.first:database.key.last, 0))
EX: =INDEX(index.$A$2:index.$A$269,MATCH(scorecard.B2,index.$C$2:index.$C$269, 0)) %%

%%=ROWS (range) – COUNTIF (range “>0”)
EX: =ROWS(B2:B536) – COUNTIF(B2:B536, “>0”)%%

%%=IF(MATCH(scorecard.key.x,(scorecard.key.x+1:scorecard.key.last),0), "Dupe", " ")
EX: =IF(MATCH(C2,(C3:$C$269),0), "Dupe", " ")%%
Deletions:
~-=INDEX(database.candidate_id.first:database.candidate_id.last,MATCH(scorecard.key.x, database.key.first:database.key.last, 0))
~-EX: =INDEX(index.$A$2:index.$A$269,MATCH(scorecard.B2,index.$C$2:index.$C$269, 0))
~-=ROWS (range) – COUNTIF (range “>0”)
~-EX: =ROWS(B2:B536) – COUNTIF(B2:B536, “>0”)
~-=IF(MATCH(scorecard.key.x,(scorecard.key.x+1:scorecard.key.last),0), "Dupe", " ")
~-EX: =IF(MATCH(C2,(C3:$C$269),0), "Dupe", " ")


Revision [17660]

Edited on 2016-05-06 15:34:13 by Rachael
Additions:
4. Spotcheck the formula results to verify that you did it properly. Focus on politicians with common names, because the computer may have mixed up their candidate IDs. Also look at politicians with any non-standard characters in their names like a tildes, accent or cedilla.
Deletions:
4. Spotcheck the formula results to verify that you did it properly. Focus on politicians with common names, because the computer may have mixed up their candidate IDs. Also look at politicians with any non-standard characters in their names like a tilde or cedilla.


Revision [17659]

Edited on 2016-05-06 15:32:19 by Rachael
Additions:
4. Spotcheck the formula results to verify that you did it properly. Focus on politicians with common names, because the computer may have mixed up their candidate IDs. Also look at politicians with any non-standard characters in their names like a tilde or cedilla.
Deletions:
4. Spotcheck the formula results to verify that you did it properly. Focus on politicians with common names, because the computer may have mixed up their candidate IDs. Also look at politicians with any non-standard characters in their names like a tilda or cedilla.


Revision [17658]

Edited on 2016-05-06 14:58:48 by Rachael
Additions:
~-Use SQL to pull candidate results for the state and years you are looking for
~-Transfer information from SQL to the worksheet
~-Create a new sheet within the worksheet, label 'index'
~-Go to 'View' in the toolbar and select 'Data Sources' (or F4)
~-Click and drag the appropriate query into the first cell
~-Insert a new column A, label 'candidate_id'
~-Match a value from the scorecard with a value from the index
~-Index-match formula
~-=INDEX(database.candidate_id.first:database.candidate_id.last,MATCH(scorecard.key.x, database.key.first:database.key.last, 0))
~-EX: =INDEX(index.$A$2:index.$A$269,MATCH(scorecard.B2,index.$C$2:index.$C$269, 0))
~-This match used two sheets - 'scorecard' and 'index'. 'scorecard' contains the raw scorecard and 'index' contains data from a query of relevant candidates from the database.
~-Drag the formula down the list of names to auto-fill the rest of the id's
~-As keys, start with lastname first. If additional information is required for an efficient match, concatenate several keys together
~-'=x&y'
~-Combine lastname and party into 'ObamaDemocratic' with '=B2&E2'
~-Index-matching will not always return fully matched scorecards. In this case you will need to watch out for the following errors:
~-Check for NA candidates (candidates without a candidate_id). The following formula will return the number of candidate_id's returned as 'NA'
~-=ROWS (range) – COUNTIF (range “>0”)
~-EX: =ROWS(B2:B536) – COUNTIF(B2:B536, “>0”)
~-Check for Duplicate candidate_id's (returns the same candidate_id for multiple people):
~-Create a new column and add the following formula:
~-=IF(MATCH(scorecard.key.x,(scorecard.key.x+1:scorecard.key.last),0), "Dupe", " ")
~-EX: =IF(MATCH(C2,(C3:$C$269),0), "Dupe", " ")
~-Drag down to auto-fill the rest of the rows.
~-If the formula returns a "Dupe", enter the correct candidate ID for all matching keys (i.e., all duplicated candidate_ids) manually by searching your index or admin for the persons candidate_id. If it returns "#N/A", ignore the match. Use this formula after index-matching candidate_ids.
~-Correct all errors found
~-Duplicate candidates
~-N/A values
~-People with the same last name
~-etc...
Deletions:
Use SQL to pull candidate results for the state and years you are looking for
Transfer information from SQL to the worksheet
Create a new sheet within the worksheet, label 'index'
Go to 'View' in the toolbar and select 'Data Sources' (or F4)
Click and drag the appropriate query into the first cell
Insert a new column A, label 'candidate_id'
Match a value from the scorecard with a value from the index
Index-match formula
=INDEX(database.candidate_id.first:database.candidate_id.last,MATCH(scorecard.key.x, database.key.first:database.key.last, 0))
EX: =INDEX(index.$A$2:index.$A$269,MATCH(scorecard.B2,index.$C$2:index.$C$269, 0))
This match used two sheets - 'scorecard' and 'index'. 'scorecard' contains the raw scorecard and 'index' contains data from a query of relevant candidates from the database.
Drag the formula down the list of names to auto-fill the rest of the id's
As keys, start with lastname first. If additional information is required for an efficient match, concatenate several keys together
'=x&y'
Combine lastname and party into 'ObamaDemocratic' with '=B2&E2'
Index-matching will not always return fully matched scorecards. In this case you will need to watch out for the following errors:
Check for NA candidates (candidates without a candidate_id). The following formula will return the number of candidate_id's returned as 'NA'
=ROWS (range) – COUNTIF (range “>0”)
EX: =ROWS(B2:B536) – COUNTIF(B2:B536, “>0”)
Check for Duplicate candidate_id's (returns the same candidate_id for multiple people):
Create a new column and add the following formula:
=IF(MATCH(scorecard.key.x,(scorecard.key.x+1:scorecard.key.last),0), "Dupe", " ")
EX: =IF(MATCH(C2,(C3:$C$269),0), "Dupe", " ")
Drag down to auto-fill the rest of the rows.
If the formula returns a "Dupe", enter the correct candidate ID for all matching keys (i.e., all duplicated candidate_ids) manually by searching your index or admin for the persons candidate_id. If it returns "#N/A", ignore the match. Use this formula after index-matching candidate_ids.
Correct all errors found
Duplicate candidates
N/A values
People with the same last name
etc...


Revision [17657]

Edited on 2016-05-06 14:54:51 by Rachael
Additions:
Combine lastname and party into 'ObamaDemocratic' with '=B2&E2'
6. Save .csv and upload via the [[http://pyadmin.votesmart.org/pipelines/sig/ | harvester ]]
Deletions:
Combine lastname and party into “ObamaDemocratic” with '=B2&E2'
6. Save .csv and upload via the harvester


Revision [17656]

Edited on 2016-05-06 14:05:55 by Rachael [index-match update 5/5/16]
Additions:
1. Create an index
Use SQL to pull candidate results for the state and years you are looking for
Transfer information from SQL to the worksheet
Create a new sheet within the worksheet, label 'index'
Go to 'View' in the toolbar and select 'Data Sources' (or F4)
Click and drag the appropriate query into the first cell
The results of your query should now be in the spreadsheet.
2. Find appropriate candidate_id's from index to scorecard
Insert a new column A, label 'candidate_id'
Match a value from the scorecard with a value from the index
Index-match formula
=INDEX(database.candidate_id.first:database.candidate_id.last,MATCH(scorecard.key.x, database.key.first:database.key.last, 0))
EX: =INDEX(index.$A$2:index.$A$269,MATCH(scorecard.B2,index.$C$2:index.$C$269, 0))
This match used two sheets - 'scorecard' and 'index'. 'scorecard' contains the raw scorecard and 'index' contains data from a query of relevant candidates from the database.
Drag the formula down the list of names to auto-fill the rest of the id's
As keys, start with lastname first. If additional information is required for an efficient match, concatenate several keys together
'=x&y'
Combine lastname and party into “ObamaDemocratic” with '=B2&E2'
3. Errors
Index-matching will not always return fully matched scorecards. In this case you will need to watch out for the following errors:
Check for NA candidates (candidates without a candidate_id). The following formula will return the number of candidate_id's returned as 'NA'
=ROWS (range) – COUNTIF (range “>0”)
EX: =ROWS(B2:B536) – COUNTIF(B2:B536, “>0”)
Check for Duplicate candidate_id's (returns the same candidate_id for multiple people):
Create a new column and add the following formula:
=IF(MATCH(scorecard.key.x,(scorecard.key.x+1:scorecard.key.last),0), "Dupe", " ")
EX: =IF(MATCH(C2,(C3:$C$269),0), "Dupe", " ")
Drag down to auto-fill the rest of the rows.
If the formula returns a "Dupe", enter the correct candidate ID for all matching keys (i.e., all duplicated candidate_ids) manually by searching your index or admin for the persons candidate_id. If it returns "#N/A", ignore the match. Use this formula after index-matching candidate_ids.
Correct all errors found
Duplicate candidates
N/A values
People with the same last name
etc...
4. Spotcheck the formula results to verify that you did it properly. Focus on politicians with common names, because the computer may have mixed up their candidate IDs. Also look at politicians with any non-standard characters in their names like a tilda or cedilla.
5. Transfer all relevant columns to a .csv file using “Paste Only - Number” . Remove all formulas
6. Save .csv and upload via the harvester
7. Document in 'harvested scorecards' on the Google Drive.
Deletions:
~1) Create key on both the dataset to be imported and dataset with our candidate_ids using concatenation and other necessary text functions. Make sure these keys match exactly. That is, do not use the first three letters of someone's last name in one sheet and the first four letters in the other sheet. You may choose to save keys in your candidate database to speed up the process in the future.
~1) Use the following formula in the first cell of the first empty column on your incoming dataset, replacing the bracketed areas with the appropriate values: =INDEX([range of candidate IDs in candidate database],MATCH([cell which contains the key on the scorecard], [range which contains the keys in candidate database],0))
~1) Autofill the formula for all your candidates on the incoming dataset.
~1) Spotcheck the formula results to verify that you did it properly. Focus on politicians with common names, because the computer may have mixed up their candidate IDs. Also look at politicians with any non-standard characters in their names like a tilda or cedilla.
~1) Copy the values of the column with candidate IDs on the incoming dataset and paste special the "values" of that column in a new column. After verifying a successful paste, delete the column with the formula.
~1) For those where no match could be found, find their ID's manually in admin.
~1) Save as a CSV and submit to mantis.


Revision [15158]

Edited on 2015-04-28 15:42:23 by MichaelM [index-match update 5/5/16]
Additions:
After properly formatting the data for import:
~1) Create key on both the dataset to be imported and dataset with our candidate_ids using concatenation and other necessary text functions. Make sure these keys match exactly. That is, do not use the first three letters of someone's last name in one sheet and the first four letters in the other sheet. You may choose to save keys in your candidate database to speed up the process in the future.
~1) Use the following formula in the first cell of the first empty column on your incoming dataset, replacing the bracketed areas with the appropriate values: =INDEX([range of candidate IDs in candidate database],MATCH([cell which contains the key on the scorecard], [range which contains the keys in candidate database],0))
~1) Autofill the formula for all your candidates on the incoming dataset.
~1) Copy the values of the column with candidate IDs on the incoming dataset and paste special the "values" of that column in a new column. After verifying a successful paste, delete the column with the formula.
Deletions:
After properly formatting the scorecard for import:
~1) Create key on both the scorecard and candidate database using concatenation and other necessary text functions. Make sure these keys match exactly. That is, do not use the first three letters of someone's last name in one sheet and the first four letters in the other sheet. You may choose to save keys in your candidate database to speed up the process in the future.
~1) Use the following formula in the first cell of the first empty column on your scorecard, replacing the bracketed areas with the appropriate values: =INDEX([range of candidate IDs in candidate database],MATCH([cell which contains the key on the scorecard], [range which contains the keys in candidate database],0))
~1) Autofill the formula for all your candidates on the scorecard.
~1) Copy the values of the column with candidate IDs on your scorecard and paste special the "values" of that column in a new column. After verifying a successful paste, delete the column with the formula.


Revision [14887]

Edited on 2015-04-27 13:45:46 by MichaelM [expanded instructions to include Camp. Fin.]
Additions:
===Matching IDs===
Creating a common key relates multiple datasets together and enables you to combine or compare the datasets. This is useful for Scorecards and Campaign Finance updates, among other things.
~1) Create key on both the scorecard and candidate database using concatenation and other necessary text functions. Make sure these keys match exactly. That is, do not use the first three letters of someone's last name in one sheet and the first four letters in the other sheet. You may choose to save keys in your candidate database to speed up the process in the future.
~1) Use the following formula in the first cell of the first empty column on your scorecard, replacing the bracketed areas with the appropriate values: =INDEX([range of candidate IDs in candidate database],MATCH([cell which contains the key on the scorecard], [range which contains the keys in candidate database],0))
~1) Autofill the formula for all your candidates on the scorecard.
~1) Spotcheck the formula results to verify that you did it properly. Focus on politicians with common names, because the computer may have mixed up their candidate IDs. Also look at politicians with any non-standard characters in their names like a tilda or cedilla.
~1) Copy the values of the column with candidate IDs on your scorecard and paste special the "values" of that column in a new column. After verifying a successful paste, delete the column with the formula.
~1) For those where no match could be found, find their ID's manually in admin.
~1) Save as a CSV and submit to mantis.
Deletions:
Matching IDs
Creating a common key relates multiple datasets together. This enables one to combine or compare the datasets
1) create key on both the scorecard and candidate database using concatenation and other necessary text functions. Make sure these keys match exactly. Save keys in your candidate database to speed up the process in the future.
1) use the following formula in the first cell of the first empty column on your scorecard, replacing the bracketed areas with the appropriate values =INDEX([range of candidate IDs in candidate database],MATCH([cell which contains the key on the scorecard], [range which contains the keys in candidate database],0))
1) autofill the formula for all your candidates on the scorecard
1) spotcheck the formula results to verify that you did it properly
1) copy the values of the column with candidate IDs on your scorecard and paste special the "values" of that column in a new column. After verifying a successful paste, delete the column with the formula
1) for those where no match could be found, find their ID's manually in admin
1) save as a CSV and submit to mantis


Revision [14570]

Edited on 2015-03-25 16:57:27 by KristenVicedomini [expanded instructions to include Camp. Fin.]
Additions:
Matching IDs
Creating a common key relates multiple datasets together. This enables one to combine or compare the datasets


Revision [8198]

Edited on 2013-03-24 14:40:16 by KristenVicedomini [expanded instructions to include Camp. Fin.]
Additions:
----
[[CategoryTechnicalSkills]]


Revision [4884]

Edited on 2011-03-17 12:13:11 by KristenVicedomini [expanded instructions to include Camp. Fin.]
Additions:
1) create key on both the scorecard and candidate database using concatenation and other necessary text functions. Make sure these keys match exactly. Save keys in your candidate database to speed up the process in the future.
1) use the following formula in the first cell of the first empty column on your scorecard, replacing the bracketed areas with the appropriate values =INDEX([range of candidate IDs in candidate database],MATCH([cell which contains the key on the scorecard], [range which contains the keys in candidate database],0))
1) spotcheck the formula results to verify that you did it properly
1) copy the values of the column with candidate IDs on your scorecard and paste special the "values" of that column in a new column. After verifying a successful paste, delete the column with the formula
1) save as a CSV and submit to mantis
Deletions:
1) create key on both the scorecard and candidate database using concatenation and other methods. Make sure these keys match exactly. Save keys in your candidate database to speed up the process in the future.
1) use the following formula in the first cell of the first empty column on your scorecard, replacing the bracketed areas with the appropriate values =INDEX([range of candidate IDs in candidate database],MATCH([cell which contains the key on the scorecard], [column which contains the key in candidate database],0))
1) copy the values of the column with candidate IDs and paste special the "values" of that column in a new column. After verifying a successful paste, delete the column with the formula


Revision [4883]

Edited on 2011-03-17 12:11:04 by KristenVicedomini [expanded instructions to include Camp. Fin.]
Additions:
1) create key on both the scorecard and candidate database using concatenation and other methods. Make sure these keys match exactly. Save keys in your candidate database to speed up the process in the future.
1) use the following formula in the first cell of the first empty column on your scorecard, replacing the bracketed areas with the appropriate values =INDEX([range of candidate IDs in candidate database],MATCH([cell which contains the key on the scorecard], [column which contains the key in candidate database],0))
1) copy the values of the column with candidate IDs and paste special the "values" of that column in a new column. After verifying a successful paste, delete the column with the formula
Deletions:
1) create key on both the scorecard and candidate database using concatenation and other methods
1) use the following formula in the first cell of the first empty column on your scorecard, replacing the bracketed areas with the appropriate values =INDEX([[range of candidate IDs in candidate database]],MATCH([[cell which contains the key on the scorecard]], [[column which contains the key in candidate database]],0))


Revision [4882]

The oldest known version of this page was created on 2011-03-17 12:08:30 by KristenVicedomini [expanded instructions to include Camp. Fin.]
Valid XHTML :: Valid CSS: :: Powered by WikkaWiki