The Harvester module on pypvsadmin is what allows us to upload ratings directly into the database without tying up IT resources. In order to use the Harvester, however, the data must be precisely formatted.
There are two critical steps in harvesting ratings. First, a comma-separated values file (or CSV – think basic spreadsheet) must be created with the data to be imported into admin. Second, the uploaded scorecard must have additional information added to it (such as name, cats, tags, etc.) through admin.
Tabula is a helpful resource in converting PDFs to spreadsheets. Follow the website and README instructions.
Whether scraping or index-matching, the following protocols must be followed in order for the Harvester to work.
Formatting
The Harvester will only work if the working CSV is properly formatted. To be uploaded, the CSV needs 8 columns:
- candidate_id
- sig_rating
- our_rating
- span
- sig_id
- usesigrating
- ratingsession
- ratingformat_id
candidate_id: The first column should hold the candidate_ids. The candidate or official associated with an id should match the original scorecard. The Harvester will not work if there are duplicate candidate_ids. [TODO: link to wherever duplicates are discussed.]
sig_rating: This column is used to enter the original rating from the SIG. sig_rating does not need to be modified for our_rating to appear correctly. Enter scores as is or as close to the original as possible.
our_rating: Use this column to enter the translation of the sig_rating. If a formula was used to translate, be sure to convert the formula results to plain text or numeric values.
span: Enter the year of the scorecard. There cannot be blank cells in this or any of the following fields.
sig_id: In order for the Harvester to know what SIG to associate with the rating, the sig_id must be included. No blank cells.
usesigrating: Usesigrating is how the Harvester knows whether to display sig_rating or our_rating on the website. usesigrating = 't' displays the sig_rating. usesigrating = 'f' displays our_rating. In almost all cases usesigrating should be set as 'f'.
ratingsession: This column indicates whether the session being rated is the first, second, or complete session.
- First session = 1
- Second session = 2
- Full session = 3
- Unknown = -1
ratingformat_id: This column identifies the format of the rating.
- Numeric = 1
- String = 3
- Open = -1
- Grade Scale = 2
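The column layout described above can be sketched in Python using the standard csv module. This is only an illustration, not part of the Harvester itself: the function name `write_harvest_csv` and the example row (ids and scores) are made up, and the sketch assumes the columns should appear in the order they are listed above.

```python
import csv
import io

# The 8 column names, in the order listed in this section.
HARVESTER_COLUMNS = [
    "candidate_id", "sig_rating", "our_rating", "span",
    "sig_id", "usesigrating", "ratingsession", "ratingformat_id",
]


def write_harvest_csv(rows, out):
    """Write rows (dicts keyed by column name) to a file-like object.

    DictWriter raises ValueError if a row contains a key that is not
    one of the 8 expected columns, which catches misspelled names early.
    """
    writer = csv.DictWriter(out, fieldnames=HARVESTER_COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical example row; the ids and scores are invented for illustration.
example_rows = [{
    "candidate_id": 1001, "sig_rating": "85%", "our_rating": 85,
    "span": 2014, "sig_id": 42, "usesigrating": "f",
    "ratingsession": 3, "ratingformat_id": 1,
}]

buffer = io.StringIO()
write_harvest_csv(example_rows, buffer)
harvest_text = buffer.getvalue()
```

To write to disk instead of a string buffer, open the target file with `newline=""` (as the csv module documentation recommends) and pass the file object as `out`.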
Now, the scorecard is ready to be harvested. Save this document in the 'Ratings' folder for the SIG as a .csv file using the following naming format:
- span_sig.name_harvester.csv
An example harvest file can be found on the drive in the national group folder "National Federation of Independent Business": '2013-2014_National_Federation_of_Independent_Business_Scorecard.csv'.
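As a small illustration of the naming format, the file name could be built as follows. The helper name `harvest_filename` is hypothetical, and replacing spaces in the SIG name with underscores is an assumption based on the example file name above; the sketch follows the stated span_sig.name_harvester.csv pattern.

```python
def harvest_filename(span, sig_name):
    """Build a file name following the span_sig.name_harvester.csv pattern."""
    # Assumption: spaces in the SIG name become underscores,
    # as in the example file name in the national group folder.
    safe_name = sig_name.strip().replace(" ", "_")
    return f"{span}_{safe_name}_harvester.csv"


name = harvest_filename("2013-2014", "National Federation of Independent Business")
```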
See: FormattingDataInExcel
Uploading a scorecard using the Harvester
- Go to pyadmin Ratings Harvester. Follow the instructions to select and upload your CSV file(s).
- Refresh the page. If the most recent job is marked 'COMPLETE' under 'Completed Harvests', then the scorecard has been imported into the database. If a job shows an error, read the error and fix the CSV. Errors are frequently due to typos in the header or duplicate candidate_ids.
- Check your work! Be sure that there are no duplicates, all 8 fields are present and correctly named, all fields have the correct data type, etc.
- If you still cannot locate the problem, ask your supervisor.
- Once successfully uploaded, go to the CEC tracking sheet on the Google Drive and fill out the corresponding entry cells. Make sure to denote the type of entry as 'harvest' or 'Scrape'.
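The "check your work" step can be partly automated before uploading. The following is a minimal sketch, not the Harvester's own validation: the function name `check_harvest_csv` is hypothetical, and the checks only cover the rules stated on this page (all 8 column names present, no duplicate candidate_ids, no blanks in span or any later column, usesigrating set to 't' or 'f'). Column order is not checked.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "candidate_id", "sig_rating", "our_rating", "span",
    "sig_id", "usesigrating", "ratingsession", "ratingformat_id",
]
# span and every column after it may not contain blank cells.
NO_BLANKS = EXPECTED_COLUMNS[3:]


def check_harvest_csv(text):
    """Return a list of problems found in the CSV text (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    extra = [c for c in header if c not in EXPECTED_COLUMNS]
    if missing or extra or len(header) != len(EXPECTED_COLUMNS):
        problems.append(f"header problem: missing {missing}, unexpected {extra}")
        return problems  # rows cannot be checked reliably with a bad header

    seen_ids = set()
    for row in reader:
        line = reader.line_num
        cid = (row["candidate_id"] or "").strip()
        if cid and cid in seen_ids:
            problems.append(f"line {line}: duplicate candidate_id {cid}")
        seen_ids.add(cid)
        for col in NO_BLANKS:
            if not (row[col] or "").strip():
                problems.append(f"line {line}: blank {col}")
        flag = (row["usesigrating"] or "").strip()
        if flag and flag not in ("t", "f"):
            problems.append(f"line {line}: usesigrating must be 't' or 'f'")
    return problems


# Made-up sample: the second data row repeats an id, leaves span blank,
# and has an invalid usesigrating value.
sample = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1001,85%,85,2014,42,f,3,1\n"
    "1001,90%,90,,42,x,3,1\n"
)
problems = check_harvest_csv(sample)
```

An empty list only means that none of these specific checks failed; the error message shown on the Harvester page remains the authority on whether an upload succeeded.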
Webchecking a Bulk Import Scorecard
Webchecking a scorecard that has been imported into the database via the Harvester requires a few more steps than checking a scorecard that was manually added. This is because the Harvester only needs the data points listed above to upload a file. The rest of the information associated with a rating needs to be completed using admin. This should include:
- Name
- Rating description
- Rating text
- Categories
- These should be copied from the SIG's cats unless the scorecard is on a separate/additional issue.
- The same rules apply to manual entry here as elsewhere: do not enter + or – grades, or negative numeric scores, manually on admin.
- Release the scorecard to the Internal web.
- Check the ratings on skittles against the primary scorecard.
- If there are errors:
- If they fit a common pattern, make a note on the CEC tracking sheet and tag the person who entered the scorecard.
- This will help us to improve our processes. Examples of common errors that can be improved are mismatches based on nicknames, hyphenated last names, etc.
- Correct the errors on admin.
- When complete, release the scorecard to live web.
- Quickly scan the scorecard on the live web to be sure it looks accurate.
Keep Track of your Progress
Update the Google Drive tracking sheets accordingly.
| File | Last modified | Size |
|---|---|---|
| 2014_NA_ACU_Scorecard_conversion.ods | 2015-09-08 15:18 | 105Kb |