Bulk import is a method of entry where whole scorecards can be uploaded to the database via a spreadsheet, thus bypassing one-by-one entry through admin. This method allows us to post more ratings to our site in a shorter amount of time.

The Harvester module on pypvsadmin is what allows us to upload ratings directly into the database without tying up IT resources. In order to use the Harvester, however, the data must be precisely formatted.

There are two critical steps in harvesting ratings. First, a comma-separated values file (or CSV – think basic spreadsheet) must be created with the data to be imported into admin. Second, the uploaded scorecard must have additional information added to it (such as name, cats, tags, etc.) through admin.
Tabula is a helpful resource in converting PDFs to spreadsheets. Follow the website and README instructions.


Whether scraping or index-matching, the following protocols must be followed in order for the Harvester to work.

Formatting

The Harvester will only work if the CSV is properly formatted. To be uploaded, the CSV needs 8 columns:
candidate_id: The first column holds the candidate IDs. The candidate or official associated with each ID should match the original scorecard. The Harvester will not work if there are duplicate candidate_id values. [link to wherever duplicates are discussed]
sig_rating: This column is used to enter the original rating from the SIG. sig_rating does not need to be modified for our_rating to appear correctly. Enter scores as is or as close to the original as possible.
our_rating: Use this column to enter the translation of the sig_rating. If a formula was used to translate, be sure to convert the formula results to text or numeric values.
span: Enter the year of the scorecard. There cannot be blank cells in this or any of the following fields.
sig_id: In order for the Harvester to know which SIG to associate with the rating, the sig_id must be included. No blank cells.
usesigrating: Usesigrating is how the Harvester knows whether to display sig_rating or our_rating on the website. usesigrating = 't' displays the sig_rating. usesigrating = 'f' displays our_rating. In almost all cases usesigrating should be set as 'f'.
ratingsession: This column indicates whether the session being rated is the first, second, or complete session.
ratingformat_id: Ratingformat_id tells us the format of sig_rating. Formats include numeric, string, open, and grade scale. This affects how a scorecard is displayed on our site.
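As a rough illustration of the layout described above, the following Python sketch writes a file with the 8 column headers. The function name write_harvest_csv and every value in the sample row are placeholders invented for this example, not real IDs or codes; check the SIG's scorecard and admin for the actual values. Whether the Harvester requires this exact column order is also an assumption.

```python
import csv
import io

# The 8 column names, in the order they are listed above.
HARVEST_COLUMNS = [
    "candidate_id", "sig_rating", "our_rating", "span",
    "sig_id", "usesigrating", "ratingsession", "ratingformat_id",
]

def write_harvest_csv(rows, out):
    """Write rows (dicts keyed by column name) under the 8-column header."""
    writer = csv.DictWriter(out, fieldnames=HARVEST_COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Placeholder values only: none of these are real IDs or format codes.
sample_rows = [
    {
        "candidate_id": "1001", "sig_rating": "A", "our_rating": "95",
        "span": "2014", "sig_id": "42", "usesigrating": "f",
        "ratingsession": "1", "ratingformat_id": "1",
    },
]

buffer = io.StringIO()
write_harvest_csv(sample_rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline="").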
Now, the scorecard is ready to be harvested. Save this document in the 'Ratings' folder for the SIG as a .csv file using the following naming format:
An example harvest file can be found in the drive in the national group folder "National Federation of Independent Business" '2013-2014_National_Federation_of_Independent_Business_Scorecard.csv'

See: FormattingDataInExcel

Uploading a scorecard using the Harvester

  1. Go to pyadmin Ratings Harvester. Follow the instructions to select and upload your CSV file(s).
  2. Refresh the page. If the most recent job is marked 'COMPLETE' under 'Completed Harvests', then the scorecard has been imported into the database. If a job shows an error, read the error and fix the CSV. Errors are frequently due to typos in the header or duplicate candidate_ids.
    • Check your work! Be sure that there are no duplicates, all 8 fields are present and correctly named, all fields have the correct data type, etc.
    • If you still cannot locate the problem, ask your supervisor
  3. Once successfully uploaded, go to the CEC tracking sheet on the Google Drive and fill out the corresponding entry cells. Make sure to denote the type of entry as 'harvest' or 'Scrape'.
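The "check your work" step can be partly automated before uploading. The sketch below is one possible pre-upload check, not part of the Harvester itself: the function name check_harvest_csv is made up for this example, and the rules it applies are only those stated on this page (8 correctly named columns, no duplicate candidate_id, no blanks from span onward, usesigrating set to 't' or 'f').

```python
import csv
import io
from collections import Counter

EXPECTED_COLUMNS = [
    "candidate_id", "sig_rating", "our_rating", "span",
    "sig_id", "usesigrating", "ratingsession", "ratingformat_id",
]
# Per the Formatting section: span and every field after it cannot be blank.
NO_BLANK_COLUMNS = ["span", "sig_id", "usesigrating", "ratingsession", "ratingformat_id"]

def check_harvest_csv(text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = sorted(set(EXPECTED_COLUMNS) - set(header))
    extra = sorted(set(header) - set(EXPECTED_COLUMNS))
    if missing or extra:
        problems.append(f"header problem: missing={missing} unexpected={extra}")
        return problems  # row checks are meaningless with a bad header

    rows = list(reader)
    counts = Counter(row["candidate_id"] for row in rows)
    for candidate_id, n in counts.items():
        if n > 1:
            problems.append(f"duplicate candidate_id {candidate_id} ({n} rows)")

    for row_no, row in enumerate(rows, start=1):
        for col in NO_BLANK_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"data row {row_no}: blank {col}")
        if (row["usesigrating"] or "").strip() not in ("", "t", "f"):
            problems.append(f"data row {row_no}: usesigrating must be 't' or 'f'")
    return problems
```

An empty list means none of these particular checks failed; it does not guarantee the harvest job will complete.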

Webchecking a Bulk Import Scorecard

Webchecking a scorecard that was imported into the database via the Harvester requires a few more steps than checking a scorecard that was added manually. This is because the Harvester only needs the data points listed above to upload a file. The rest of the information associated with a rating (such as name, cats, tags, etc.) needs to be completed using admin.

If the harvest is from a scrape, candidates that appear in the '...Errors.csv' file need to be manually added to admin. Refer to the original scorecard if necessary.

Once all the appropriate information is added to the rating, start a webcheck as usual. Make sure to look out for any patterns of errors.
  1. Release the scorecard to the Internal web.
  2. Check the ratings on skittles against the primary scorecard.
  3. If there are errors:
        • If they fit a common pattern, make a note on the CEC tracking sheet and tag the person who entered the scorecard. This will help us to improve our processes. Examples of common errors we can reduce are mismatches based on nicknames, hyphenated last names, etc.
        • Correct the errors on admin.
  4. When complete, release the scorecard to live web.
        • Quickly scan the scorecard on the live web to be sure it looks accurate.
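The nickname and hyphenated-last-name mismatches mentioned above can sometimes be spotted with a loose name comparison. The following is a minimal sketch only: the NICKNAMES table, the function names, and the sample names are all hypothetical, and a loose match is a hint to check the original scorecard, not proof that two entries are the same person.

```python
# Tiny illustrative nickname table (hypothetical; a real one would be much larger).
NICKNAMES = {"bill": "william", "bob": "robert", "jim": "james"}

def name_keys(first, last):
    """Return the set of (first, last) keys under which a name might match."""
    first = first.strip().lower()
    first = NICKNAMES.get(first, first)
    last = last.strip().lower()
    # A hyphenated last name may appear elsewhere under either half.
    lasts = {last} | {part for part in last.split("-") if part}
    return {(first, l) for l in lasts}

def likely_same_person(a, b):
    """True if two (first, last) pairs share at least one normalized key."""
    return bool(name_keys(*a) & name_keys(*b))

# Example names are invented for illustration.
print(likely_same_person(("Bill", "Smith-Jones"), ("William", "Jones")))
print(likely_same_person(("Ann", "Lee"), ("Bob", "Lee")))
```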

Keep Track of your Progress
Update the Google Drive tracking sheets accordingly.

Attachments
File Last modified Size
2014_NA_ACU_Scorecard_conversion.ods 2015-09-08 15:18 105Kb