Documentation
Documenting your work is an important step throughout collection and entry. It lets staff and interns keep track of, and be held accountable for, the work they have done. Undocumented work easily falls through the cracks and can lead to duplicated work, mis-entries and other unforeseen errors; in other words, it makes the process inefficient. Documentation also creates evidence: a record of our work supports our organization's integrity and serves as a reference for future work and for archival purposes.
Documentation has three aspects: tracking, creating proper filenames, and organizing files and folders. Each can be performed at any point in the research process. For example, you could organize and rename files after entering the data onto Admin, or update the tracking document at the end of the day instead of as you go; it depends on whichever way feels most comfortable. Although the timing is flexible, documentation is time-sensitive, and it is strongly recommended that you document your work by the end of the day or the end of your shift at the latest.
Tracking
Tracking in this sense serves not only to track our own work but also to evaluate the progress of SIGS as a whole. We use two tracking documents. The first is the 'Sweep Sheet', a reusable, perpetual document in which each new update overwrites the previous one, but only once the new update has been issued. The second is the 'CEC Sheet', where 'CEC' stands for "Collection, Entry and Checks"; this is a fixed document that is renewed every year. Updates to the CEC Sheet are static: new updates are appended instead of overwriting the old ones.
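The difference between the two documents can be pictured in code. The following is a minimal in-memory sketch, not a description of how the spreadsheets actually work: the class names are hypothetical, the Sweep Sheet is modelled as a dictionary keyed by sig_id (so a new check replaces the old row), and the CEC Sheet as a yearly list to which records are only ever appended. The IDs and dates are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SweepSheet:
    """Hypothetical model of the Sweep Sheet: one row per SIG, keyed by sig_id."""
    rows: dict[int, dict] = field(default_factory=dict)

    def update(self, sig_id: int, row: dict) -> None:
        # Overwrite: once a new check is issued, it replaces the previous row.
        self.rows[sig_id] = row


@dataclass
class CECSheet:
    """Hypothetical model of one year's CEC Sheet: an append-only list of records."""
    year: int
    rows: list[dict] = field(default_factory=list)

    def record(self, row: dict) -> None:
        # Append: earlier records are kept, never overwritten.
        self.rows.append(row)


sweep = SweepSheet()
sweep.update(101, {"url_status": "working", "check_date": "2021-03-01"})
sweep.update(101, {"url_status": "broken", "check_date": "2021-06-01"})
print(len(sweep.rows), sweep.rows[101]["url_status"])  # 1 broken

cec = CECSheet(year=2021)
cec.record({"sig_id": 101, "data_type": "ratings"})
cec.record({"sig_id": 101, "data_type": "endorsements"})
print(len(cec.rows))  # 2
```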
Sweep Sheet (Spreadsheet)
The purpose of this tracking document is to track the status and most recent updates of every SIG in our database. It contains the columns that are vital to the collection process. Below is the skeleton of a Sweep Sheet:
| column | description |
|---|---|
| sig_id | Unique ID of the SIG; important to have in case of a discrepancy in the name |
| sig_name | Name of the SIG as it appears on Admin |
| url | Web address of the SIG's website, or sometimes the SIG's social media page if no website is given |
| url_status | Availability of the SIG's web address: 'working', 'broken' or 'redirect' |
| release_status | Release status of the SIG on Admin: 'live', 'internal' or 'admin' |
| rating_group | 'Yes' if the group issues ratings, 'No' if not |
| recent_rating | The latest rating published by the group, if it is not the one last updated on Admin |
| recent_endorsements | The latest endorsements given by the group to election candidates, if they are not the ones last updated on Admin |
| tracking_status | Whether the SIG should still be tracked: 'active', 'inactive' or 'new' |
| check_date | Date when the SIG was last checked by a SIGS researcher |
| check_by | Initials of the SIGS researcher who last checked the SIG |
URL Status
- Working: the URL is up and running and directs you to the SIG's website
- Redirect: the URL redirects you to a website other than the SIG's website
- Broken: the URL gives you an error, takes too long to load, or there is no URL
Tracking Status
- Active: the SIG remains in operation and we continue to track it
- Inactive: the SIG may or may not still be in operation, but it has had no updates for a long period of time, typically two presidential election cycles (7 years)
- New: a SIG that has recently been added to the database
Columns that need to be updated on every check: url_status, tracking_status, check_date and check_by. The other columns are updated automatically on a regular basis from query results on another spreadsheet.
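Before saving a check, the hand-updated columns can be sanity-checked against the allowed values listed above. This is a minimal sketch under stated assumptions: a row is represented as a Python dict, the helper names check_problems and suggest_inactive are hypothetical, and the 7-year inactivity guideline is approximated with 365-day years, so it is only a rule of thumb and not a replacement for the researcher's judgment.

```python
from datetime import date

# Allowed values, as listed in the Sweep Sheet skeleton.
URL_STATUSES = {"working", "broken", "redirect"}
TRACKING_STATUSES = {"active", "inactive", "new"}
# Columns the researcher updates by hand on every check.
REQUIRED_ON_CHECK = ("url_status", "tracking_status", "check_date", "check_by")


def check_problems(row: dict) -> list[str]:
    """Return human-readable problems with the hand-updated columns of one row."""
    problems = [f"missing {col}" for col in REQUIRED_ON_CHECK if not row.get(col)]
    url_status = row.get("url_status")
    if url_status and url_status not in URL_STATUSES:
        problems.append(f"unknown url_status {url_status!r}")
    tracking_status = row.get("tracking_status")
    if tracking_status and tracking_status not in TRACKING_STATUSES:
        problems.append(f"unknown tracking_status {tracking_status!r}")
    check_date = row.get("check_date")
    if check_date and not isinstance(check_date, date):
        problems.append("check_date is not a date")
    return problems


def suggest_inactive(last_update: date, today: date, years: int = 7) -> bool:
    """Rule of thumb: flag a SIG with no updates for roughly `years` years.

    Uses 365-day years, so the cutoff is approximate (leap days are ignored).
    """
    return (today - last_update).days >= years * 365


row = {"sig_id": 101, "url_status": "working", "tracking_status": "active",
       "check_date": date(2021, 6, 1), "check_by": "AB"}
print(check_problems(row))  # []
print(check_problems({"url_status": "down", "tracking_status": "active"}))
# ['missing check_date', 'missing check_by', "unknown url_status 'down'"]
print(suggest_inactive(date(2013, 1, 1), date(2021, 1, 1)))  # True
```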
CEC Tracking Sheet (Spreadsheet)
The purpose of this document is to track collected ratings and endorsements. The general rule of thumb is that any ratings or endorsements that need to be entered are recorded on the CEC Sheet. The CEC Sheet has three parts: Collection, Entry and Checks. The tables below show each part separately; in reality, they appear side by side in one sheet.
Collection
| column | description |
|---|---|
| span | Year or range of years covered by the SIGS data in the file |
| state | State(s) covered by the SIGS data in the file |
| sig_id | Unique ID of the SIG; important to have in case of a discrepancy in the name |
| sig_name | Preferably the name of the SIG as stated on Admin; sig_id serves as a fail-safe |
| data_type | The type of SIGS data, either 'ratings' or 'endorsements' |
| date_collected | Date when the SIGS data was last obtained by a SIGS researcher |
| collected_by | Initials of the SIGS researcher who obtained the SIGS data |
These columns should be filled in once the collected item is finalized and ready to be stored. You may document at any time after collecting the files, but it is recommended that you document everything you collected by the end of your shift.
Entry
| column | description |
|---|---|
| entry_id | Unique ID of the ratings or endorsements entry, generated when it is first entered onto Admin |
| entry_method | The method used to enter the SIGS data onto Admin |
| date_entered | Date when the SIGS data was entered onto Admin by a SIGS researcher |
| entered_by | Initials of the SIGS researcher who entered the SIGS data onto Admin |
These columns should be filled in after an entry is completed on Admin and the database has generated its unique ID(s). The person who entered the collected item on Admin should fill in these columns as soon as possible to avoid work falling through the cracks.
Checks
| column | description |
|---|---|
| date_tagged | Date when the entry was tagged by a SIGS researcher |
| tagged_by | Initials of the SIGS researcher who tagged the entry on Admin; applies only to ratings |
| date_webchecked | Date when the entry was web-checked by a SIGS researcher |
| webchecked_by | Initials of the SIGS researcher who web-checked the entry |
Depending on time constraints, tagging and checking do not need to happen immediately after entry and can be done by another person as a form of cross-checking. However, the person responsible for the entry may spot potential errors that another person might miss.
On the sheet, the Collection, Entry and Checks columns are color-coded by section, in the order listed above. They are arranged into three sections because each is a distinct process that can be performed together or separately. This layout is specifically designed to make our workflow more flexible. For example, collected ratings or endorsements do not need to be entered immediately after collection; they can be entered later, or by a researcher who did not collect them.
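Because the three sections can be completed independently, it can help to see at a glance which sections of a CEC row are still open. The sketch below assumes a row is a dict keyed by the column names from the three tables and treats a section as done when all of its columns are filled; the helper name pending_sections and the example values (including entry_method "harvester") are hypothetical. Tagging columns are only required for ratings, as noted in the Checks table.

```python
# Columns per section, taken from the Collection, Entry and Checks tables.
SECTIONS = {
    "Collection": ("date_collected", "collected_by"),
    "Entry": ("entry_id", "entry_method", "date_entered", "entered_by"),
    "Checks": ("date_webchecked", "webchecked_by"),
}
# Tagging applies only to ratings, so these columns are required conditionally.
TAGGING_COLUMNS = ("date_tagged", "tagged_by")


def pending_sections(row: dict) -> list[str]:
    """Return the CEC sections whose columns are not all filled in yet."""
    pending = []
    for section, columns in SECTIONS.items():
        required = list(columns)
        if section == "Checks" and row.get("data_type") == "ratings":
            required.extend(TAGGING_COLUMNS)
        if not all(row.get(column) for column in required):
            pending.append(section)
    return pending


row = {
    "sig_id": 101, "data_type": "ratings",
    "date_collected": "2021-06-01", "collected_by": "AB",
    "entry_id": "12345", "entry_method": "harvester",
    "date_entered": "2021-06-03", "entered_by": "CD",
    "date_webchecked": "2021-06-04", "webchecked_by": "AB",
}
print(pending_sections(row))  # ['Checks'] (this ratings entry has not been tagged)
```

The same row with data_type set to 'endorsements' would have no pending sections, since tagging does not apply.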
Organizing Files and Folders
Another aspect of documentation is the arrangement of files and folders. It is vital to keep a sound naming system and directory structure; doing so helps with future retrieval and prevents files from losing their identity. All our files are currently stored on a network drive.
Naming System
The general rule for naming SIGS files is to include the pieces of information needed to identify a file, such as the date, year, state abbreviation and group name. Some of this information has stayed consistent over the years of SIGS research, and we use it to name SIGS files. For these pieces of information to be presented clearly, they must be arranged in a fixed order, so the position of each piece matters. For example, the year comes before the state abbreviation, the state abbreviation comes before the group's abbreviation, and so on. A fixed order also makes files easier to locate, both visually and logically. This is especially important for the year: ratings and endorsement files are typically produced on a yearly basis and should be categorized by year first. Sometimes, even with distinct pieces of information, two files can end up with the same name, which is why the naming structures below can include additional information towards the end of the filename. These are some technical terms you need to know for context:
| terms | description |
|---|---|
| Elements | The pieces of information mentioned above that give a file its identity |
| Position | The position of a naming element within the filename |
| Namespace | Two or more elements that belong to the same group |
To structure the elements of a filename, we use underscores ('_') and dashes ('-'). Underscores delimit the elements, and dashes join elements that share the same namespace. The element after a dash is more specific than the element before it, and so on. In some cases, the element before the dash gives context to the element after it.
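The delimiter rules can be illustrated by splitting a filename back into its parts. This is a minimal sketch with a hypothetical helper name; it assumes no element itself contains an underscore, dash or dot, which real group abbreviations may not always guarantee. It also does not interpret positions, since a namespace such as Legislative-NE-07-Carrell can hold more than two parts depending on the file category.

```python
import os


def parse_filename(filename: str) -> tuple[list[list[str]], str]:
    """Split a filename into its elements and the namespace parts of each element.

    Underscores separate elements; dashes separate the parts of one element,
    from least to most specific. Returns (elements, extension without the dot).
    """
    stem, extension = os.path.splitext(filename)
    elements = [element.split("-") for element in stem.split("_")]
    return elements, extension.lstrip(".")


elements, extension = parse_filename("2019-2020_IA_NFIB_Ratings-Extract.csv")
print(elements)   # [['2019', '2020'], ['IA'], ['NFIB'], ['Ratings', 'Extract']]
print(extension)  # csv
```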
Naming files for Ratings
There are typically four types of files involved in creating a ratings entry, especially when the process uses the harvester: the Ratings, Extract, Worksheet and Harvest files.
General Structure:
[yearSpan]_[stateAbbreviation]_[groupAbbreviation]_[fileCategory]-[fileType]_[additionalInfo].[fileExtension]
| file category | file type | description | additional info | file extensions | examples |
|---|---|---|---|---|---|
| Ratings | (not specified) | Original document containing all of the group's ratings information | House, Senate, (other office chambers/types), (numerical values) | pdf, html, ods | 2019-2020_IA_NFIB_Ratings.pdf |
| Ratings | Extract | Contains the ratings information extracted from the scorecard | House, Senate, (other office chambers/types), (numerical values) | csv | 2019-2020_IA_NFIB_Ratings-Extract.csv |
| Ratings | Worksheet | Data from the Extract file, cleaned, remodeled, matched and translated; shows all your work | matched, (type of issue), (numerical values) | ods, xlsx | 2019-2020_IA_NFIB_Ratings-Worksheet.ods |
| Ratings | Harvest | Contains data that is readable by the harvester and ready to be uploaded onto the database | Lifetime, (type of issue) | csv | 2019-2020_IA_NFIB_Ratings-Harvest.csv |
Note: The original ratings file does not have a [fileType] element after its file category (e.g. 2019-2020_IA_NFIB_Ratings.pdf).
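The ratings naming structure can be expressed as a small filename builder. This is a sketch, not an existing tool: the function names are hypothetical, the allowed extensions come from the table above, and joining several additional-info values with underscores is an assumption based on the general structure.

```python
from typing import Optional, Sequence

# Allowed extensions per file type; None stands for the original Ratings file,
# which has no [fileType] element.
RATINGS_EXTENSIONS = {
    None: {"pdf", "html", "ods"},
    "Extract": {"csv"},
    "Worksheet": {"ods", "xlsx"},
    "Harvest": {"csv"},
}


def year_span(start: int, end: Optional[int] = None) -> str:
    """Format a single year ('2018') or a span of years ('2019-2020')."""
    return str(start) if end is None or end == start else f"{start}-{end}"


def ratings_filename(span: str, state: str, group: str, extension: str,
                     file_type: Optional[str] = None,
                     additional_info: Sequence[str] = ()) -> str:
    """Build [yearSpan]_[state]_[group]_Ratings[-fileType]_[additionalInfo].[ext]."""
    if file_type not in RATINGS_EXTENSIONS:
        raise ValueError(f"unknown ratings file type: {file_type!r}")
    if extension not in RATINGS_EXTENSIONS[file_type]:
        raise ValueError(f"{extension!r} is not used for {file_type or 'Ratings'} files")
    category = "Ratings" if file_type is None else f"Ratings-{file_type}"
    elements = [span, state, group, category, *additional_info]
    return "_".join(elements) + "." + extension


print(ratings_filename(year_span(2019, 2020), "IA", "NFIB", "pdf"))
# 2019-2020_IA_NFIB_Ratings.pdf
print(ratings_filename(year_span(2019, 2020), "IA", "NFIB", "csv", "Extract", ["House"]))
# 2019-2020_IA_NFIB_Ratings-Extract_House.csv
```

Rejecting unexpected extensions early is a design choice meant to catch typos such as saving a Harvest file as xlsx.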
Naming files for Endorsements
There are two file categories for endorsements: single and multiple. A single endorsement file typically has more details in its name for easier identification, whereas a multiple endorsement file typically contains a list of endorsements for multiple offices.
General Structure:
[yearSpan]_[stateAbbreviation]_[groupAbbreviation]_[fileCategory]_[officeType]-[additionalInfo].[fileExtension]
| file category | description | office type | additional info | examples |
|---|---|---|---|---|
| Endorsement | Contains a single endorsement | (required) | lastname | 2016_NY_NFIB_Endorsement_Gubernatorial-Cuomo, 2017_NE_NFIB_Endorsement_Legislative-NE-07-Carrell |
| Endorsements | Contains a list of more than one endorsement | (not required) | primary, general, (state abbreviation), (date-YYYYMMDD), (numerical values) | 2018_NY_NFIB_Endorsements_Legislative-primary, 2018_NA_NFIB_Endorsements_Congressional-IA |
The following table shows the common types of offices and how each should appear in the [officeType] element:
| type of office | filename [officeType] |
|---|---|
| Presidential | Presidential |
| Congressional | Congressional |
| Gubernatorial | Gubernatorial |
| Statewide | Statewide |
| State Legislative | Legislative |
| State Judicial | Judicial |
| Special | [officeType]-[stateAbbreviation]-[districtNumber] |
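The endorsement structure and the office-type table can likewise be combined into a small builder. This sketch uses a hypothetical function name; 'Special' is left out of the mapping because it describes a pattern rather than a fixed name, and state and district parts are passed as additional info, as in the Legislative-NE-07-Carrell example. Endorsement extensions are optional here because the examples above show none.

```python
from typing import Optional, Sequence

# [officeType] element for each common type of office (from the table above).
OFFICE_TYPES = {
    "Presidential": "Presidential",
    "Congressional": "Congressional",
    "Gubernatorial": "Gubernatorial",
    "Statewide": "Statewide",
    "State Legislative": "Legislative",
    "State Judicial": "Judicial",
}


def endorsement_filename(span: str, state: str, group: str, office: Optional[str],
                         additional_info: Sequence[str] = (),
                         multiple: bool = False,
                         extension: Optional[str] = None) -> str:
    """Build [yearSpan]_[state]_[group]_[category]_[officeType]-[additionalInfo]."""
    if office is None and not multiple:
        raise ValueError("a single endorsement file needs an office type")
    category = "Endorsements" if multiple else "Endorsement"
    # Office type and additional info share one namespace, joined by dashes.
    namespace = ([OFFICE_TYPES[office]] if office else []) + list(additional_info)
    elements = [span, state, group, category]
    if namespace:
        elements.append("-".join(namespace))
    name = "_".join(elements)
    return f"{name}.{extension}" if extension else name


print(endorsement_filename("2016", "NY", "NFIB", "Gubernatorial", ["Cuomo"]))
# 2016_NY_NFIB_Endorsement_Gubernatorial-Cuomo
print(endorsement_filename("2018", "NA", "NFIB", "Congressional", ["IA"], multiple=True))
# 2018_NA_NFIB_Endorsements_Congressional-IA
```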
Directory Structure
SIGs in the VoteSmart database are separated into two categories: national groups and state groups. National SIGs have no further subcategories; there are only the groups themselves, each of which is unique. State SIGs, on the other hand, are categorized by state abbreviation. Within each SIG, there are two main groups: ratings and endorsements. Both contain files that are typically organized by year, with the possibility of a two-year span. A folder for a given year is created only if ratings or endorsements were collected for that year. To put this into perspective, the directory structure looks like this:
Directory structure for National Groups:
National Groups --> (Name of SIG) --> Ratings/Endorsements --> (Year or Span) --> (Files within that year)
Directory structure for State Groups:
State Groups --> (State Abbreviation) --> (Name of SIG) --> Ratings/Endorsements --> (Year/Span) --> (Files within that year)
In some cases, national or even state groups have endorsements or ratings for specific states. A fairly typical scenario for national groups is endorsements and ratings for state-level candidates, which means a national SIG may contain multiple state-level files. Folders named with state abbreviations are used to reduce clutter among files within the same year. It looks like this:
... --> (Year/Span) --> (State Abbreviation) --> (files in that state and year)
Note: The state abbreviation for candidates on the national level is 'NA'.
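The directory rules above can be summarized in a path builder. This is a sketch under stated assumptions: the function name is hypothetical, the returned path is relative because the network drive's location is not given here, and PurePosixPath is used only so the output prints with forward slashes (the drive itself may use a different separator). "Example Group" is a placeholder name.

```python
from pathlib import PurePosixPath
from typing import Optional


def sig_folder(sig_name: str, data_type: str, span: str,
               group_state: Optional[str] = None,
               file_state: Optional[str] = None) -> PurePosixPath:
    """Folder for one SIG's ratings or endorsements in a given year or span.

    group_state: state abbreviation of a state group; None for a national group.
    file_state: optional state subfolder used to reduce clutter within a year.
    """
    if data_type not in ("Ratings", "Endorsements"):
        raise ValueError("data_type must be 'Ratings' or 'Endorsements'")
    if group_state is None:
        base = PurePosixPath("National Groups", sig_name)
    else:
        base = PurePosixPath("State Groups", group_state, sig_name)
    folder = base / data_type / span
    return folder / file_state if file_state else folder


print(sig_folder("Example Group", "Ratings", "2019-2020", group_state="IA"))
# State Groups/IA/Example Group/Ratings/2019-2020
print(sig_folder("Example Group", "Endorsements", "2018", file_state="IA"))
# National Groups/Example Group/Endorsements/2018/IA
```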