
Documentation

Documenting your work is an important step throughout the process of collection and entry. It lets staff and interns keep track of, and be held accountable for, the work they have done. Work that is not documented easily falls through the cracks and can cause duplicated work, mis-entry and other unforeseen errors; in other words, it makes the process inefficient. The other purpose of documentation is to create evidence. That evidence supports our organization's integrity and also serves as a reference for future work and for archival purposes.

The main aspects of documentation are tracking, creating proper filenames, and organizing files and folders. These tasks can be performed at any point in the research process. For example, you could organize and rename files after entering the data onto Admin, or update the tracking document at the end of the day instead of doing it on the go; it depends on whichever way feels most comfortable. Although the timing is flexible, documentation is time sensitive, and it is highly recommended that you document your work at least by the end of the day or the end of your shift.

Index

Documentation

Tracking

Sweep Sheet (Spreadsheet)

CEC Tracking Sheet (Spreadsheet)

Organizing Files and Folders

Naming System

Naming files for Ratings

General Structure:

Naming files for Endorsements

General Structure:

Directory Structure

Tracking

Tracking in this sense serves not only to track our work but also to evaluate the progress of SIGS as a whole. The first document is called the 'Sweep Sheet', a reusable and perpetual document in which each update overwrites the older one, but only after the new update is issued. The second document is called the 'CEC Sheet', where 'CEC' stands for "Collection, Entry and Checks"; this is a fixed document that is renewed every year. Entries on the 'CEC Sheet' are static: new updates are appended instead of overwriting old ones.

Sweep Sheet (Spreadsheet)

The purpose of this tracking document is to track the status of, and the most recent updates to, every SIG in our database. It contains the columns that are vital to the process of collection.

Below is the skeleton for a 'Sweep Sheet':

column	description
sig_id	Refers to the unique id of the SIG, important to have in case of discrepancy in name
sig_name	Name of the SIG as on Admin
url	Web address of the SIG's website; it may be the SIG's social media page if no web address is given
url_status	Shows the availability of the SIG's web address; can be 'working', 'broken' or 'redirect'
release_status	Admin release status of the SIG, can be 'live', 'internal' or 'admin'
rating_group	'Yes' if the group does ratings, 'No' if not
recent_rating	The latest rating published by the group if not the last updated on Admin
recent_endorsements	The latest endorsement given by the group to election candidates if not the last updated on Admin
tracking_status	Determines whether or not the SIG should still be tracked, can be 'active', 'inactive' or 'new'
check_date	Date when it was last checked by SIGS researcher
check_by	Initials of SIGS researcher who last checked the SIG

URL Status

Working: the URL is up and running and directs you to the SIG website
Redirect: the URL redirects you to a website other than the SIG website
Broken: the URL gives you an error, takes too long to load, or there is no URL

Tracking Status

Active: the SIG remains in operation and we still track it
Inactive: the SIG may or may not be in operation and has not had any updates for a long period of time, typically after two presidential election terms (7 years)
New: a recent SIG that has just been added to the database

Columns that need to be updated on every check: url_status, tracking_status, check_date, check_by. The other columns are updated automatically on a regular basis from query results on another spreadsheet.
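The per-check update described above can be sketched in Python. This is only an illustration, assuming a Sweep Sheet row is held as a dictionary keyed by the column names in the skeleton; the function name record_check and the ISO date format are assumptions, not part of the actual spreadsheet.

```python
from datetime import date

# Allowed values taken from the column descriptions above.
URL_STATUSES = {"working", "broken", "redirect"}
TRACKING_STATUSES = {"active", "inactive", "new"}


def record_check(row, url_status, tracking_status, initials, when=None):
    """Return a copy of a Sweep Sheet row with the four per-check columns updated.

    Hypothetical helper: the row is assumed to be a dict keyed by column name.
    """
    if url_status not in URL_STATUSES:
        raise ValueError(f"unknown url_status: {url_status!r}")
    if tracking_status not in TRACKING_STATUSES:
        raise ValueError(f"unknown tracking_status: {tracking_status!r}")
    updated = dict(row)  # leave the caller's row untouched
    updated["url_status"] = url_status
    updated["tracking_status"] = tracking_status
    # ISO format (YYYY-MM-DD) is an assumption; use whatever the sheet uses.
    updated["check_date"] = (when or date.today()).isoformat()
    updated["check_by"] = initials
    return updated
```

Only the four columns named above are touched; the remaining columns are left for the automatic update.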

CEC Tracking Sheet (Spreadsheet)

The purpose of this document is to track collected ratings and endorsements. The general rule of thumb is that if there are ratings or endorsements that need to be entered, they are recorded on the CEC sheet. There are three parts to the CEC: Collection, Entry and Checks. The tables below show the sheet split into its three parts; in reality, the columns appear side by side in one sheet.

Collection

column	description
span	Year or range of years the SIGS data covers in the file
state	State(s) that the SIGS data covers in the file
sig_id	Refers to the unique id of the SIG, important to have in case of discrepancy in name
sig_name	Preferably the name of the SIG as stated on Admin although sig_id will be referred to as a fail safe
data_type	The type of SIGS data either categorized as 'ratings' or 'endorsements'
date_collected	Date when SIGS data was last obtained by SIGS researcher
collected_by	Initials of SIGS researcher who obtained the SIGS data

These columns should be filled in after the collected item is finalized and ready to be stored. You can document at any time after collecting the files, but it is recommended to document everything you have collected by the end of your shift.

Entry

column	description
entry_id	Refers to the unique id of ratings or endorsements entry generated when initially entered onto admin
entry_method	The method that was used to enter the SIGS data onto Admin
date_entered	Date when SIGS data was entered onto Admin by SIGS researcher
entered_by	Initials of SIGS researcher who entered the SIGS data onto Admin

These columns should be filled in after an entry is completed on Admin and the database has generated its unique ID(s). The person who entered the collected item on Admin should fill in these columns as soon as possible to avoid work falling through the cracks.

Checks

column	description
date_tagged	Date when entry was tagged by SIGS researcher
tagged_by	Initials of SIGS researcher who tagged the entry on Admin, only applies to ratings
date_webchecked	Date when entry was web-checked by SIGS researcher
webchecked_by	Initials of SIGS researcher who web-checked the entry

Depending on time constraints, tagging and web-checking do not have to be done immediately after entry, and they can be done by another person as a way of cross-checking. Keep in mind, however, that the person responsible for the entry may spot potential errors that another person would not.

The columns in Collection, Entry and Checks are color coded by section, in the order mentioned above. They are arranged into three sections because each is a distinct process that can be performed in conjunction with the others or separately. This design is also specifically intended to increase the flexibility of our workflow. For example, collected ratings or endorsements do not need to be entered immediately after collection; they can be entered at another time or by researchers who did not collect them.
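Because the three sections can be completed at different times and by different people, it can help to see which sections of a CEC row are still open. The following Python sketch is one way to do that, assuming a row is a dictionary keyed by the column names in the tables above; the helper names are hypothetical, and treating an empty cell as "not done yet" is an assumption.

```python
# Column groups copied from the Collection, Entry and Checks tables above.
COLLECTION = ("span", "state", "sig_id", "sig_name", "data_type",
              "date_collected", "collected_by")
ENTRY = ("entry_id", "entry_method", "date_entered", "entered_by")
TAGGING = ("date_tagged", "tagged_by")  # only applies to ratings
WEBCHECK = ("date_webchecked", "webchecked_by")


def _filled(row, columns):
    """True when every listed column has a non-blank value."""
    return all(str(row.get(column) or "").strip() for column in columns)


def pending_stages(row):
    """List the CEC sections of a row that still have blank columns."""
    pending = []
    if not _filled(row, COLLECTION):
        pending.append("collection")
    if not _filled(row, ENTRY):
        pending.append("entry")
    # Tagging columns are only expected for ratings.
    check_columns = WEBCHECK + (TAGGING if row.get("data_type") == "ratings" else ())
    if not _filled(row, check_columns):
        pending.append("checks")
    return pending
```

A row that has been collected but not yet entered would report "entry" and "checks" as pending, which matches the idea that entry can happen later or be done by a different researcher.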

Organizing Files and Folders

Another aspect of documentation is the arrangement of files and folders. It is vital to keep a sound naming system and directory structure; doing so assists future retrieval and prevents files from losing their identity. All our files are currently stored on a network drive.

Naming System

The general rule for naming SIGs files is to provide the pieces of information needed to identify a file, such as date, year, state abbreviation and group name. Some of this information has been consistent over the years of SIGs research, and we use it to name SIGs files. For these pieces of information to read clearly, they have to be arranged in a fixed order, so that the position of each piece matters. For example, year has to come before state abbreviation, state abbreviation has to come before the group's abbreviation, and so on. A fixed order also makes files easier to locate, both visually and logically. This is especially important for the year: ratings and endorsement files are typically produced on a yearly basis and should be categorized by year first. Sometimes, even with distinct pieces of information, two files can end up with the same name, which is why the naming structures below allow additional info towards the end of the filename.

These are some technical terms you need to know as part of the context:

terms	description
Elements	the pieces of information mentioned above that give a file its identity
Position	the position of a naming element within the filename
Namespace	two or more elements of the same group

Note: 'Namespace' in this case shares the same meaning in essence with namespaces that are used in computing.

To structure the elements of the filename, we will be using characters such as underscores ('_') and dashes ('-'). Underscores are used to delimit the elements, and dashes are used to denote two elements sharing the same namespace. The element subsequent to the dash is more specific than the element prior to the dash and so on. In some cases the naming element prior to the dash makes sense of the element subsequent to the dash.
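The delimiter rules above can be illustrated with a small Python sketch that splits a filename back into its elements. The function name split_filename is hypothetical, and the sketch assumes that the file extension is whatever follows the last dot.

```python
def split_filename(filename):
    """Split a filename into underscore-delimited elements and the extension.

    Each element is returned as a list of its dash-separated parts, so parts
    sharing a namespace stay grouped together.
    """
    stem, _, extension = filename.rpartition(".")
    if not stem:  # no dot found: treat the whole name as the stem
        stem, extension = filename, ""
    elements = [element.split("-") for element in stem.split("_")]
    return elements, extension
```

For example, the name 2019-2020_IA_NFIB_Ratings-Extract.csv splits into four elements, where the first groups 2019 and 2020 and the last groups Ratings and Extract.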

Naming files for Ratings

There are typically four different types of files when creating an entry for ratings, especially when the process involves using the harvester. These four types are the Ratings, Extract, Worksheet and Harvest files.

General Structure:

[yearSpan]_[stateAbbreviation]_[groupAbbreviation]_[fileCategory]-[fileType]_[additionalInfo].[fileExtension]

file category	file type	description	additional info	file extensions	examples
Ratings	not specified	Initial document that contains all the ratings information by the group.	House, Senate, (other office chambers/types), (numerical values)	pdf, html, ods	2019-2020_IA_NFIB_Ratings.pdf
	Extract	Contains extracted ratings information from the scorecard.	House, Senate, (other office chambers/types), (numerical values)	csv	2019-2020_IA_NFIB_Ratings-Extract.csv
	Worksheet	Data from extract file is cleaned, re-modeled, matched and translated; shows all your workings.	matched, (type of issue), (numerical values)	ods, xlsx	2019-2020_IA_NFIB_Ratings-Worksheet.ods
	Harvest	Contains data that is readable by the harvester and ready to be uploaded onto the database.	Lifetime, (type of issue)	csv	2019-2020_IA_NFIB_Ratings-Harvest.csv

Note: The original ratings file does not have a [fileType] suffix in its filename.
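As a rough illustration of the general structure for ratings, the Python sketch below assembles a filename from its elements. The function name ratings_filename and its parameter names are hypothetical; the file types are the three listed in the table.

```python
# File types from the table above; the original ratings file has none.
RATINGS_FILE_TYPES = {"Extract", "Worksheet", "Harvest"}


def ratings_filename(year_span, state, group, extension,
                     file_type=None, additional_info=None):
    """Assemble a ratings filename following the general structure above."""
    if file_type is not None and file_type not in RATINGS_FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type!r}")
    # The dash joins the file category and file type in one namespace.
    category = "Ratings" if file_type is None else f"Ratings-{file_type}"
    elements = [year_span, state, group, category]
    if additional_info:
        elements.append(additional_info)
    # Underscores delimit the elements.
    return "_".join(elements) + "." + extension
```

Called with "2019-2020", "IA", "NFIB", "csv" and file_type="Extract", it reproduces the Extract example from the table.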

Naming files for Endorsements

There are two file categories for endorsements: single and multiple. A single endorsement file typically has more detail in its name for easier identification, whereas a multiple endorsement file typically contains a list of endorsements across multiple offices.

General Structure:

[yearSpan]_[stateAbbreviation]_[groupAbbreviation]_[fileCategory]_[officeType]-[additionalInfo].[fileExtension]

file category	description	office type	additional info	examples
Endorsement	Contains an individual or a single endorsement	(required)	lastname	2016_NY_NFIB_Endorsement_Gubernatorial-Cuomo, 2017_NE_NFIB_Endorsement_Legislative-NE-07-Carrell
Endorsements	Contains a list or more than one endorsement	(not required)	primary, general, (state abbreviation), (date-YYYYMMDD), (numerical values)	2018_NY_NFIB_Endorsements_Legislative-primary, 2018_NA_NFIB_Endorsements_Congressional-IA

The following table shows the common types of offices and how each should appear in the [officeType] element:

type of office	filename [officeType]
Presidential	Presidential
Congressional	Congressional
Gubernatorial	Gubernatorial
Statewide	Statewide
State Legislative	Legislative
State Judicial	Judicial
Special	[officeType]-[stateAbbreviation]-[districtNumber]
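The endorsement structure can be sketched the same way in Python. The function name endorsement_filename and its parameters are hypothetical. Two points are assumptions rather than stated rules: the extension is optional here because the examples above omit it, and when a multiple-endorsement file has additional info but no office type, the additional info is written as its own element.

```python
def endorsement_filename(year_span, state, group, office_type=None,
                         additional_info=None, single=True, extension=None):
    """Assemble an endorsement filename following the general structure above."""
    if single and not office_type:
        # The table marks office type as required for a single endorsement.
        raise ValueError("a single endorsement file needs an office type")
    category = "Endorsement" if single else "Endorsements"
    elements = [year_span, state, group, category]
    # Office type and additional info share a namespace, so a dash joins them.
    tail = "-".join(part for part in (office_type, additional_info) if part)
    if tail:
        elements.append(tail)
    name = "_".join(elements)
    return f"{name}.{extension}" if extension else name
```

For a special office such as the Nebraska example above, the office type would be passed already expanded, as in "Legislative-NE-07".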

Directory Structure

SIGS in the VoteSmart database are separated into two categories: national and state groups. National SIGS have no further categories, only the groups themselves, and each of them is unique. State SIGS, on the other hand, are categorized by state abbreviation. Within each SIG there are two main groups: ratings and endorsements. Both groups contain files that are typically at a yearly interval, with the possibility of a two-year span. A folder for a given year is created only if ratings or endorsements were collected for that year.

So to put this into perspective, the directory structure will look like this:

Directory structure for National Groups:

National Groups --> (Name of SIG) --> Ratings/Endorsements --> (Year or Span) --> (Files within that year)

Directory structure for State Groups:

State Groups --> (State Abbreviation) --> (Name of SIG) --> Ratings/Endorsements --> (Year/Span) --> (Files within that year)

In some cases, national or even state groups contain the endorsements or ratings of specific states. A fairly typical scenario for national groups is endorsements and ratings for state-level candidates. This means that there can be multiple state-level files in a national SIG. Folders named with state abbreviations are used to reduce the clutter of files within the same year. It will look like this:

... --> (Year/Span) --> (State Abbreviation) --> (files in that state and year)

Note: The state abbreviation for candidates on the national level is 'NA'.
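The directory layouts above can be expressed as a small Python sketch that builds the relative folder path for a file. The function name sig_directory is hypothetical, the top-level folder names are taken from the diagrams above, and the location of these folders on the network drive is not specified here, so the path is left relative.

```python
from pathlib import PurePosixPath


def sig_directory(sig_name, data_type, year_span, state=None, file_state=None):
    """Build the relative folder path for a SIG's files.

    state:      state abbreviation for a state group, or None for a national group
    file_state: optional state subfolder inside the year folder
    """
    if data_type not in ("Ratings", "Endorsements"):
        raise ValueError(f"unknown data type: {data_type!r}")
    if state is None:
        parts = ["National Groups", sig_name]
    else:
        parts = ["State Groups", state, sig_name]
    parts += [data_type, year_span]
    if file_state:
        # Extra level used to reduce clutter when one year holds several states.
        parts.append(file_state)
    return PurePosixPath(*parts)
```

For a state group, sig_directory("NFIB", "Ratings", "2019-2020", state="IA") gives State Groups/IA/NFIB/Ratings/2019-2020.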
