Managing Data Acquisition and Data Integrity
The General Process for Data Acquisition:
- Research and locate primary sources, or other reliable, approved sources with more up-to-date, comprehensive, or new content
- Check the sources against our database
- Identify discrepancies between our data and the original source
- Add to or modify our data to reflect the new content and match our formatting standards
- Enhance our data with analysis and digests (such as categorization and tagging; issue position determinations with VoteEasy; summaries of Key Votes)
- Check our modifications and ensure we are effectively representing our data on our various channels of Information Delivery
- Track and report on progress
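As an illustration of the "check the sources against our database" and "identify discrepancies" steps above, here is a minimal Python sketch. It assumes both our data and the source can be loaded as dictionaries of records keyed by a shared identifier. The names `Discrepancies` and `find_discrepancies` and the record layout are hypothetical and do not describe how Admin or our database actually work.

```python
from dataclasses import dataclass, field


@dataclass
class Discrepancies:
    """Differences between a source and our stored records (hypothetical shape)."""
    missing_from_ours: list = field(default_factory=list)    # keys found only in the source
    missing_from_source: list = field(default_factory=list)  # keys found only in our data
    changed: dict = field(default_factory=dict)              # key -> {field: (ours, source)}


def find_discrepancies(ours, source):
    """Compare two dicts of records keyed by a shared identifier."""
    result = Discrepancies()
    result.missing_from_ours = sorted(source.keys() - ours.keys())
    result.missing_from_source = sorted(ours.keys() - source.keys())
    # For records present on both sides, collect every field whose value differs.
    for key in sorted(ours.keys() & source.keys()):
        diffs = {}
        for name in sorted(ours[key].keys() | source[key].keys()):
            our_value, source_value = ours[key].get(name), source[key].get(name)
            if our_value != source_value:
                diffs[name] = (our_value, source_value)
        if diffs:
            result.changed[key] = diffs
    return result
```

The three result lists map onto the next steps of the process: records missing from our data are candidates to add, changed fields are candidates to modify, and records missing from the source deserve a closer look before anything is removed.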
The following sections represent key values of Project Vote Smart as they relate to our data. Also see: General Prioritization
Efficiency and Scalability:
Periodically examine your processes and evaluate whether they can be improved. Some things to consider:
- make it easier to monitor data sources or collect data in an automated fashion (e.g., scrapers)
- consider the possibilities of crowdsourcing content
- consider how we might improve usability in Admin (typically reported as feature requests in Mantis)
- discuss possibilities for importing bulk data with IT
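One simple way to make sources easier to monitor is to fingerprint each source's content and flag the ones whose fingerprint has changed since the last run. The sketch below assumes the content has already been fetched as raw bytes and that the previous fingerprints are stored somewhere. The function names and the storage format are hypothetical, not an existing tool.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of a source's raw content."""
    return hashlib.sha256(content).hexdigest()


def changed_sources(current: dict, known: dict) -> list:
    """List source names whose content differs from the last recorded fingerprint.

    current: source name -> raw bytes fetched just now
    known:   source name -> fingerprint saved on the previous run
    Sources with no saved fingerprint are reported as changed.
    """
    return sorted(
        name for name, content in current.items()
        if known.get(name) != fingerprint(content)
    )
```

A check like this only says that something changed, not what changed, so a flagged source still needs a person (or a comparison against our data) to work out the actual update.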
Accuracy and Completeness:
- ensure everything gets the required 3 checks
- ensure that you are updating the data at an appropriate frequency. If your checks are yielding either very few or a great many changes, work with your supervisor to adjust the frequency of the updates
- design PyQual tests to catch known problems and make revisions based on PyQual test results
- design and conduct regular queries to identify potential problems (such as possible duplicates)
- regularly spot-check content of various types. Try to cover all possible variables in your spot-check (for example, different jurisdictions, different offices, different people who worked on it, etc.)
- Compare this content to your source data to help identify issues. Make sure you are accurately reflecting and interpreting the source data for your area of content
- make sure you and those under your supervision are abiding by all the policies and procedures established by the Administration. If you have questions or recommendations for improvement, please bring them to your direct supervisor for approval.
- consider how we might enforce greater accuracy and completeness through restrictions in Admin (typically reported as feature requests in Mantis)
- initiate other cleanup projects as warranted, in consultation with your supervisor
- patrol for evidence of partisanship or bias
- consider the effectiveness of your tracking mechanisms
- follow up on all Mantis requests related to your content area, ensuring completion of critical requests
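Two of the checks above lend themselves to small scripts: the query for possible duplicates, and a spot-check that covers every combination of jurisdiction, office, and the person who worked on the record. The following is a rough sketch, assuming records are available as dictionaries with `id`, `name`, `office`, `jurisdiction`, and `editor` fields. Those field names and both functions are hypothetical, and the sketch is not a PyQual test.

```python
import random
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so near-identical values match."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def possible_duplicates(records, fields=("name", "office")):
    """Group record ids that share the same normalized values for `fields`."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(normalize(str(rec[f])) for f in fields)
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def spotcheck_sample(records, strata=("jurisdiction", "office", "editor"),
                     per_group=1, seed=None):
    """Pick up to `per_group` random records from every combination of strata values."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[s] for s in strata)].append(rec)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample
```

Matching on normalized text only surfaces candidates; each group still needs a human decision on whether the records really are duplicates. Sampling from every combination of values, instead of from the whole pool at once, keeps small jurisdictions and less active editors from being skipped.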
Timeliness and Relevancy:
- monitor current events and controversies. Prioritize getting relevant content up ASAP
- generally prioritize by the size of constituency (for example, Presidential content comes first)
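The two rules above can be combined into a single sort order for a work queue. This sketch assumes each task carries a `current_event` flag and a `constituency_size` number. Both field names are hypothetical, and putting current-events items ahead of everything else is one possible reading of the rules, not an established policy.

```python
def prioritize(tasks):
    """Order tasks: current-events items first, then by constituency size, largest first."""
    # False sorts before True, so `not current_event` puts flagged tasks at the front.
    return sorted(
        tasks,
        key=lambda t: (not t["current_event"], -t["constituency_size"]),
    )
```

Because Python's sort is stable, tasks that tie on both values keep the order they had in the original list.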
"The Best There Is":
- know the other organizations and businesses collecting similar and complementary data
- Keep tabs on new developments by joining relevant email lists and periodically reviewing their websites (especially blog posts and job listings)
- Review idea repositories, like the Knight News Challenge, for inspiration
- if their data is as good as or better than ours, let's figure out how we might get it and refocus our time on enhancing that data
- consider possible collaborations with other organizations and bring any ideas to your supervisor
CategoryManagement