Revision [11143]
This is an old revision of BreakTranscript made by LeishaD on 2014-07-01 11:59:55.
Speech Error Queries
This is a list of queries to be run from time to time to analyze the consistency of our data, i.e. to correct errors and standardize formatting. Notice that the SELECT, FROM, and ORDER BY clauses are all roughly the same. The goal is to display specific information in a specific order so that we can systematically identify and correct public statements containing errors.
Bear in mind some of these queries will return intimidatingly large numbers. Beware of false positives. Out of almost 800,000 speeches, errors are to be expected. Depending on the error, it may be possible to give the query to IT who can fix some errors automatically. Others will have to be done manually.
Pledges - A query that finds speeches with the type: "pledge":
SELECT s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.speechtype_id = 11 GROUP BY s.speech_id ORDER BY s.title
Thomas - A query that finds speeches with Page or Time breaks:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.speechtext LIKE '%[Page:%' OR s.speechtext LIKE '%[Time:%' ORDER BY c.candidate_id, s.speechdate
Thomas - A query that finds the thomas speeches with incorrect URLs:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.url LIKE '%homas.%' AND s.url NOT LIKE '%vote%' AND s.url NOT LIKE '%juan%' AND s.url NOT LIKE '%governor%' AND s.url != 'http://thomas.loc.gov' ORDER BY s.url, s.speechdate
Thomas - A query that finds the speeches with thomas.loc.gov URLs, but do not have the type Floor Speech
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.url LIKE '%homas.%' AND s.url NOT LIKE '%vote%' AND s.url NOT LIKE '%juan%' AND s.url NOT LIKE '%governor%' AND s.speechtype_id != 14 ORDER BY s.url
Thomas - A query that finds the speeches with thomas.loc.gov URLs, but do not have Washington, DC as the location:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc USING (speech_id) JOIN candidate c USING (candidate_id) WHERE s.url LIKE '%thomas.loc.gov%' AND s.location != 'Washington, DC' ORDER BY s.location
Hearing - A query that finds speeches of type hearing, but the title does not reflect our formatting rules: Resolved 10/17/2013
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE title NOT LIKE '%Hearing of the%' AND speechtype_id = 13 ORDER BY c.candidate_id, s.speechdate
Hearing - A query that finds speeches with Hearing of the in the title, but the type is not hearing:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE title LIKE '%Hearing of the%' AND speechtype_id != 13 ORDER BY c.candidate_id, s.speechdate
Interview - A query that finds the speeches of type “interviewâ€, but the title does not reflect our formatting rules:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.title NOT LIKE '%Transcript%' AND s.speechtype_id = 3 ORDER BY c.candidate_id, s.speechdate
Interview - A query that finds the speeches with Transcript in the title, but the type is not interview:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.title LIKE '%Transcript%' AND s.speechtype_id != 3 ORDER BY c.candidate_id, s.speechdate
Letter - A query that finds speeches of type letter, but the the title does not reflect our formatting rules:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.title NOT LIKE 'Letter to%' AND s.speechtype_id = 2 ORDER BY c.candidate_id, s.speechdate
Letter - A query that finds speeches where the title includes the words letter to, but the type is not letter:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.title LIKE 'Letter to%' AND s.speechtype_id != 2 ORDER BY c.candidate_id, s.speechdate
Issue Position - A query that finds speeches of type Issue Position, but the title does not reflect our formatting rules:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.speechtype_id = 7 AND s.title NOT LIKE '%Issue Position:%' ORDER BY c.candidate_id, s.speechdate
Issue Position - A query that finds speeches where the title includes the words Issue Position, but the type is not “Issue Positionâ€:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.speechtype_id != 7 AND s.title LIKE '%Issue Position:%' ORDER BY c.candidate_id, s.speechdate
Washington, DC - A query that finds some of the incorrect variations of Washington, DC in the speeches table:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.location LIKE '%ashingto%' AND s.location LIKE '%D%' AND s.location != 'Washington, DC' ORDER BY s.location, c.candidate_id, s.speechdate
Spanish - A query that finds speeches which include the words prensa, declaracion, discurso that is, probably written in Spanish: Alex Coffman went through the results and fixed 95% of the errors found by this query.
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.speechtext LIKE '%discurso%' OR s.speechtext LIKE '%prensa%' OR s.speechtext LIKE '%declaracion%' ORDER BY c.candidate_id, s.speechdate
A query that finds speeches where click here (regardless of capitalization) appears in the speechtext:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.speechtext ILIKE '%click here%' ORDER BY c.candidate_id, s.speechdate
A query that finds speeches lacking a URL:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.url = '' ORDER BY s.speechdate, c.candidate_id
A query that finds speeches lacking a speechdate:
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) WHERE s.speechdate IS NULL AND s.speechtype_id != 7 ORDER BY c.candidate_id
A query that finds speeches by governors between Jan 01 2013 and March 15 2013 (in search of State of the State speeches):
SELECT c.candidate_id, s.speech_id, s.speechdate, s.title, s.url, s.speechtype_id, s.location FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id) JOIN office_candidate oc ON (oc.candidate_id = c.candidate_id) JOIN office o ON (o.office_id = oc.office_id) WHERE o.officetype_id = 'G' AND s.speechtype_id = 1 AND s.speechdate BETWEEN '2013-01-01' AND '2013-03-15' ORDER BY s.location
SpeechesGuide
Useful Research Queries
Speech Research Queries