Revision [9493]

This is an old revision of BreakTranscript made by MichaelM on 2013-07-23 14:45:00.

 

Speeches Error Queries


This is a list of queries to be run from time to time to analyze the consistency of our data, i.e. to correct errors and standardize formatting. Notice that the SELECT, FROM, and ORDER BY clauses are all roughly the same. The goal is to display specific information in a specific order so that we can systematically identify and correct public statements containing errors.

Bear in mind, some of these queries will return intimidatingly large numbers. Out of over 700,000 speeches, that is to be expected. Depending on the error, it may be possible to give the query to IT who can fix errors automatically. Others will have to be done manually. Beware of false positives.

Thomas - A query that finds speeches with “page” or “time” breaks:
SELECT c.candidate_id, c.lastname, s.speechdate, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.speechtext LIKE '%[Page:%' OR s.speechtext LIKE '%[Time:%'
ORDER BY c.candidate_id, s.speechdate


Thomas - A query that finds the thomas speeches with incorrect URLs:
SELECT c.candidate_id, c.lastname, s.speechdate, s.title, s.url
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.url LIKE '%thomas.loc.gov%' AND s.url != 'http://thomas.loc.gov' AND s.url != 'http://thomas.loc.gov/'
ORDER BY c.candidate_id, s.speechdate


Thomas - A query that finds the speeches with “thomas.loc.gov” URLs, but do not have the type “speech”:
SELECT c.candidate_id, c.lastname, s.speechdate, s.title, s.url, s.speechtype_id
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.url LIKE '%thomas.loc.gov%' AND s.speechtype_id != 1
ORDER BY c.candidate_id, s.speechdate


Thomas - A query that finds the speeches with “thomas.loc.gov” URLs, but do not have “Washington, DC” as the location:
SELECT c.candidate_id, c.lastname, s.speechdate, s.url, s.location
FROM speech s JOIN speech_candidate sc USING (speech_id) JOIN candidate c USING (candidate_id)
WHERE s.url LIKE '%thomas.loc.gov%' AND s.location != 'Washington, DC'
ORDER BY c.candidate_id, s.speechdate


Thomas - A query that finds speeches where “thomas.loc.gov” appears in the speechtext:
SELECT c.candidate_id, c.lastname, s.speechdate, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.speechtext LIKE '%thomas.loc.gov%'
ORDER BY c.candidate_id, s.speechdate


Hearing - A query that finds speeches of type “hearing”, but the title does not reflect our formatting rules:
SELECT c.candidate_id, c.lastname, s.speechdate, s.speechtype_id, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE title NOT LIKE '%Hearing of the%' AND speechtype_id = 13
ORDER BY c.candidate_id, s.speechdate


Hearing - A query that finds speeches with “Hearing of the” in the title, but the type is not “hearing”:
SELECT c.candidate_id, c.lastname, s.speechdate, s.speechtype_id, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE title LIKE '%Hearing of the%' AND speechtype_id != 13
ORDER BY c.candidate_id, s.speechdate


Interview - A query that finds the speeches of type “interview”, but the title does not reflect our formatting rules:
SELECT c.candidate_id, c.lastname, s.speechdate, s.speechtype_id, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.title NOT LIKE '%Transcript%' AND s.speechtype_id = 3
ORDER BY c.candidate_id, s.speechdate


Interview - A query that finds the speeches with “Transcript” in the title, but the type is not “interview”:
SELECT c.candidate_id, c.lastname, s.speechdate, s.speechtype_id, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.title LIKE '%Transcript%' AND s.speechtype_id != 3
ORDER BY c.candidate_id, s.speechdate


Letter - A query that finds speeches of type“letter”, but the the title does not reflect our formatting rules:
SELECT c.candidate_id, c.lastname, s.speechdate, s.speechtype_id, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.title NOT LIKE '%Letter to%' AND s.speechtype_id = 2
ORDER BY c.candidate_id, s.speechdate


Letter - A query that finds speeches where the title includes the words “letter to”, but the type is not “letter”:
SELECT c.candidate_id, c.lastname, s.speechdate, s.speechtype_id, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.title LIKE '%Letter to%' AND s.speechtype_id != 2
ORDER BY c.candidate_id, s.speechdate


Issue Position - A query that finds speeches of type “Issue Position”, but the title does not reflect our formatting rules:
SELECT c.candidate_id, c.lastname, s.speechdate, s.speechtype_id, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.speechtype_id = 7 AND s.title NOT LIKE '%Issue Position:%'
ORDER BY c.candidate_id, s.speechdate


Issue Position - A query that finds speeches where the title includes the words “Issue Position”, but the type is not “Issues Position”:
SELECT c.candidate_id, c.lastname, s.speechdate, s.speechtype_id, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.speechtype_id = 7 AND s.title NOT LIKE '%Issue Position:%'
ORDER BY c.candidate_id, s.speechdate


Washington, DC - A query that finds some of the incorrect variations of “Washington, DC” in the speeches table:
SELECT c.candidate_id, c.lastname, s.speechdate, s.location
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.location LIKE '%ashingto%' AND s.location LIKE '%D%' AND s.location != 'Washington, DC'
ORDER BY s.location, c.candidate_id, s.speechdate


Speechdate - A query that finds some speeches with unusual “speechdates”:
SELECT c.candidate_id, c.lastname, s.speechdate, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.speechdate < '2000-01-01'
ORDER BY c.candidate_id, s.speechdate


Spanish - A query that finds speeches which include the words “prensa, declaracion, discurso” that is, probably written in Spanish:
SELECT c.candidate_id, c.lastname, s.speechdate, s.title
FROM speech s JOIN speech_candidate sc ON (s.speech_id = sc.speech_id) JOIN candidate c ON (c.candidate_id = sc.candidate_id)
WHERE s.speechtext LIKE '%discurso%' OR s.speechtext LIKE '%prensa%' OR s.speechtext LIKE '%declaracion%'
ORDER BY c.candidate_id, s.speechdate
There are 2 comments on this page. [Show comments]
Valid XHTML :: Valid CSS: :: Powered by WikkaWiki