.csv
ending, although we changed it to .txt
so you could see what it actually looks like. (Files with a .csv
ending are automatically opened up in Google Spreadsheets by Chrome, so you can't see the file contents directly.) The stuff you're looking at is difficult to read, but if you look closely, you can you see how it correlates to the XML data you saw on the website. The TAs removed some of the excess data fields before creating the CSV file, so not every field from the XML is present in the CSV file.We now have a CSV file that encodes senate data. We want to get this data into a spreadsheet.
Make a spreadsheet and go to File > Import > Upload
.
Pick your Senate data file. You will be prompted to choose how the file will be imported. Select Insert new sheet(s)
as your import action, comma
as the separator character, and click import
. A new sheet should open up containing data from your CSV file.
=importXML(B3, "//member_full")
. In a moment or two, the rest of the column should be filled with senator names. importXML
expression?
File > Make a Copy
. This will copy it to your drive, where you can move it to a sensible location later (also, be ready to wait, this might take a few minutes).session
vote_number
vote_question_text
member_full
vote_cast
vote_question_text
column. Some of the cells in this column contain the words "On Passage of the Bill..." We want to keep only the rows with similar text in the vote_question_text
column. Hint: this should only take a few minutes; it does not involve deleting the non-bill-vote rows one by one. Also, it's possible to alphabetize rows. Hint 2: If you've learned elsewhere about "Filter"s for data, please don't use them; the results may not work well with later steps! Hint 3: You can select a range of rows by selecting top row with the Shift
key held down, and then select the bottom row with the Shift
key still held down.vote_id
column in cell F1
. The spreadsheet automatically infers that this is a new part of the table. In row 2 of the vote_id
column, write a formula to combine the “session” for this row with the “vote_number” for this row, placing a colon between them. Use Fill down
to apply this formula to all the other rows.
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
1 | session | vote_number | vote_question_text | member_full | vote_cast | vote_id | 2 | 1 | 19 | On Passage... | Akaka (D-HI) | Yea | 1:19 | 3 | 1 | 19 | On Passage... | Alexander (R-TN) | Yea | 1:19 | 4 | ... | ... | ... | ... | ... | ... |
vote_id
, and title it numerical_vote
. In the second row of that column, enter a formula that produces 1
if a senator voted “Nay,” and a 2
if s/he voted “Yea.”, and zero for anything else. Verify that nothing went wrong and that no cell contains the word “ERROR.” Hint: Try writing an expression that uses multiple IF
functions "nested" within each other. The table should now look like this:
A | B | C | D | E | F | G | |
---|---|---|---|---|---|---|---|
1 | session | vote_number | vote_question_text | member_full | vote_cast | vote_id | numerical_vote | 2 | 1 | 19 | On Passage... | Akaka (D-HI) | Yea | 1:19 | 2 | 3 | 1 | 19 | On Passage... | Alexander (R-TN) | Yea | 1:19 | 2 | 4 | ... | ... | ... | ... | ... | ... | ... |
Data > Pivot Table Report
from the toolbar. This will open up a new sheet for a pivot table based on the information you selected. If you don't remember how to use pivot tables, review the tutorial.member_full
, the column labels be vote_id
, and the value in the cells be the numerical_vote
that you just created.Summarize by:
menu. Notice just how many different ways you can sort your values - you'll probably be using more of these options later. Grand Total
row and column by unchecking Show totals
in the rows and columns boxes