Activity 1-1

September 15, 2015

Task 0: Get the Files We Need

  1. Visit this web page for a U.S. Senate vote record. Note the three pieces of information from the URL (what are they again?). Left-click on the XML link in the upper right corner of the page to see the formatted data behind this website. We'll soon teach you something about importing data like this into a spreadsheet. Google Spreadsheets has a function, "ImportXML", for doing this, and Excel has something similar.

    If we were using Excel, you'd first have to download the data from the link above as an XML file and then convert it to CSV. We're first going to teach you how to import a CSV file into Google Spreadsheets from your desktop and then we'll teach you how to import data directly from the internet.

    The TAs converted the data into something called CSV (Comma Separated Values) format. Open this formatted file by left-clicking it. What you're looking at is data in CSV format. Files saved like this should normally be named with a .csv ending, although we changed it to .txt so you could see what it actually looks like. (Files with a .csv ending are automatically opened up in Google Spreadsheets by Chrome, so you can't see the file contents directly.) The stuff you're looking at is difficult to read, but if you look closely, you can you see how it correlates to the XML data you saw on the website. The TAs removed some of the excess data fields before creating the CSV file, so not every field from the XML is present in the CSV file.
  2. Click the download icon in the center of the toolbar at the top of the document, and download it to your desktop

Task 1: Import a CSV File into Google Spreadsheets

We now have a CSV file that encodes senate data. We want to get this data into a spreadsheet. Make a spreadsheet and go to File > Import > Upload. Pick your Senate data file. You will be prompted to choose how the file will be imported. Select Insert new sheet(s) as your import action, comma as the separator character, and click import. A new sheet should open up containing data from your CSV file.

Questions

  1. Look at the table. How many rows are there?
  2. Examine all the columns. How are these related to the XML/CSV file itself?
  3. Which columns contain useful information?
  4. Suppose you also had the CSV file for an additional session of Congress. How would you add the information in that file to this table?

Task 2: Import data from an online XML file

Suppose you didn't have access to a handy CSV file. We'll now teach you how to import data directly from the internet.
  1. Visit this web page for a U.S. Senate vote record in XML.
  2. Create a new sheet in the spreadsheet from Task 1, and in cell B3 enter the URL of that voting record. In cell B2, enter "URL:" as a label for B3.
  3. In cell D3, enter =importXML(B3, "//member_full"). In a moment or two, the rest of the column should be filled with senator names.
  4. Now in cell E3, enter something that will import each senator's vote. Check, for the first senator, that the vote is really correct, by looking at the XML document.
  5. In cell F3, enter something that will import the text of the item being voted on.

Questions

  1. Look at the table of names and votes. How many rows are there?
  2. Can you think of a way that you could have found out the number of rows before the senator names were loaded?
  3. We had you enter the URL in one cell, and then refer to it from other cells. Change the URL to update the spreadsheet to display vote 2 of the senate rather than the latest vote. Would this have been harder if you'd put the URL directly into each importXML expression?

Task 3: Format the Data

  1. Instead of having you load in all the votes from the 113th congress, we've done that for you.
  2. Copy this spreadsheet to your drive by left-clicking it to open, then selecting File > Make a Copy. This will copy it to your drive, where you can move it to a sensible location later (also, be ready to wait, this might take a few minutes).
  3. Once you have your own copy, get familiar with the data. How many rows are there? How many votes are in the first session? How many votes in the second session?
  4. Spend some time deleting columns from the data. We'll only need the following columns:
    • session
    • vote_number
    • vote_question_text
    • member_full
    • vote_cast
  5. Save the reduced spreadsheet with a different, descriptive name (maybe use the word "trimmed").
  6. Now, figure out how to delete all the rows of the spreadsheet that correspond to votes that are not on the passage of a bill. How can we tell which rows correspond to votes on the passage of a bill? Take a look at the vote_question_text column. Some of the cells in this column contain the words "On Passage of the Bill..." We want to keep only the rows with similar text in the vote_question_text column. Hint: this should only take a few minutes; it does not involve deleting the non-bill-vote rows one by one. Also, it's possible to alphabetize rows. Hint 2: If you've learned elsewhere about "Filter"s for data, please don't use them; the results may not work well with later steps! Hint 3: You can select a range of rows by selecting top row with the Shift key held down, and then select the bottom row with the Shift key still held down.
  7. The resulting spreadsheet should be a lot smaller — how many rows are there now?
  8. We want a unique identifier for the vote of each bill in this congress. Which two columns together produce a unique identifier for each vote?
  9. Add another column to the table by entering vote_id column in cell F1. The spreadsheet automatically infers that this is a new part of the table. In row 2 of the vote_id column, write a formula to combine the “session” for this row with the “vote_number” for this row, placing a colon between them. Use Fill down to apply this formula to all the other rows.
    ABCDEF
    1sessionvote_numbervote_question_textmember_fullvote_castvote_id
    2119On Passage...Akaka (D-HI)Yea1:19
    3119On Passage...Alexander (R-TN)Yea1:19
    4..................
  10. Right click the last column to add another column to the table, just to the right of vote_id, and title it numerical_vote. In the second row of that column, enter a formula that produces 1 if a senator voted “Nay,” and a 2 if s/he voted “Yea.”, and zero for anything else. Verify that nothing went wrong and that no cell contains the word “ERROR.” Hint: Try writing an expression that uses multiple IF functions "nested" within each other. The table should now look like this:
    ABCDEFG
    1sessionvote_numbervote_question_textmember_fullvote_castvote_idnumerical_vote
    2119On Passage...Akaka (D-HI)Yea1:192
    3119On Passage...Alexander (R-TN)Yea1:192
    4.....................

Task 4: Summarize the Data with a Pivot Table

  1. Select the entire table and select Data > Pivot Table Report from the toolbar. This will open up a new sheet for a pivot table based on the information you selected. If you don't remember how to use pivot tables, review the tutorial.
  2. Let the row labels be member_full, the column labels be vote_id, and the value in the cells be the numerical_vote that you just created.
  3. Change the value in the cells to be the maximum rather than the sum of the votes by selecting the right value from the Summarize by: menu. Notice just how many different ways you can sort your values - you'll probably be using more of these options later.
  4. Delete the Grand Total row and column by unchecking Show totals in the rows and columns boxes