Activity 1-4

September 29, 2015

Task 1: Building a small similarity matrix

  1. Copy and save ACT1-4_starter to your Google Drive. Open it in your Google Drive.
  2. This sheet contains most of the work that you've done so far (except for the "compare with any senator" thing that we did in the last activity). We're going to use it to build a table that shows, for any pair of senators, how similar their records are. That means that we'll build a table where each row is labelled by a senator and each column is labelled by a senator, and the cell in the 12th row, 3rd column will tell us how much the 12th senator and the 3rd senator are alike.
    The spreadsheet has many tabs; using the first few, you'll solve the "how similar" problem for a very small senate: just four senators and five votes -- small enough that you can easily work out the similarity between two of them by hand. When you get the computations for this "small senate" all worked out, you can extend to working with more senators.
    We've done a few things to make your life simpler: The sheet called 4SenatorPivotTable will be the starting point for your work today. Open that tab and look at the data. Which two senators look most similar to you?
  3. Suppose we took the first two rows of the table -- Senators Alexander and Ayotte's votes -- and did a SUMPRODUCT with them. As discussed earlier, this would count the number of agreements minus the number of disagreements. What value do you get, doing this by hand? Check with your neighbor to be sure you agree.
  4. If we did the same thing with the absolute values of those two rows, we'd get the number of agreements PLUS the number of disagreements. What value do you get, doing this by hand?
  5. Because we're later going to want the sum you computed in the last problem, we're going to copy the pivot table, but replace the numbers with their absolute values. Do this in the tab called 4SAbsPivotTable.
  6. Now, in the 4SDifference tab, make a table whose row-labels and column-labels are both those from the pivot table (i.e., the rows are labelled by senator names, and so are the columns). Use FORMULAS to create these labels rather than copying and pasting. Hint: "Transpose" may help you.
  7. Now use an MMULT formula to fill in the agree-minus-disagree numbers for all pairs of senators in that table.
  8. On the 4SSum tab, make a similar table, with the same row and column labels, but with the entries being the number of agreements-plus-disagreements for each pair of senators. Since the ABS formula takes only one number or cell as input you have to use the ARRAYFORMULA function, as ARRAYFORMULA(ABS(range)).
  9. Pick an entry of this table at random, and compare the value in the SUM table to the value in the DIFFERENCE table. Which should be a larger number? Is it?
  10. Finally, in the 4SSimilarityTable tab, once again use senator-names for row and column labels, and for each senator-senator pair, put the ratio of agree-minus-disagree to agree-plus-disagree in the corresponding row and column. (Do this for the first senator-senator pair, i.e., Alexander-Alexander, and then use "fill" to do the others. Do you need to worry about absolute references versus relative references as you fill?)
  11. With your neighbor, calculate by hand what the three entries above the diagonal should be, and make sure that these are the answers you got.

Task 2: Building the big similarity matrix

  1. Repeat each of the tasks you just did for four senators, but do it for the much larger group of senators represented in the Data tab, in which we've removed all votes from Senators Booker, Chiesa, ..., who were not present for the entire senate session for one reason or another. The pivot table for the remaining senators is in the PivotTable tab, and the space you need for the subsequent computations is provided by other tabs with parallel names. Be sure, in each tab, to put the senators' names in the first row and column, and have the numbers start at location B2and extend through row 99 and column CU.
  2. When you're done, look at the SimilarityViz tab, where a copy of your table is shown, but with the cells colored by value. Dark green is strong agreement, dark red is strong disagreement. Explain why the diagonal is all dark green.
  3. The color-coding was done using a kind of crude conditional formatting: values greater than 0.8 were made dark green; those greater than 0.6 we a less dark green, and so on. Click on cell B2 and go to Format > Conditional Formatting to see how this was done. The order of the rules is important: if a cell satisfies some rule, then a color gets used and the rest of the rules are ignored. What would happen if we swapped the first two rules and the value in the cell was 0.9? What color would be used?
  4. This kind of conditional formatting is not very sophisticated. Google Spreadsheets have cleverer formatting rules, which we'll use to better understand the data. For now, just look at the data and colors and ask, "What does a dark green square that's NOT on the diagonal tell me? What about a dark red square?"