Homework 1-3

Due September 22, 2015, 9:00 am

Reminders

If a problem is marked as “(Independent)”, you may only discuss the problem with course staff. Otherwise, you are free to discuss the concepts that will help you solve the problems with classmates as well as course staff. However, you are never allowed to simply copy answers.

Other notes

For this following task, you will be sharing a spreadsheet with the handin account - cs0931handinfall2015@gmail.com .

This is a long assignment, but that's mostly due to the amount that you have to read. For each problem, there's not too much you have to do, so don't be put off by it.

Setup

To start off, copy this spreadsheet into your google drive and rename it FirstLast_HW1-3.

Task 1: Why the "measure from two ends" trick doesn't work

In class, we've talked about measuring a similarity or distance between a pair of senators. That distance is abstract. It represents how far apart the two senators are in the "space" of voting records. Now we'll consider a more concrete example in order to think about these distances.

  1. Look at the above diagram of a baseball diamond. The catcher, James, is standing slightly to the third-base side of home plate. Rank all the players by order of increasing distance from James; get out a ruler and use it on your screen or printout if you need. The first two in your list should be James himself (he's distance zero from James) and then Bryant, the pitcher, who is about 90 feet away from him. To measure distance, use the distance between each players nose. Write your answers in the first column of the "Baseball" tab.
  2. Who is the farthest away from James? Let's say it's player X. Write down a list of all the players, ordered by increasing distance from X, in column B. Compare this order to the order from the previous question. Explain this relationship. Compare it to the train-station problem we did in class. If, as an example station, I had provided one of the stations at the end of the line, say, Boston, it would have been easy to place the remainder in order, and the order would be exactly the opposite (reading from the other end!) that you'd have gotten if I'd offered up the station at the far end of the line, Richmond. Why doesn't that happen with the baseball players? And do you suspect that this source of trouble will appear in the voting-records problem? Write your explanations in cell D12.

Task 2: How a spreadsheet should look

  • Look at the Bad Sheet tab. It shows the September prices for two items (Bread and Rice) and how many of each item we bought, and then the cost of the items, which is computed as D3*B3 + D4*B4, i.e., the number of breads times the price of bread, plus the number of rices times the price of rice. That total is shown in cell B6.

    Down below, we've got both September prices (copied from those above) and October prices, and a (correct) formula for the increase in price from September to October, which takes the total October cost, D3*C9 + D4*C10 and subtracts the total September cost, D3*B9 + D4*B10. Notice that we're using the costs from rows 9 and 10.

    Change the price of bread in cell B3 from $2.35 to $2.50. You'll see the September cost go up. But the cell labelled "Increase" (D9) won't change at all. Look at the formulas and explain why in cell E12.

  • Now look at the tab called Good Sheet. On this one, things are a little clearer: the only places the person looking at this sheet should change anything are highlighted in yellow (with a note at the top explaining this). The September prices in rows 9 and 10 are not copied from rows 3 and 4, but rather are defined by formulas that refer to rows 3 and 4: when the price of bread in B3 is changed, the price in B9 automatically updates. Note, too, that all dollar amounts have been formatted to display nicely. (That was true of the "Bad Sheet" tab as well.)

    The cells in this sheet can be divided into four classes:

    1. Labels: things that explain the contents of an adjacent cell or table-row or table-column.
    2. Parameters: the things marked in yellow, which might be changed by a viewer of the table.
    3. Computed values: All the numbers that aren't marked in yellow. They're values that are derived from the data by formulas.
    4. Data: There is no "data" in this example table except for the parameters. But in a table of senatorial votes, the "parameters" might be "which senate session" and "which vote", and the data would then be the votes of all the senators: it's information that you don't have to type in, but which can be found (on the wab, in a file, ...) by using the parameters. The distinction between "data" and "parameters" is a fine one. Thinking again of senate votes, you might want to experiment by changing which bill you're looking at, but you probably would not experiment by changing Senator Alexander's vote to a 'Nay' on some bill.
    That categorization should apply, whenever possible, to every spreadsheet you produce in this class.
  • Now look at the tab called BadBetterGood. On this one, the first two columns import Senator names as we did in class. The key formula, in cell B1, is
    =importXML("http://www.senate.gov/legislative/LIS/roll_call_votes/vote1132/vote_113_2_00008.xml", "//members/member/first_name") You can see that the legislative session, the congress number, and the vote number are all in the url, and the numbers "113" and "2" appear twice. Someone wanting to see a list of senators for the first session of the 110th congress would have to do some careful editing of the URL.

    Note that in this case, the senator names are "data", but there don't seem to be any parameters. To the degree that there are, they're hidden inside the importXML formula: the URL and the XPath.

    Columns D and E do a better job. Once again the data you can change is highlighted. There are several well-named entries (like "Which congress"), and even the item to extract has been separated out into its own field.

    There's a problem, though: the list of senators is missing.

    If you look at the "Goal" entry, I've written down the thing you should be trying to produce in the URL entry (for the 113th congress, second session, anyhow). The formula I've used for the URL doesn't produce that, as you can see.

    Modify the formula in E7 (you'll need several more ampersands and several more strings!) to make it exactly reproduce the "goal" URL. When you do, the Query results should appear.

    Change the "Which session:" to "1". Does the right thing happen? What about if you change the congress to "110"?

    The formula for the query is =importXML(E7,E5). That makes sense: E7 contains the URL, E5 contains the XPath. But we can do better.

  • Look at columns G and H. You'll need to fix the URL formula in H7, just as you did in columns D and E, but once you do, the results will show up, and will change when you change the "session" or "congress". But look at the formula in H8:
    =importXML(url,xpath)
    That code works because I selected cell H7, went to Data > Named Ranges and named it "url"; I did the same thing with cell H5, naming it "xpath". After doing that, I'm allowed to refer to those cells by name rather than by letter-number coordinates. It makes the formula a good deal easier to read.

    Change things so that instead of the member names being listed lower down in column H, their votes are listed. Was it easy?

  • Task 3

    Finally, we want to check in and make sure you feel you're becoming comfortable with the ideas needed to do a successful spreadsheet project of your own.

    1. Spend some time searching the web for data that is interesting to you. It can be in XML, CSV, a spreadsheet, or any other format you find on the web. Try thinking about questions you could ask and investigate about that data.

      Select the Task3 tab and list two or more sources of data you might be interested in exploring in your project. Describe in your own words each data set in a few words and provide a link. If you can't find sources yet, that is okay: instead, briefly describe two or more kinds of data you might be interested in exploring (e.g., votes, education, some kind of survey data). Write your answers in B1
    2. In cell B3, please rate how you feel the class is going so far:
      1. Too Slow
      2. Just about right
      3. Too fast
      4. Way too fast
    3. Also, tell us how long it took you to finish this homework in cell B5.

    Handin

    Share the following files with cs0931handinfall2015@gmail.com

  • FirstLast_HW1-3
  • In the above file name, "FirstLast" should be replaced with your first and last name. If it is not, we will take off points. Make sure every task has been completed.