cs0931handinfall2015@gmail.com
. In the text file please label your answers clearly so that we know which answers go with which question.
Ranking is one way of understanding the distances between one senator and every other senator. Maybe we want to use this data to do other kinds of labeling or conditional formatting. To do that, we need to be fluent in writing formulas that use "conditions", like IF statements. In this task, we'll practice writing IF statements and other logical statements, like AND and OR.
HW1-4
into your Google Drive. Look at the formulas in cells C1
through C5
. In cell D1
, write a formula that's TRUE
if A1
is less than 5
or if both A2
and A3
are bigger than 2
, but is FALSE
otherwise.TRUE
if both A1
equals A2
and A3
does not equal A4
. You can use <>
to mean “not equals” (so that 5 <> 3
is TRUE
), or you can use the NOT
function.D3
write an expression whose value is 7
if either A2
or A3
is greater than 0, and is A5
otherwise.
FirstLast_HW1-4
.During the first days of this class, the TAs and I had to deal with the waitlist, for which we had student email addresses. But instead of a list of email addresses, we wanted to make up a nicely sorted class list. As you know, a typical Brown email address looks like John_Hughes@brown.edu
. We wanted to make this into something like "Hughes, John", or perhaps "Hughes" in one column of a spreadsheet and "John" in the next column, so that sorting would be easy.
On the tab Problem 2
, we've provided a list of seven brown emails; your job is to produce, on the tab "Problem 2 solution", a table of the names, sorted, where the first column is the last name (e.g. "Hughes"), the second is the first name ("John"), and the third is the email address ("John_Hughes@brown.edu"). The table must be sorted by last name and then further sorted by first name. The first row of your solution should therefore be
Balan | Andrew | Andrew_Balan@brown.edu
.
To produce this, you'll need to do some work. I'd suggest you look at the google spreadsheets "Split()" function. Your intermediate work can go in the marked area on the Problem 2 Solution
page (which you may extend to the right if you need it). Your final table should go in the area marked Solution
. HINT: The SPLIT function can split over multiple characters at the same time. For example, =Split("Data-and&stuff", "-&")
will produce Data | and | stuff
in 3 cells. Also, it is possible to enter two ranges as the first two arguments in the SORT function. If you do this, the spreadsheet will sort the FIRST range of data based on the SECOND range of data (assuming that the ranges are of equal length).
Your solution should have the property that if the TAs change the data on the Problem 2
tab, your solution will update appropriately. You might think that since sorting is needed, that's impossible. But look at the area on the Problem 2
page marked example
; you'll see that the data in the second column is just the data from the first column, sorted. Check out the formula in cell C12
to see how that was done. The key point is that you can do sorting using a formula rather than the data > sort
menu item.
To simplify matters a little, you may assume that the TAs will only change the data by replacing a name, NOT by adding or deleting names.
Now let's address the possibility that the amount of data in a sorting task might change. In the "Problem 3" tab, there's a list of numbers in the first column -- eleven of them -- and in the third column they appear sorted, these sorted values being produced by the formula =sort(A2:A13, 1, true)
in cell C2.
In cell B2
, write a formula that will tell how many data items there are in column 1. You may assume (i) The data are all together at the top, i.e., there are no blanks between the label "data" and the data items, and each data item is nonblank, so that the first blank entry in column A signals the end of the data. (ii) there are at most 40 data items, i.e., you only need to look at cells A1
through A41
.
You might want to use Countif
, or IsBlank
, or CountA
to do this. Make certain that you don't count the "Data" label as a data item! Your answer, for the data count, should be 11. (Don't just type 11: type a formula which looks at the data and produces a the number of data items, which happens to be 11.)
In cell C2
, change the formula to use the value from B2
to produce the sorted data, even if there happen to be more that 11 or fewer than 11 (but no more than 40!). You'll need to use OFFSET()
to produce the range of values to be sorted. Reading the documentation for "Offset" may not help a lot, so here's an example and an explanation:
In entry D2
, the formula is =max(OFFSET(A2, 0, 0, 5, 1))
. It produces the value 5, which is the maximum value in the first five entries of the data in column A. How does that work?
Well, first, MAX()
produces the maximum of any number of numbers, so =max(3, 5)
produces 5, while =max(A1, B2, C17)
produces the largest of the values in the three named cells. (If any of them are not numbers, it produces an error.) But MAX()
can also operate on a RANGE of cells, like A1:A5
, which means "all the cells between A1 and A5, inclusive." (In our case, that would be a problem, since A1
has the value "Data", which isn't a number!). A range can even be a block of cells, like A1:C5
.
Back to the formula in D2
: =max(OFFSET(A2, 0, 0, 5, 1))
. It's taking the max of a range, so the OFFSET()
must be producing a range. Here's what it's doing: it's making a range starting at cell A2
, but with a few subtleties. First, =OFFSET(A2, 1, 3)
would refer to cell D3
: it's offset from A2
by 1 down and 3 across. You've already seen this in an earlier homework. But a second variant of OFFSET()
lets you produce a range of cells that starts at the cell described by the first three arguments, and extends down and to the right, by amounts specified by the next two. So OFFSET(A2, 1, 3, 2, 5)
describes a range that starts with its upper left corner at D3 (just as above), but the resulting range is 2 rows deep and 5 columns wide. So the range is D3:H4. You might be thinking "It should be D3:I5", because "5" is two rows below 3, and I is five columns more than D. But the rule is that the last two items in the OFFSET(A2, 1, 3, 2, 5)
expression tell you how TALL and how WIDE the results are; to get something two rows tall, starting at D3
, we need to go to row 4, not row 5 and to column H, not column I.
Back to the formula we've actually got: =max(OFFSET(A2, 0, 0, 5, 1))
. It says to create a range from A2 (offset not at all: that's what the two zeroes mean!), and the range should be 5 rows tall and 1 column wide. So the "OFFSET" produces the range A2:A6
. The MAX then takes the maximum value in this set of numbers, which happens to be "5", and that's what's displayed in cell D2.
Now: you need to use OFFSET to make a range in this same way, but it should go from A2
down to A12
(unless you add a few more data items, in which case it needs to go further). Ask yourself, "Where will I get the last two items that go in the OFFSET expression?" How many columns do you want your range to have? How many rows? Once you've figured this out, you'll wrap your OFFSET
expression in a SORT()
and produce the sorted data.
Would your solution to problem 2 work if the data included some of the names shown in the Problem 4
tab of the spreadsheet instead of the ones we gave you in problem 2? Briefly explain why or why not in the space given. Don't go fix your solution to problem 2: instead, explain what assumptions you made (if any) that turn out not to be valid.
Now we're going to go over a little data importing!
Go to the ImportExamples
tab of the spreadsheet. At the top is an example of performing a google search and using importXML
to extract the data that was returned (i.e., all the web pages you could click on). The xpath part of the formula is more complex than anything we saw in class; it's
//h3[@class='r']/a/@href
Let's interpret that. The first part, //h3
, says to search for anything with the tag h3
, which is the style used to display results in a google search.
To see this, go to Google, search for "poodle", and when the results show up, note two things:
https://www.google.com/search?q=poodle
somewhere within it.h3
tag, with the attribute class="r"
...which exactly matches the first part of that Xpath: that first part says "look for h3 tags with the class attribute equal to 'r'". The second part says "and within there, look for 'a' tags, and tell me the value of the 'href' attribute." It turns out that "a" tags are hyperlinks, and the "href" attribute is the URL for the linked page. (Note: these instructions work on Google Chrome or Firefox. If you are using Safari, after you click "Inspect Element", you have to click on the source code button just over the results, and change it to "DOM Tree".) data-href
attribute, but we cannot access these because of a bug in Google Spreadsheets (it cannot correctly parse hypenated names). To get around this, we use substitute(cell name or string, target, replacement) to replace every instance of "/url?q=" with "", effectively getting rid of it. We also use substitute to get the right urls for image or news searches google returns, since those behave a bit differently than regular url results.
/a
onwards isn't needed, because you want the actual contents of this entity, not the value of some attribute.
In cell A1
, use the IMPORTDATA
command, using the following URL as the data source: "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"
B20
: we've used importHTML
, which is a lot like importXML
, except that it handles less structured webpages better (but does less with them). In this case, we've opened a webpage that lists incidents of piracy on the high seas, and said that we're looking for a TABLE within that page (we tried both "list" and "table", and "table" worked; we could have used the "inspect element" trick to save ourselves the trouble). We've said that we want the very first table (that's the "1" at the end of the formula), and the results show us several piracy incidents.
The sad thing is that it doesn't show as many as the webpage itself does. Web servers don't have to give the same result to Google Spreadsheets as they do to a browser, and this seems to be an example of that happening: we only get about 10 results.
Below that, in cell B34
, is a formula that should load up data from the national earthquake center...but their webpage's XML is broken, and Google's XML-reading program can't make sense of it.
Below that, in cell B36
, is a formula for reading the low temperatures listed on the weather page for zipcode 02912; I found the URL by going to weather.gov and entering "02912" in the search box. I then figured out what to search for (i.e., the Xpath) by using "inspect element".
Change this so that cell B36
shows the current temperature instead of starting a table of low temperatures.
Share the following file with cs0931handinfall2015@gmail.com
FirstLast_HW1-4
google spreadsheetIn the above file name, "FirstLast" should be replaced with your first and last name. If it is not, we will take off points. Make sure every task has been completed.