Lecture notes: More tables
Here’s our table from the last lecture:
include tables include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX") municipalities = table: name, city, population-2000, population-2010 row: "Providence", true, 173618, 178042 row: "Cranston", true, 79269, 80387 row: "Coventry", false, 33668, 35014 row: "Warwick", true, 85808, 82672, row: "North Providence", false, 32411, 32078 end
Review of table operations
Let’s first review the operations we saw last time.
We can get the nth row of the data:
> municipalities.row-n(0) > municipalities.row-n(1) > municipalities.row-n(0)["name"]
We can sort the data by 2010 population:
> sort-by(municipalities, "population-2010", false)
To find the least populous municipalities, we can change the last argument:
> sort-by(municipalities, "population-2010", true)
We can also find all of the cities:
fun is-city(r :: Row) -> Boolean: doc: "get the value in the row's city column" r["city"] end > filter-by(municipalities, is-city)
Note what we’re doing here: we’re passing a function to another function. How is
is-city
used?
We can find municipalities whose population went down:
fun population-decreased(r :: Row) -> Boolean: doc: "returns true if the municipality's population went down between '00 and '10" r["population-2010"] < r["population-2000"] end > filter-by(municipalities, population-decreased)
And, we can combine all of these operations. How would we get the city with the smallest population?
> sort-by(filter-by(municipalities, is-city), "population-2010", true).row-n(0)
Adding columns
We can add columns to tables using another operation: build-column
.
build-column :: (t :: Table, colname :: String, builder :: (Row -> A)) -> Table
What’s going on in that signature? In particular, what’s the type of the
builder
parameter? builder
needs to be a function that takes a row and
returns…something! The values it returns will be the values in the new column
we’re adding to each row. Imagine we have the following table:
a | b |
---|---|
“dog” | 2 |
“cat” | 3 |
Let’s call this table t
. If we have a function builder
, then the value of
build-column(t, "c", builder)
is
a | b | c |
---|---|---|
“dog” | 2 | builder(<"dog", 2>) |
“cat” | 3 | builder(<"cat", 3>) |
For example, if we have:
fun builder(r :: Row) -> Number: string-length(row["a"]) + row["b"] end
Then we end up with the following table:
a | b | c |
---|---|---|
“dog” | 2 | 5 |
“cat” | 3 | 6 |
Population change
Let’s work on solving a problem using our RI municipalities data. Let’s say we’re trying to figure out what the fastest-growing towns are in RI. In other words, we want a table containing only towns, sorted by the percent change in population. How would we go about doing this? What are the subtasks of this problem?
- Filtering out the cities
- Calculating percentage change in population
- Building a column for percentage change
- Sorting on that column in descending order
See the lecture capture for details on how this program is built up. Here’s what we ended up with:
include tables include gdrive-sheets include image include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX") municipalities = table: name, city, population-2000, population-2010 row: "Providence", true, 173618, 178042 row: "Cranston", true, 79269, 80387 row: "Coventry", false, 33668, 35014 row: "Warwick", true, 85808, 82672 row: "North Providence", false, 32411, 32078 end fun is-town(r :: Row) -> Boolean: not(r["city"]) end fun percent-change(r :: Row) -> Number: (r["population-2010"] - r["population-2000"]) / r["population-2000"] end towns = filter-by(municipalities, is-town) towns-with-percent-change = build-column(towns, "percent-change", percent-change) fastest-growing-towns = sort-by(towns-with-percent-change, "percent-change", false) fastest-growing-towns
Loading Google Sheets into Pyret
Later in this course, we’ll be working with very large tables, which would be a real pain to type into Pyret by hand. I’ve been unwilling to do that even for the 39 municipalities in Rhode Island! Luckily, we don’t have to type tables manually into Pyret–instead, we can load them from Google Sheets!
I’ve made a spreadsheet version of our RI municipality data, which includes every town and city. We can load it into Pyret as follows:
include tables include gdrive-sheets include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX") ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc" spreadsheet = load-spreadsheet(ssid) municipalities = load-table: name :: String, city :: Boolean, population-2000 :: Number, population-2010 :: Number # true because the sheet has a "header" row source: spreadsheet.sheet-by-name("municipalities", true) end
We can then work with this table just as if we had entered it manually.