Filtering and adding columns
Here’s our table from the last lecture:
include tables include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX") municipalities = table: name, city, population-2000, population-2010 row: "Providence", true, 173618, 178042 row: "Cranston", true, 79269, 80387 row: "Coventry", false, 33668, 35014 row: "Warwick", true, 85808, 82672, row: "North Providence", false, 32411, 32078 end
Last time, we wrote a function that takes a single row of the table and determines whether the population is decreasing:
fun population-decreased(r :: Row) -> Boolean: doc: "returns true if the municipality's population went down between '00 and '10" r["population-2010"] < r["population-2000"] end
We can use filter-with
to return a new table of just the rows where
population-decreased
evaluates to true
:
> filter-with(municipalities, population-decreased)
Notice that we’re passing a function as the second argument to
filter-with
. This function has to take in a Row
and return a Boolean
.
We can use this operation to get just the cities:
fun is-city(r :: Row) -> Boolean: doc: "get the value in the row's city column" r["city"] end > filter-with(municipalities, is-city)
And, we can combine all of these operations. How would we get the city with the smallest population?
> sort-by(filter-with(municipalities, is-city), "population-2010", true).row-n(0)
Population change
Let’s work on solving a problem using our RI municipalities data. Let’s say we’re trying to figure out what the fastest-growing towns are in RI. In other words, we want a table containing only towns, sorted by the percent change in population. How would we go about doing this? What are the subtasks of this problem?
- Filtering out the cities
- Calculating percentage change in population
- Building a column for percentage change
- Sorting on that column in descending order
See the lecture capture for details on how this program is built up. Here’s what we ended up with:
include tables include gdrive-sheets include image include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX") municipalities = table: name, city, population-2000, population-2010 row: "Providence", true, 173618, 178042 row: "Cranston", true, 79269, 80387 row: "Coventry", false, 33668, 35014 row: "Warwick", true, 85808, 82672 row: "North Providence", false, 32411, 32078 end fun is-town(r :: Row) -> Boolean: not(r["city"]) end fun percent-change(r :: Row) -> Number: (r["population-2010"] - r["population-2000"]) / r["population-2000"] end towns = filter-with(municipalities, is-town) towns-with-percent-change = build-column(towns, "percent-change", percent-change) fastest-growing-towns = sort-by(towns-with-percent-change, "percent-change", false) fastest-growing-towns
Adding columns
We can write functions on rows that don’t return Booleans
:
fun percent-change(r :: Row) -> Number: (r["population-2010"] - r["population-2000"]) / r["population-2000"] end
We can add columns to tables using another operation: build-column
.
build-column :: (t :: Table, colname :: String, builder :: (Row -> A)) -> Table
What’s going on in that signature? In particular, what’s the type of the
builder
parameter? builder
needs to be a function that takes a row and
returns…something! The values it returns will be the values in the new column
we’re adding to each row. Imagine we have the following table:
a | b |
---|---|
“dog” | 2 |
“cat” | 3 |
Let’s call this table t
. If we have a function builder
, then the value of
build-column(t, "c", builder)
is
a | b | c |
---|---|---|
“dog” | 2 | builder(<"dog", 2>) |
“cat” | 3 | builder(<"cat", 3>) |
For example, if we have:
fun builder(r :: Row) -> Number: string-length(row["a"]) + row["b"] end
Then we end up with the following table:
a | b | c |
---|---|---|
“dog” | 2 | 5 |
“cat” | 3 | 6 |
Loading Google Sheets into Pyret
Later in this course, we’ll be working with very large tables, which would be a real pain to type into Pyret by hand. I’ve been unwilling to do that even for the 39 municipalities in Rhode Island! Luckily, we don’t have to type tables manually into Pyret–instead, we can load them from Google Sheets!
I’ve made a spreadsheet version of our RI municipality data, which includes every town and city. We can load it into Pyret as follows:
include tables include gdrive-sheets include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX") ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc" spreadsheet = load-spreadsheet(ssid) municipalities = load-table: name :: String, city :: Boolean, population-2000 :: Number, population-2010 :: Number # true because the sheet has a "header" row source: spreadsheet.sheet-by-name("municipalities", true) end
We can then work with this table just as if we had entered it manually.