Filtering and adding columns

Here’s our table from the last lecture:

include tables

include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX")

municipalities = table: name, city, population-2000, population-2010
  row: "Providence", true, 173618, 178042
  row: "Cranston", true, 79269, 80387
  row: "Coventry", false, 33668, 35014
  row: "Warwick", true, 85808, 82672,
  row: "North Providence", false, 32411, 32078
end

Last time, we wrote a function that takes a single row of the table and determines whether the population is decreasing:

fun population-decreased(r :: Row) -> Boolean:
  doc: "returns true if the municipality's population went down between '00 and '10"
  r["population-2010"] < r["population-2000"]
end

We can use filter-with to return a new table of just the rows where population-decreased evaluates to true:

> filter-with(municipalities, population-decreased)

Notice that we’re passing a function as the second argument to filter-with. This function has to take in a Row and return a Boolean.

We can use this operation to get just the cities:

fun is-city(r :: Row) -> Boolean:
  doc: "get the value in the row's city column"
  r["city"]
end

> filter-with(municipalities, is-city)

And, we can combine all of these operations. How would we get the city with the smallest population?

> sort-by(filter-with(municipalities, is-city), "population-2010", true).row-n(0)

Population change

Let’s work on solving a problem using our RI municipalities data. Let’s say we’re trying to figure out what the fastest-growing towns are in RI. In other words, we want a table containing only towns, sorted by the percent change in population. How would we go about doing this? What are the subtasks of this problem?

  • Filtering out the cities
  • Calculating percentage change in population
  • Building a column for percentage change
  • Sorting on that column in descending order

See the lecture capture for details on how this program is built up. Here’s what we ended up with:

include tables
include gdrive-sheets
include image

include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX")


municipalities = table: name, city, population-2000, population-2010
  row: "Providence", true, 173618, 178042
  row: "Cranston", true, 79269, 80387
  row: "Coventry", false, 33668, 35014
  row: "Warwick", true, 85808, 82672
  row: "North Providence", false, 32411, 32078
end

fun is-town(r :: Row) -> Boolean:
  not(r["city"])
end
fun percent-change(r :: Row) -> Number:
  (r["population-2010"] - r["population-2000"]) /
  r["population-2000"]
end

towns = filter-with(municipalities, is-town)
towns-with-percent-change = build-column(towns, "percent-change", percent-change)
fastest-growing-towns = sort-by(towns-with-percent-change, "percent-change", false)
fastest-growing-towns

Adding columns

We can write functions on rows that don’t return Booleans:

fun percent-change(r :: Row) -> Number:
  (r["population-2010"] - r["population-2000"]) /
  r["population-2000"]
end

We can add columns to tables using another operation: build-column.

build-column :: (t :: Table, colname :: String, builder :: (Row -> A)) -> Table

What’s going on in that signature? In particular, what’s the type of the builder parameter? builder needs to be a function that takes a row and returns…something! The values it returns will be the values in the new column we’re adding to each row. Imagine we have the following table:

a b
“dog” 2
“cat” 3

Let’s call this table t. If we have a function builder, then the value of build-column(t, "c", builder) is

a b c
“dog” 2 builder(<"dog", 2>)
“cat” 3 builder(<"cat", 3>)

For example, if we have:

fun builder(r :: Row) -> Number:
  string-length(row["a"]) + row["b"]
end

Then we end up with the following table:

a b c
“dog” 2 5
“cat” 3 6

Loading Google Sheets into Pyret

Later in this course, we’ll be working with very large tables, which would be a real pain to type into Pyret by hand. I’ve been unwilling to do that even for the 39 municipalities in Rhode Island! Luckily, we don’t have to type tables manually into Pyret–instead, we can load them from Google Sheets!

I’ve made a spreadsheet version of our RI municipality data, which includes every town and city. We can load it into Pyret as follows:

include tables
include gdrive-sheets

include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX")

ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc"
spreadsheet = load-spreadsheet(ssid)

municipalities = load-table: name :: String, city :: Boolean,
  population-2000 :: Number, population-2010 :: Number
  # true because the sheet has a "header" row
  source: spreadsheet.sheet-by-name("municipalities", true)
end

We can then work with this table just as if we had entered it manually.