Lecture notes: More tables

Here’s our table from the last lecture:

include tables

include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX")

municipalities = table: name, city, population-2000, population-2010
  row: "Providence", true, 173618, 178042
  row: "Cranston", true, 79269, 80387
  row: "Coventry", false, 33668, 35014
  row: "Warwick", true, 85808, 82672,
  row: "North Providence", false, 32411, 32078
end

Review of table operations

Let’s first review the operations we saw last time.

We can get the nth row of the data:

> municipalities.row-n(0)
> municipalities.row-n(1)
> municipalities.row-n(0)["name"]

We can sort the data by 2010 population:

> sort-by(municipalities, "population-2010", false)

To find the least populous municipalities, we can change the last argument:

> sort-by(municipalities, "population-2010", true)

We can also find all of the cities:

fun is-city(r :: Row) -> Boolean:
  doc: "get the value in the row's city column"
  r["city"]
end

> filter-by(municipalities, is-city)

Note what we’re doing here: we’re passing a function to another function. How is is-city used?

We can find municipalities whose population went down:

fun population-decreased(r :: Row) -> Boolean:
  doc: "returns true if the municipality's population went down between '00 and '10"
  r["population-2010"] < r["population-2000"]
end

> filter-by(municipalities, population-decreased)

And, we can combine all of these operations. How would we get the city with the smallest population?

> sort-by(filter-by(municipalities, is-city), "population-2010", true).row-n(0)

Adding columns

We can add columns to tables using another operation: build-column.

build-column :: (t :: Table, colname :: String, builder :: (Row -> A)) -> Table

What’s going on in that signature? In particular, what’s the type of the builder parameter? builder needs to be a function that takes a row and returns…something! The values it returns will be the values in the new column we’re adding to each row. Imagine we have the following table:

a	b
“dog”	2
“cat”	3

Let’s call this table t. If we have a function builder, then the value of build-column(t, "c", builder) is

a	b	c
“dog”	2	`builder(<"dog", 2>)`
“cat”	3	`builder(<"cat", 3>)`

For example, if we have:

fun builder(r :: Row) -> Number:
  string-length(row["a"]) + row["b"]
end

Then we end up with the following table:

a	b	c
“dog”	2	5
“cat”	3	6

Population change

Let’s work on solving a problem using our RI municipalities data. Let’s say we’re trying to figure out what the fastest-growing towns are in RI. In other words, we want a table containing only towns, sorted by the percent change in population. How would we go about doing this? What are the subtasks of this problem?

Filtering out the cities
Calculating percentage change in population
Building a column for percentage change
Sorting on that column in descending order

See the lecture capture for details on how this program is built up. Here’s what we ended up with:

include tables
include gdrive-sheets
include image

include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX")

municipalities = table: name, city, population-2000, population-2010
  row: "Providence", true, 173618, 178042
  row: "Cranston", true, 79269, 80387
  row: "Coventry", false, 33668, 35014
  row: "Warwick", true, 85808, 82672
  row: "North Providence", false, 32411, 32078
end

fun is-town(r :: Row) -> Boolean:
  not(r["city"])
end
fun percent-change(r :: Row) -> Number:
  (r["population-2010"] - r["population-2000"]) /
  r["population-2000"]
end

towns = filter-by(municipalities, is-town)
towns-with-percent-change = build-column(towns, "percent-change", percent-change)
fastest-growing-towns = sort-by(towns-with-percent-change, "percent-change", false)
fastest-growing-towns

Loading Google Sheets into Pyret

Later in this course, we’ll be working with very large tables, which would be a real pain to type into Pyret by hand. I’ve been unwilling to do that even for the 39 municipalities in Rhode Island! Luckily, we don’t have to type tables manually into Pyret–instead, we can load them from Google Sheets!

I’ve made a spreadsheet version of our RI municipality data, which includes every town and city. We can load it into Pyret as follows:

include tables
include gdrive-sheets

include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX")

ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc"
spreadsheet = load-spreadsheet(ssid)

municipalities = load-table: name :: String, city :: Boolean,
  population-2000 :: Number, population-2010 :: Number
  # true because the sheet has a "header" row
  source: spreadsheet.sheet-by-name("municipalities", true)
end

We can then work with this table just as if we had entered it manually.