Lecture notes: Table functions
Nested functions
Let’s add a “counties” field to the municipalities data we have been using. Here’s how we’ll load the data now:
include tables include gdrive-sheets include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX") ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc" spreadsheet = load-spreadsheet(ssid) all-municipalities = load-table: name :: String, city :: Boolean, population-2000 :: Number, population-2010 :: Number, counties :: String # true because the sheet has a "header" row source: spreadsheet.sheet-by-name("municipalities-counties", true) end
Let’s say we want to make a pie chart of the population distribution in a particular county. Here’s how we might do it for Washington County.1
fun in-washington-county(r :: Row) -> Boolean: r["county"] == "Washington" end fun munis-in-washington-county(munis :: Table) -> Table: filter-by(munis, in-washington-conty) end # create a pie chart mwc = munis-in-washington-county(all-municipalities) pie-chart(mwc, "name", "population-2010")
Now, what if we wanted a similar pie chart for Providence County? We could edit the code and replace “Washington” with “Providence” everywhere, but that’s a little unsatisfying. We can do better by creating a function!
fun munis-in-county(munis :: Table, county :: String) -> Table: fun in-county(r :: Row) -> Boolean: r["county"] == county end filter-by(munis, in-county) end # create a pie chart mip = munis-in-county(all-municipalities, "Providence") pie-chart(mip, "name", "population-2010")
We haven’t seen this before–it’s a function that we’re defining inside another
function! Think carefully about how Pyret evaluates a call to
munis-in-county
. When is in-county
defined? What is county
at that point?
Lambda expressions
Let’s look at our munis-in-county
function. The in-county
function it
defines is quite boring–it’s really just accessing a particular field of a
row. It might be nice if we could write a shorter, equivalent expression.
Pyret lets us do this using lambda expressions (a name that comes from the
world of formal descriptions of programming languages). A lambda expression
defines an anonymous function–a function that can be passed as an argument,
but which does not have an associated name. We can rewrite munis-in-county
using a lambda as follows:
fun munis-in-county(munis :: Table, county :: String) -> Table: filter-by(munis, lam(r): r["county"] = county end) end
You do not have to use lambda expressions when writing code, but you may find them convenient.
Dealing with multiple tables
So far, we’ve looked at manipulating and analyzing a single table of data. What if we want to combine data from multiple sources? For example, our municipalities spreadsheet has sheet containing just municipality names and land areas (in square kilometers). How can we combine these data with the rest of our municipality data to get population density information?
(In this case, there’s a simple potential solution: in Google Sheets, paste the column of areas into the municipalities table. For more complex cases, though, you’ll need to use the techniques described in this lecture.)
Here’s the code we ended up with:
include tables include gdrive-sheets include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX") ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc" spreadsheet = load-spreadsheet(ssid) ri-municipalities = load-table: name :: String, city :: Boolean, population-2000 :: Number, population-2010 :: Number # true because the sheet has a "header" row source: spreadsheet.sheet-by-name("municipalities", true) end ri-areas = load-table: name :: String, area :: Number source: spreadsheet.sheet-by-name("municipalities-area", true) end fun area-for-municipality(areas :: Table, name :: String) -> Number: filter-by(areas, lam(r): r["name"] == name end).row-n(0)["area"] end fun add-area(municipalities :: Table, areas :: Table) -> Table: build-column(municipalities, "area", lam(r): area-for-municipality(areas, r["name"]) end) end fun density(r :: Row) -> Number: r["population-2010"] / r["area"] end fun add-density(municipalities :: Table) -> Table: build-column(municipalities, "density", density) end fun densest-municipalities(municipalities :: Table, areas :: Table) -> Table: with-density = add-density(add-area(municipalities, areas)) sort-by(with-density, "density", false) end