Lecture notes: Table functions

Nested functions

Let’s add a “counties” field to the municipalities data we have been using. Here’s how we’ll load the data now:

include tables
include gdrive-sheets

include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX")

ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc"
spreadsheet = load-spreadsheet(ssid)

all-municipalities = load-table: name :: String, city :: Boolean,
  population-2000 :: Number, population-2010 :: Number,
  counties :: String
  # true because the sheet has a "header" row
  source: spreadsheet.sheet-by-name("municipalities-counties", true)
end

Let’s say we want to make a pie chart of the population distribution in a particular county. Here’s how we might do it for Washington County.1

fun in-washington-county(r :: Row) -> Boolean:
  r["county"] == "Washington"
end

fun munis-in-washington-county(munis :: Table) -> Table:
  filter-by(munis, in-washington-conty)
end

# create a pie chart
mwc = munis-in-washington-county(all-municipalities)
pie-chart(mwc, "name", "population-2010")

Now, what if we wanted a similar pie chart for Providence County? We could edit the code and replace “Washington” with “Providence” everywhere, but that’s a little unsatisfying. We can do better by creating a function!

fun munis-in-county(munis :: Table, county :: String) -> Table:
  fun in-county(r :: Row) -> Boolean:
    r["county"] == county
  end
  filter-by(munis, in-county)
end

# create a pie chart
mip = munis-in-county(all-municipalities, "Providence")
pie-chart(mip, "name", "population-2010")

We haven’t seen this before–it’s a function that we’re defining inside another function! Think carefully about how Pyret evaluates a call to munis-in-county. When is in-county defined? What is county at that point?

Lambda expressions

Let’s look at our munis-in-county function. The in-county function it defines is quite boring–it’s really just accessing a particular field of a row. It might be nice if we could write a shorter, equivalent expression.

Pyret lets us do this using lambda expressions (a name that comes from the world of formal descriptions of programming languages). A lambda expression defines an anonymous function–a function that can be passed as an argument, but which does not have an associated name. We can rewrite munis-in-county using a lambda as follows:

fun munis-in-county(munis :: Table, county :: String) -> Table:
  filter-by(munis, lam(r): 
      r["county"] = county
    end)
end

You do not have to use lambda expressions when writing code, but you may find them convenient.

Dealing with multiple tables

So far, we’ve looked at manipulating and analyzing a single table of data. What if we want to combine data from multiple sources? For example, our municipalities spreadsheet has sheet containing just municipality names and land areas (in square kilometers). How can we combine these data with the rest of our municipality data to get population density information?

(In this case, there’s a simple potential solution: in Google Sheets, paste the column of areas into the municipalities table. For more complex cases, though, you’ll need to use the techniques described in this lecture.)

Here’s the code we ended up with:

include tables
include gdrive-sheets

include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX")

ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc"
spreadsheet = load-spreadsheet(ssid)

ri-municipalities = load-table: name :: String, city :: Boolean,
  population-2000 :: Number, population-2010 :: Number
  # true because the sheet has a "header" row
  source: spreadsheet.sheet-by-name("municipalities", true)
end

ri-areas = load-table: name :: String, area :: Number
  source: spreadsheet.sheet-by-name("municipalities-area", true)
end

fun area-for-municipality(areas :: Table, name :: String) -> Number:
  filter-by(areas, lam(r): r["name"] == name end).row-n(0)["area"]
end

fun add-area(municipalities :: Table, areas :: Table) -> Table:
  build-column(municipalities, 
    "area",
    lam(r): area-for-municipality(areas, r["name"]) end)
end

fun density(r :: Row) -> Number:
  r["population-2010"] / r["area"]
end

fun add-density(municipalities :: Table) -> Table:
  build-column(municipalities, "density", density)
end

fun densest-municipalities(municipalities :: Table, areas :: Table) -> Table:
  with-density = add-density(add-area(municipalities, areas))
  sort-by(with-density, "density", false)
end

Footnotes:

1
Before the Revolutionary War, Washington County was called Kings County.