Plotting data, nested functions

Plotting data

Plots help us understand the shape of data visually, and are often much more readable than a large table of numbers. Data scientists use plots for both exploratory and explanatory purposes: to understand data in preparation for further analysis, and to present data to a general audience.

Our tables library includes several functions to generate different kinds of plots. Here are a few examples using our municipalities data.

# how is population distributed in the state?
pie-chart(all-municipalities, "name", "population-2010")

# how many municipalities of various sizes are there?
histogram(all-municipalities, "population-2010", 1000)

# how much, and how, does population vary?
box-plot(all-municipalities, "population-2010")

ft = fastest-growing-towns(all-municipalities)

# visually present the growth data
bar-chart(ft, "name", "percent-change")

# is a town's size (in 2000) correlated with its growth?
scatter-plot(ft, "population-2000", "percent-change")
# linear regression
lr-plot(ft, "population-2000", "percent-change")

Nested functions

Let’s add a “county” field to the municipalities data we have been using. Here’s how we’ll load the data now:

include tables
include gdrive-sheets

include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX")

ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc"
spreadsheet = load-spreadsheet(ssid)

all-municipalities = load-table: name :: String, city :: Boolean,
  population-2000 :: Number, population-2010 :: Number,
  county :: String
  # true because the sheet has a "header" row
  source: spreadsheet.sheet-by-name("municipalities-counties", true)
end

Let’s say we want to make a pie chart of the population distribution in a particular county. Here’s how we might do it for Washington County.¹

fun in-washington-county(r :: Row) -> Boolean:
  r["county"] == "Washington"
end

fun munis-in-washington-county(munis :: Table) -> Table:
  filter-with(munis, in-washington-county)
end

# create a pie chart
mwc = munis-in-washington-county(all-municipalities)
pie-chart(mwc, "name", "population-2010")

Now, what if we wanted a similar pie chart for Providence County? We could edit the code and replace “Washington” with “Providence” everywhere, but that’s a little unsatisfying. We can do better by writing a function that takes the county as a parameter!

fun munis-in-county(munis :: Table, county :: String) -> Table:
  fun in-county(r :: Row) -> Boolean:
    r["county"] == county
  end
  filter-with(munis, in-county)
end

# create a pie chart
mip = munis-in-county(all-municipalities, "Providence")
pie-chart(mip, "name", "population-2010")

We haven’t seen this before: it’s a function that we’re defining inside another function! Think carefully about how Pyret evaluates a call to munis-in-county. When is in-county defined? What is county at that point?
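
One way to explore these questions is to call munis-in-county twice, with different counties, on a small hand-made table. The sketch below assumes the same includes as above (so that filter-with is available) and uses munis-in-county exactly as defined above; the table named sample is made up for illustration and holds only three towns.

```pyret
# a tiny hand-made table, for illustration only
sample = table: name, county
  row: "Narragansett", "Washington"
  row: "Providence", "Providence"
  row: "Westerly", "Washington"
end

# each call defines its own in-county, which sees that call's county
check:
  munis-in-county(sample, "Washington").length() is 2
  munis-in-county(sample, "Providence").length() is 1
end
```

The same table gives different results because in-county is defined anew each time munis-in-county is called, and it refers to whatever county was passed to that call.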

Lambda expressions

Let’s look at our munis-in-county function. The in-county function it defines is quite boring: it just compares one field of a row to a value. It might be nice if we could write a shorter, equivalent expression.

Pyret lets us do this using lambda expressions (a name that comes from the world of formal descriptions of programming languages). A lambda expression defines an anonymous function: a function that can be passed as an argument, but which does not have an associated name. We can rewrite munis-in-county using a lambda as follows:

fun munis-in-county(munis :: Table, county :: String) -> Table:
  filter-with(munis, lam(r):
      r["county"] == county
    end)
end

You do not have to use lambda expressions when writing code, but you may find them convenient.
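
As a further illustration, the Washington County chart from earlier could be produced without naming the helper function at all. This is only a sketch: it assumes the same includes and the all-municipalities table loaded above, and the name mwc-lam is made up for this example. The annotations on the lambda are optional.

```pyret
# the lambda takes the place of the named in-washington-county function
mwc-lam = filter-with(all-municipalities,
  lam(r :: Row) -> Boolean: r["county"] == "Washington" end)
pie-chart(mwc-lam, "name", "population-2010")
```

A lambda like this is handy when the function is used in only one place; a named function is usually clearer when the same test is needed more than once.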

Footnotes:

¹ Before the Revolutionary War, Washington County was called Kings County.