Plotting data, nested functions
Plotting data
Plots help us visually understand the shape of data, and are often much more readable than a large table of numbers. Data scientists use plots for both exploratory and explanatory purposes–they are useful for understanding data in preparation for further analysis and in presenting data to a general audience.
Our tables library includes several functions to generate different kinds of plots. Here are a few examples using our municipalities data.
# how is population distributed in the state?
pie-chart(all-municipalities, "name", "population-2010")

# how many municipalities of various sizes are there?
histogram(all-municipalities, "population-2010", 1000)

# how much, and how, does population vary?
box-plot(all-municipalities, "population-2010")

ft = fastest-growing-towns(all-municipalities)

# visually present the growth data
bar-chart(ft, "name", "population-2010")

# is a town's size (in 2000) correlated with its growth?
scatter-plot(ft, "population-2000", "percent-change")

# linear regression
lr-plot(ft, "population-2000", "percent-change")
Nested functions
Let’s add a “county” field to the municipalities data we have been using. Here’s how we’ll load the data now:
include tables
include gdrive-sheets
include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX")

ssid = "1jHvn5CPE6RkTTQRIXQbY5n5p4aiOH7fZsnwK2s6s6tc"
spreadsheet = load-spreadsheet(ssid)

all-municipalities =
  load-table:
    name :: String,
    city :: Boolean,
    population-2000 :: Number,
    population-2010 :: Number,
    county :: String
    # true because the sheet has a "header" row
    source: spreadsheet.sheet-by-name("municipalities-counties", true)
  end
Let’s say we want to make a pie chart of the population distribution in a particular county. Here’s how we might do it for Washington County.[1]
# is this row's municipality in Washington County?
fun in-washington-county(r :: Row) -> Boolean:
  r["county"] == "Washington"
end

# keep only the rows for municipalities in Washington County
fun munis-in-washington-county(munis :: Table) -> Table:
  filter-with(munis, in-washington-county)
end

# create a pie chart
mwc = munis-in-washington-county(all-municipalities)
pie-chart(mwc, "name", "population-2010")
Now, what if we wanted a similar pie chart for Providence County? We could edit the code and replace “Washington” with “Providence” everywhere, but that’s a little unsatisfying. We can do better by creating a function!
fun munis-in-county(munis :: Table, county :: String) -> Table:
  fun in-county(r :: Row) -> Boolean:
    r["county"] == county
  end
  filter-with(munis, in-county)
end

# create a pie chart
mip = munis-in-county(all-municipalities, "Providence")
pie-chart(mip, "name", "population-2010")
We haven’t seen this before: it’s a function that we’re defining inside another function! Think carefully about how Pyret evaluates a call to munis-in-county. When is in-county defined? What is county at that point?
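One way to explore these questions is to call the function twice with different counties and compare the results. The sketch below assumes the same course library that the loading code includes (which is where filter-with is taken to come from). It uses a small hand-written table, sample-munis, which is a made-up stand-in and not part of the course data.

```pyret
include tables
include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX")

# sample-munis is a hypothetical stand-in for all-municipalities
sample-munis = table: name, county
  row: "Westerly", "Washington"
  row: "Narragansett", "Washington"
  row: "Cranston", "Providence"
end

fun munis-in-county(munis :: Table, county :: String) -> Table:
  # in-county is defined each time munis-in-county is called,
  # so its body sees the value of county for that particular call
  fun in-county(r :: Row) -> Boolean:
    r["county"] == county
  end
  filter-with(munis, in-county)
end

check:
  # during this call, county is "Washington" inside in-county
  munis-in-county(sample-munis, "Washington").length() is 2
  # during this call, county is "Providence" inside in-county
  munis-in-county(sample-munis, "Providence").length() is 1
end
```

Each call defines its own in-county, so the same inner function body filters on a different county depending on the argument that was passed to the outer function.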
Lambda expressions
Let’s look at our munis-in-county function. The in-county function it defines is quite boring: it just compares one field of a row against county. It might be nice if we could write a shorter, equivalent expression.
Pyret lets us do this using lambda expressions (a name that comes from the world of formal descriptions of programming languages). A lambda expression defines an anonymous function: a function that can be passed as an argument, but which does not have an associated name. We can rewrite munis-in-county using a lambda as follows:
fun munis-in-county(munis :: Table, county :: String) -> Table:
  # == compares the row's county to the parameter
  filter-with(munis, lam(r): r["county"] == county end)
end
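To check that the lambda version behaves like the nested-function version, we can run both on the same small table. This is only a sketch: the names munis-in-county-named and munis-in-county-lam and the table sample-munis are invented here for illustration, and filter-with is assumed to come from the course library included at the top.

```pyret
include tables
include shared-gdrive("cs111-2020.arr", "1imMXJxpNWFCUaawtzIJzPhbDuaLHtuDX")

# sample-munis is a hypothetical stand-in for all-municipalities
sample-munis = table: name, county
  row: "Westerly", "Washington"
  row: "Narragansett", "Washington"
  row: "Cranston", "Providence"
end

# version with a named inner function
fun munis-in-county-named(munis :: Table, county :: String) -> Table:
  fun in-county(r :: Row) -> Boolean:
    r["county"] == county
  end
  filter-with(munis, in-county)
end

# version with an anonymous function written in place
fun munis-in-county-lam(munis :: Table, county :: String) -> Table:
  filter-with(munis, lam(r): r["county"] == county end)
end

check:
  # both versions keep the same number of rows for each county
  munis-in-county-named(sample-munis, "Washington").length() is 2
  munis-in-county-lam(sample-munis, "Washington").length() is 2
  munis-in-county-named(sample-munis, "Providence").length() is 1
  munis-in-county-lam(sample-munis, "Providence").length() is 1
end
```

The lambda can refer to county for the same reason in-county can: it is written inside munis-in-county, where county is already defined.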
You do not have to use lambda expressions when writing code, but you may find them convenient.
Footnotes:
[1] Before the Revolutionary War, Washington County was called Kings County.