Class summary: Table Organization
1 A Recap on Nested Functions
1.1 Anonymous Functions (lambda)
2 Organizing Tables to Tasks
2.1 Takeaways

Copyright (c) 2017 Kathi Fisler

1 A Recap on Nested Functions

There were some lingering questions from the last lecture about nested functions, especially for use with table functions. We started with a review, using the following example.

  fun filter-by-discount(t :: Table, d :: String) -> Table:
    doc: "filter table to rows with given discount"
    fun has-discount(r :: Row) -> Boolean:
      r["discount"] == d
    end
    filter-by(t, has-discount)
  end

  student-tickets =
     sum(filter-by-discount(event-data, "student"),
         "tickcount")

We went through a justification of why has-discount needs to be nested within filter-by-discount. We also reviewed how the call to filter-by-discount works, showing how d gets its value, then how has-discount gets used on the table rows.
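As a minimal sketch of the same scoping argument, here is a list-based analogue that runs without the course's table library. The names keep-discount and is-match are hypothetical, and a list of strings stands in for the table's "discount" column. The nested function takes a single argument, as a filtering function must, yet its body can still refer to d because d is a parameter of the enclosing function. If is-match were moved to the top level, d would be an unbound name there.

```pyret
# Hypothetical list-based analogue of filter-by-discount: the list of
# strings stands in for the "discount" column of a table.
fun keep-discount(codes :: List<String>, d :: String) -> List<String>:
  doc: "keep only the discount codes equal to d"
  # is-match is nested so that its body can refer to d, a parameter
  # of keep-discount; is-match itself takes just one argument.
  fun is-match(code :: String) -> Boolean:
    code == d
  end
  codes.filter(is-match)
where:
  keep-discount([list: "student", "none", "student"], "student") is [list: "student", "student"]
  keep-discount([list: "student", "none"], "senior") is empty
end
```

When keep-discount is called with "student", d takes that value first; is-match is then applied to each element in turn, just as has-discount is applied to each row of the table.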

The lecture capture shows both explanations. We have also summarized them in an animated PowerPoint document [PDF version].

1.1 Anonymous Functions (lambda)

If you are getting comfortable writing filter-by and build-column expressions, you may be getting tired of having to write out the nested functions all the time. If you are ready for a shorthand, this section is for you (if you aren’t ready for this yet, that’s also fine).

Notice how in filter-by-discount the function has-discount is somewhat temporary: we create it just so we can give it as the argument to filter-by. We aren’t planning to use the name again after writing the filter-by call.

For situations such as these, Pyret provides the ability to define anonymous functions – functions with arguments and bodies, but no names. Here’s how the same code appears written with an anonymous function instead:

  fun filter-by-discount(t :: Table, d :: String) -> Table:
    doc: "filter table to rows with given discount"
    filter-by(t, lam(r): r["discount"] == d end)
  end

lam (short for the Greek letter lambda, which is used for functions in the mathematical foundations of programming languages) says "make an anonymous function". The function still takes the same parameters and still has the same body, as well as the end marker. But the name has been stripped off.

We have also stripped off the types (though we could have included them). Why? Because we tend to use anonymous functions in very localized situations like this, where the types are easy to see from the usage context (i.e., we know that functions passed to filter-by take a Row and return a Boolean).
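To illustrate that the annotations are optional, here is a small sketch that writes the same lam both ways. It is a list-based analogue rather than the course's table code: the names keep-untyped and keep-typed are hypothetical, and strings stand in for table rows so the example runs without the table library. In the table version, the annotated form would presumably read lam(r :: Row) -> Boolean: r["discount"] == d end.

```pyret
# Hypothetical list-based analogue: strings stand in for table rows.
fun keep-untyped(codes :: List<String>, d :: String) -> List<String>:
  doc: "keep codes equal to d, using a lam without annotations"
  codes.filter(lam(code): code == d end)
end

fun keep-typed(codes :: List<String>, d :: String) -> List<String>:
  doc: "same filter, with parameter and return annotations on the lam"
  codes.filter(lam(code :: String) -> Boolean: code == d end)
end

check "annotations do not change the result":
  sample = [list: "student", "none", "student"]
  keep-typed(sample, "student") is keep-untyped(sample, "student")
  keep-typed(sample, "student") is [list: "student", "student"]
end
```

Both versions compute the same result; the annotations only document (and check) what the lam accepts and returns.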

You are welcome to use lam functions or not, as you see fit. If you’re comfortable with them, they do make your code tighter. But it’s also fine to continue to use named functions, as the idea of anonymous functions can be a bit abstract at first.

2 Organizing Tables to Tasks

We began by looking at a Google Sheet with data about the regional demographics of recent Brown entering classes. This data comes from the Brown Admissions Factbook, First-Year Class Geographic Profile (the overall factbook has a lot of data on Brown’s student and employee populations).

Here are some questions we might want to ask about this data:

We asked which types of charts and plots made the most sense for each of these three questions. We also asked what organization of the data would be best suited to generating each of those charts. As a reminder, the available charts are listed in the tables-functions documentation page.

We generally concluded that

2.1 Takeaways

Key idea in CS: Being able to choose a table organization based on your analysis questions is a key skill in data science. Identifying manipulations that would properly reformat a table is a skill from computer science. Actually using code to reformat a table from one organization to another is a skill from programming.

In short, the course is starting to move from learning mechanics to really bringing our topics together to work on problems that arise in real analysis contexts. Let the fun begin!