Class summary:   Working With Tables
1 Adding Columns to Tables
2 Testing Row aand Table Functions
2.1 Rows as Inputs
2.2 Tables as Inputs or Outputs
3 Loading Tables from Google Sheets
4 Sanity-Checking Data
5 Cleaning up Peppers with a Table

Class summary: Working With Tables

Copyright (c) 2017 Kathi Fisler

1 Adding Columns to Tables

We started with an example of computing a new column for a table. Here is code for adding a column containing the average of two exam scores.

  fun exam-avg(r :: Row) -> Number:

    doc: "compute average of the two exams in the row"

    (r["exam1"] + r["exam2"]) / 2

  end

  

  build-column(gradebook, "avg", exam-avg)

What if I wanted to sort the gradebook by averages? Can we do?

  sort-by(gradebook, "avg", true)

This yield an error because gradebook doesn’t have an "avg" column. Remember that none of the table operators actually modify the original table. We therefore need to combine the build-column and sort-by expressions. There are two ways we can do this:

  # name the first table, then use that table in the sort

  gradebook1 = build-column(gradebook, "avg", exam-avg)

  sort-by(gradebook1, "avg", true)

  

  # feed the result of build-column directly into sort-by

  sort-by(

    build-column(gradebook, "avg", exam-avg),

    "avg", true)

Both approaches compute the same answer. The first just names the intermediate table in the known-names area. That is useful if you need to use the same table multiple times, but it can be overkill if you don’t need the table after the next expression.

2 Testing Row aand Table Functions

2.1 Rows as Inputs

Up to now, we’ve not been writing where blocks while we got a sense of the table operations. Now let’s go back and add those tests. Let’s start with testing a function that takes a row as input. We’ll use exam-avg:

  fun exam-avg(r :: Row) -> Number:

    doc: "compute average of the two exams in the row"

    (r["exam1"] + r["exam2"]) / 2

  where:

    exam-avg(gradebook.row-n(4)) is 35

    exam-avg(gradebook.row("A", true, 100, 80)) is 90

  end

This code shows 2 ways to get an input row for testing a function: we can pull a row out of an existing table (the first test) or we can create a temporary row just for testing (the second test). The second version uses the row operation to basically get the column names from the gradebook, then makes a row to match those column names.

Each approach is useful in certain situations. Use whichever you feel is appropriate to your problem.

2.2 Tables as Inputs or Outputs

What if we were testing a function that takes a table and filters out some of the rows. What would our tests need to look like? Ideally, you should test scenarios that yield different output tables, including an empty table, the entire table, and some of the table. Here’s what such a where block could look like:

  gradebook-all-did-worse = table: name, SNC, exam1, exam2

    row: "Alina", false, 85, 80

    row: "Carl", false, 75, 60

    row: "Elan", true, 95, 63

  end

  

  gradebook-all-improved = table: name, SNC, exam1, exam2

    row: "Alina", false, 85, 88

    row: "Carl", false, 75, 76

    row: "Elan", true, 95, 100

  end

  

  # creates a gradebook with rows 1, 2, and 4 from original gradebook

  gradebook-exam2-lower =

    gradebook.empty()

    .add-row(gradebook.row-n(1))

    .add-row(gradebook.row-n(2))

    .add-row(gradebook.row-n(4))

  

  fun get-exam2-lower(t :: Table) -> Table:

    doc: "return table of the rows where exam2 lower than exam1"

    filter-by(t, exam2-lower)

  where:

    # get-exam2-lower(gradebook) is filter-by(gradebook, exam2-lower)

    get-exam2-lower(gradebook) is gradebook-exam2-lower

    get-exam2-lower(gradebook-all-improved) is gradebook.empty()

    get-exam2-lower(gradebook-all-did-worse) is gradebook-all-did-worse

  end

These examples show some new functions on tables:

3 Loading Tables from Google Sheets

Go back to the gradebook from last class:

  gradebook = table: name, SNC, exam1, exam2

    row: "Alina", false, 85, 90

    row: "Carl", false, 75, 60

    row: "Elan", true, 95, 63

    row: "Lavon", false, 87, 88

    row: "Nunu", true, 70, 0

  end

This is a smaller gradebook that is good for experimenting with and testing programs. A realistic gradebook might be maintained in a spreadsheet. Here’s an example of how to load a gradebook from a Google Sheet:

  include gdrive-sheets

  

  imported-grades =

    load-spreadsheet("1q7kWdnIYrNthKJ2W4vkh51wp4earjNJB2r9CVjkO-u4")

  

  gradebook-large =

    load-table: name :: String, SNC :: Boolean, exam1 :: Number, exam2 :: Number

      source: imported-grades.sheet-by-name("Sheet1", true)

    end

The load-spreadsheet function is used to tell Pyret that you want to use data from a Google Sheet. The string passed as an argument is the Google Document ID (the sequence of numbers and letters in the document URL).

One more step is needed to get the google sheet data loaded into a table. For that, we use the load-table command. The first line names the columns that will be brought in from the sheet. The second line (source) says to load the sheet named "Sheet1" from imported-grades. The true says that the sheet has a header row (that will not be loaded).

4 Sanity-Checking Data

We took the gradebook-large table and sorted it on each of the exam scores to see who might need extra help in the course. To our surprise, we found a very odd grade (9372) for one of the students on exam1. Likely, the person entering the exam grades concatenated the two grades, rather than put 93 for exam1 and 72 for exam2.

The moral here is that you should never trust large data sets without trying to sanity check them first. One good way to sanity-check a data set is to create plots or charts of it. We saw two examples:

  scatter-plot(gradebook-large, "exam1", "exam2")

  histogram(gradebook-large, "exam1", 10)

    # 10 is the reporting interval or bin size

Pyret supports other forms of charts and plots, as noted in the tables documentation.

If you do find a problem in your data (missing data, erroneous data), you can either repair it in your source file, or write a program to correct it. For the gradebook situation, fixing it in the source file makes sense. You’ll see a different situation on the homework.

5 Cleaning up Peppers with a Table

Finally, we returned to our pepper program and looked to replace the sequence of else if statements with a table the containing the low and high bounds for various kinds of peppers. Here’s the table:

  peppers = table: name, low, high

    row: "bell pepper", 0, 0

    row: "paprika", 100, 1000

    row: "jalapeno", 3500, 10000

    row: "habanero", 100000, 3500000

  end

How do we rewrite the pepper-scale function to use the table? When faced with a problem like this, start by writing down what tasks need to get done as part of the computation. To get the name of a pepper with a particular rating, we will need to:

Now, think about how to do each step. The first looks like a filter-by.

  fun pepper-scale3(rating :: Number) -> String:

    doc: "determine name of pepper given scoville rating"

  

    fun rating-in-row(r :: Row) -> Boolean:

      doc: "determine if rating lies within low/high of row"

      is-between(r["low"], r["high"], rating)

    end

  

    # not yet finished -- just putting in the first step

    filter-by(peppers, rating-in-row)

  where:

    pepper-scale3(0) is "bell pepper"

    pepper-scale3(250) is "paprika"

    pepper-scale3(240000) is "habanero"

    pepper-scale3(80) is "unknown"

    pepper-scale3(-150) is "unknown"

  end

Now, build on the filter-by to extract the first row (since only one row should match a given rating)

  filter-by(peppers, rating-in-row).row-n(0)

Then, the third step of extracting the name:

  filter-by(peppers, rating-in-row).row-n(0)["name"]

Now our code looks like:

  fun pepper-scale3(rating :: Number) -> String:

    doc: "determine name of pepper given scoville rating"

  

    fun rating-in-row(r :: Row) -> Boolean:

      doc: "determine if rating lies within low/high of row"

      is-between(r["low"], r["high"], rating)

    end

  

    filter-by(peppers, rating-in-row).row-n(0)["name"]

  

  where:

    pepper-scale3(0) is "bell pepper"

    pepper-scale3(250) is "paprika"

    pepper-scale3(240000) is "habanero"

    pepper-scale3(80) is "unknown"

    pepper-scale3(-150) is "unknown"

  end

Running this however yields an error for the unknown cases – if a rating isn’t covered in the table, the filter-by returns an empty table, so the row-n function fails. We have to update our code one last time to check whether the filter result has any rows, returning "unknown" if it does not. We use the length function to dig into a table and get the number of rows.

  fun pepper-scale3(rating :: Number) -> String:

    doc: "determine name of pepper given scoville rating"

  

    fun rating-in-row(r :: Row) -> Boolean:

      doc: "determine if rating lies within low/high of row"

      is-between(r["low"], r["high"], rating)

    end

  

    find-pepper = filter-by(peppers, rating-in-row)

    if find-pepper.length() > 0:

      find-pepper.row-n(0)["name"]

    else:

      "unknown"

    end

  

  where:

    pepper-scale3(0) is "bell pepper"

    pepper-scale3(250) is "paprika"

    pepper-scale3(240000) is "habanero"

    pepper-scale3(80) is "unknown"

    pepper-scale3(-150) is "unknown"

  end

This example pulled together many things we’ve worked on the last few classes. We have another case of putting one function inside another so that the second can use inputs to the first, we use if expressions to do different computations in different situations, and we use tables to organize our data.

This is a slightly different use of tables than in gradebook. In gradebook, we had an external dataset that we wanted to process with programs. In peppers, our computation depending on information that could be managed cleanly through a table (adding a new kind of pepper just needs us to edit the table now, not the program as with our original pepper program). Both use the same table functions, it’s just that the table arises from different situations.