Class summary: Introduction to Tables
1 Tabular Data
1.1 Summary of Table Operations

Copyright (c) 2017 Kathi Fisler

The section in PAPL on tabular data uses a different notation for table operations than we present in this lecture, so reading that section may well be more confusing than helpful.

1 Tabular Data

When you have a collection of data that report the same attributes about a group of entities, a data table (or just table) can be a good way to organize the data.

Start with a gradebook table:

  include shared-gdrive("table-functions.arr", "14jG4wvAMhjJue1-EmY-u9hX4UwmPHCO8")

  include tables

  

  gradebook = table: name, SNC, exam1, exam2

    row: "Alina", false, 85, 90

    row: "Carl",  false, 75, 60

    row: "Elan", true, 95, 63

    row: "Lavon", false, 87, 88

    row: "Nunu", true, 70, 0

  end

(Note: At Brown, "SNC" means taking a class pass/fail rather than for a letter grade.)

What computations might you want to do with this table?

  • compute course grades

  • get histogram of performance on each exam

  • look at delta from first exam to second exam

  • check whether SNC or letter-grade students did better on exam2

  • get names of students who did poorly on the first exam

  • etc

To do these analyses, you need to be able to perform operations on tables. What sorts of operations do you need?

  • filter out some rows (to look at only low grades)

  • re-order the rows (to see high or low scores first)

  • perform computations based on particular columns (e.g., SNC and exam2)

  • add a column with the overall course grade

Our first task was to learn how to do these computations in Pyret. Here are examples of these computations:

  #----------------------------------------------

  # order the rows by descending values on exam1

  sort-by(gradebook, "exam1", false)

  

  #----------------------------------------------

  # keep only those rows in which the SNC column contains true

  fun taking-snc(r :: Row) -> Boolean:

    r["SNC"]

  end

  

  filter-by(gradebook, taking-snc)

  

  #----------------------------------------------

  # keep only those rows in which the SNC column contains false

  fun not-taking-snc(r :: Row) -> Boolean:

    not(r["SNC"])

  end

  

  filter-by(gradebook, not-taking-snc)

  

  #----------------------------------------------

  # keep those students whose grades dropped from exam1 to exam2

  fun exam2-lower(r :: Row) -> Boolean:

    r["exam1"] > r["exam2"]

  end

  

  filter-by(gradebook, exam2-lower)

  

  #----------------------------------------------

  # add a column with the average of the exam grades

  fun exam-avg(r :: Row) -> Number:

    (r["exam1"] + r["exam2"]) / 2

  end

  

  build-column(gradebook, "avg", exam-avg)

  

  #----------------------------------------------

  # sort by exam averages

  sort-by(

    build-column(gradebook, "avg", exam-avg),

    "avg", false)
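
The same three operations also cover other tasks from the list at the start of this section. The sketch below is one possible way to look at the change from exam1 to exam2 and to find students who did poorly on exam1. It assumes the includes and the gradebook table from above are already loaded. The names exam-delta, with-delta, and low-exam1 are made up for this illustration, and the cutoff of 80 is an arbitrary choice.

```pyret
#----------------------------------------------
# change in score from exam1 to exam2
# (a negative number means the grade dropped)
fun exam-delta(r :: Row) -> Number:
  r["exam2"] - r["exam1"]
end

# add the delta column, then put the largest drops first
with-delta = build-column(gradebook, "delta", exam-delta)
sort-by(with-delta, "delta", true)

#----------------------------------------------
# keep only those rows with an exam1 score below the cutoff
# (80 is a hypothetical cutoff, not a course policy)
fun low-exam1(r :: Row) -> Boolean:
  r["exam1"] < 80
end

filter-by(gradebook, low-exam1)
```

With the gradebook above, the sorted table should start with Nunu (a drop of 70 points), and the filtered table should contain only the rows for Carl and Nunu.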

1.1 Summary of Table Operations

As a summary, here are the functions shown in these examples (you won’t find them in the Pyret documentation, as they have only recently been added to the language):

  filter-by :: (t :: Table, test :: (Row -> Boolean)) -> Table

  sort-by :: (t :: Table, col :: String, ascending :: Boolean) -> Table

  build-column :: (t :: Table, col :: String,

                   builder :: (Row -> Value)) -> Table

The notation row["colname"] extracts the value stored in the named column of the given row.
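
To try the row notation outside of a helper function, you need a single row to work with. The sketch below assumes the gradebook table from above is loaded. It uses the row-n method from Pyret's table library to pull out one row by position. That method is not covered in these notes, so treat it as an assumption about the library. The name first-row is made up for this example.

```pyret
# rows are numbered starting from 0
first-row = gradebook.row-n(0)

# the value in the name column of that row
first-row["name"]

# values pulled from a row can be used like any other values
first-row["exam1"] + first-row["exam2"]
```

This is the same notation that functions such as taking-snc and exam-avg use on the row they receive as input.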

What are the key takeaways from this segment?

Key idea in CS: Once data are made up of smaller pieces of data, we want to organize the data to make it easier to maintain and process. Tables are good for data about multiple entities, each of which has the same attributes.

Key idea in CS: Tables, just like images and numbers, have operations that let you manipulate, combine, and compute over them. As with computations on numbers, strings, or images, you should reflect the structure of the computation in your code.
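
As one illustration of reflecting the structure of a computation in code, here is a sketch that checks whether SNC or letter-grade students did better on exam2. It names each intermediate step rather than nesting everything in one expression. It assumes the includes, the gradebook table, and the taking-snc and not-taking-snc functions from above are loaded. The average helper and the variable names are made up for this example, and get-column is a method from Pyret's table library that these notes do not cover, so treat it as an assumption.

```pyret
# average of a non-empty list of numbers (hypothetical helper)
fun average(nums :: List<Number>) -> Number:
  total = for fold(acc from 0, n from nums): acc + n end
  total / nums.length()
end

# step 1: split the table by grading option
snc-rows = filter-by(gradebook, taking-snc)
letter-rows = filter-by(gradebook, not-taking-snc)

# step 2: pull out the exam2 scores as lists and average each group
# (get-column is assumed to return the column's values as a list)
snc-avg = average(snc-rows.get-column("exam2"))
letter-avg = average(letter-rows.get-column("exam2"))

# step 3: compare the two groups
letter-avg > snc-avg
```

Each named value matches one step of the plan (split, summarize, compare), so the code can be read in the same order in which you would describe the computation aloud.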