Class summary: Introduction to Tables

The section in PAPL on tabular data uses a different notation for table operations than we present in this lecture, so reading that section may well be more confusing than helpful.

1 Tabular Data

When you have a collection of data that report the same attributes about a group of entities, a data table (or just table) can be a good way to organize the data.

Start with a gradebook table:

include shared-gdrive("table-functions.arr", "14jG4wvAMhjJue1-EmY-u9hX4UwmPHCO8")

include tables

gradebook = table: name, SNC, exam1, exam2
  row: "Alina", false, 85, 90
  row: "Carl", false, 75, 60
  row: "Elan", true, 95, 63
  row: "Lavon", false, 87, 88
  row: "Nunu", true, 70, 0
end

(Note: At Brown, "SNC" means taking a class pass/fail rather than for a letter grade.)

What computations might you want to do with this table?

compute course grades
get histogram of performance on each exam
look at delta from first exam to second exam
check whether SNC or letter-grade students did better on exam2
get names of students who did poorly on the first exam
etc

To do these analyses, you need to be able to perform operations on tables. What sorts of operations do you need?

filter out some rows (to look at only low grades)
re-order the rows (to see high or low scores first)
perform computations based on particular columns (e.g., SNC and exam2)
add a column with the overall course grade

Our first task was to learn how to do these computations in Pyret. Here are examples of these computations:

#----------------------------------------------

# order the rows by descending values on exam1

sort-by(gradebook, "exam1", false)

#----------------------------------------------

# keep only those rows in which the SNC column contains true

fun taking-snc(r :: Row) -> Boolean:
  r["SNC"]
end

filter-by(gradebook, taking-snc)

#----------------------------------------------

# keep only those rows in which the SNC column contains false

fun not-taking-snc(r :: Row) -> Boolean:
  not(r["SNC"])
end

filter-by(gradebook, not-taking-snc)

#----------------------------------------------

# keep those students whose grades dropped from exam1 to exam2

fun exam2-lower(r :: Row) -> Boolean:
  r["exam1"] > r["exam2"]
end

filter-by(gradebook, exam2-lower)

#----------------------------------------------

# add a column with the average of the exam grades

fun exam-avg(r :: Row) -> Number:
  (r["exam1"] + r["exam2"]) / 2
end

build-column(gradebook, "avg", exam-avg)

#----------------------------------------------

# sort by exam averages

sort-by(
  build-column(gradebook, "avg", exam-avg),
  "avg", false)
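The same three operations cover other questions from the list above, such as the change from the first exam to the second. The following is a minimal sketch, not part of the lecture code: the names exam-delta, with-delta, and by-delta are made up for illustration, and the gradebook definition is repeated only so the block runs on its own.

```pyret
include shared-gdrive("table-functions.arr", "14jG4wvAMhjJue1-EmY-u9hX4UwmPHCO8")
include tables

gradebook = table: name, SNC, exam1, exam2
  row: "Alina", false, 85, 90
  row: "Carl", false, 75, 60
  row: "Elan", true, 95, 63
  row: "Lavon", false, 87, 88
  row: "Nunu", true, 70, 0
end

# hypothetical helper: change in score from exam1 to exam2
# (negative means the grade dropped)
fun exam-delta(r :: Row) -> Number:
  r["exam2"] - r["exam1"]
end

# add the new column, naming the intermediate table
with-delta = build-column(gradebook, "delta", exam-delta)

# ascending order on delta, so the largest drops come first
by-delta = sort-by(with-delta, "delta", true)
```

Naming the intermediate table (with-delta) is an alternative to nesting the build-column call inside sort-by as in the previous example; both express the same two-step computation.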

1.1 Summary of Table Operations

As a summary, here are the functions shown in these examples (you won’t find them in the Pyret documentation, as they have only recently been added to the language):

filter-by :: (t :: Table, test :: (Row -> Boolean)) -> Table

sort-by :: (t :: Table, col :: String, ascending :: Boolean) -> Table

build-column :: (t :: Table, col :: String,
                 builder :: (Row -> Value)) -> Table

The notation row["colname"] extracts the value stored in the named column of the given row.
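To see the row notation on its own, outside of a filter or builder function, you can pull a single row out of a table and look up its columns. This sketch assumes the row-n method from Pyret's built-in tables (it is not one of the functions introduced in this lecture); the two-row table tiny and the helper exam-total are made up for illustration.

```pyret
include tables

# hypothetical two-row table with the same columns as the gradebook
tiny = table: name, SNC, exam1, exam2
  row: "Alina", false, 85, 90
  row: "Carl", false, 75, 60
end

# assumption: row-n returns the row at the given position, counting from 0
first-row = tiny.row-n(0)

# look up individual columns by name
first-row["name"]
first-row["exam1"]

# hypothetical helper: any function that takes a Row can use the same notation
fun exam-total(r :: Row) -> Number:
  r["exam1"] + r["exam2"]
end

exam-total(first-row)
```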

What are the key takeaways from this segment?

Key idea in CS: Once data are made up of smaller pieces of data, we want to organize the data to make it easier to maintain and process. Tables are good for data about multiple entities, each of which has the same attributes.

Key idea in CS: Tables, just like images and numbers, have operations that let you manipulate, combine, and compute over them. As with computations on numbers, strings, or images, you should reflect the structure of the computation in your code.