Class summary: Introduction to Tables
Copyright (c) 2017 Kathi Fisler
The section in PAPL on tabular data uses a different notation for table operations than we present in this lecture, so reading that section may well be more confusing than helpful.
Try a new problem today: managing a gradebook. I want to write programs to help me do various computations over students’ grades on exams (average scores on an exam, average scores across exams, improvement, histograms of scores, etc). Let’s start with two students:
Allie got 85 on exam1 and 90 on exam2 Carl got 75 on exam1 and 60 on exam2
Let’s say I want to have a function that lets me lookup the grade that a specific student got on a specific exam. How might we write that code with what we have now?
fun lookup-grade(student :: String, asgn :: String) -> Number: |
doc: "Return grade of a given student on a given exam" |
if (student == "Allie"): |
if (asgn == "exam1"): |
85 |
else if (asgn == "exam2"): |
90 |
else: |
raise("No such assignment") |
end |
else if (student == "Carl"): |
if (asgn == "exam1"): |
75 |
else if (asgn == "exam2"): |
60 |
else: |
raise("No such assignment") |
end |
else: raise("No such student") |
end |
end |
(the raise operator is how we have Pyret programs generate errors in the case of unexpected or invalid inputs.)
This code should bother us for several reasons:
It won’t scale to many assignments or large numbers of students.
This function only looks up grades – if we want other gradebook operations (like averaging grades), will we have to repeat all of this data in another function?
The code treats the pieces of data about an individual student as separate pieces of information, rather than as a collection of related information.
Key idea in CS: As a general rule of thumb, we want ways to keep related data together. We also want to keep data separate from computations that process the data (so we can reuse the data).
As we discussed this setup, some students raised the idea of spreadsheets. Spreadsheets are good for grouping data into rows and columns – with a spreadsheet, we could keep all of a student’s data together in a row, while keeping all data about a particular assignment together in a column. If only we had a spreadsheet!
Actually, Pyret has a data type that gives us the structure of a spreadsheet (but without all the built-in formulas). It’s called a Table.
1 Tabular Data
When you have a collection of data that report the same attributes about a group of entities, a data table (or just table) can be a good way to organize the data.
Start with a gradebook table:
include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX") |
include tables |
|
gradebook = table: name, SNC, exam1, exam2 |
row: "Alina", false, 85, 90 |
row: "Carl", false, 75, 60 |
row: "Elan", true, 95, 63 |
row: "Lavon", false, 87, 88 |
row: "Nunu", true, 70, 0 |
end |
(Note: At Brown, "SNC" means "pass/fail" rather than taking a class for a letter grade)
What computations might you want to do with this table?
compute course grades
get histogram of performance on each exam
look at delta from first exam to second exam
check whether SNC or letter-grade students did better on exam2
get names of students who did poorly on the first exam
etc
To do these analyses, need to be able to do operations on tables. What sort of operations do you need?
filter out some rows (to look at only low grades)
re-order the rows (to see high or low scores first)
perform computations based on particular some columns (i.e., SNC and exam2)
add a column with the overall course grade
Our first task was to learn how to do these computations in Pyret. Here are examples of these computations:
#---------------------------------------------- |
# order the rows by descending values on exam1 |
sort-by(gradebook, "exam1", false) |
|
#---------------------------------------------- |
# keep only those rows in which the SNC column contains true |
fun taking-snc(r :: Row) -> Boolean: |
r["SNC"] |
end |
|
filter-by(gradebook, taking-snc) |
|
#---------------------------------------------- |
# keep only those rows in which the SNC column contains false |
fun not-taking-snc(r :: Row) -> Boolean: |
not(r["SNC"]) |
end |
|
filter-by(gradebook, not-taking-snc) |
|
#---------------------------------------------- |
# keep those students whose grades dropped from exam1 to exam2 |
fun exam2-lower(r :: Row) -> Boolean: |
r["exam1"] > r["exam2"] |
end |
|
filter-by(gradebook, exam2-lower) |
|
#---------------------------------------------- |
# add a column with the average of the exam grades |
fun exam-avg(r :: Row) -> Number: |
(r["exam1"] + r["exam2"]) / 2 |
end |
|
build-column(gradebook, "avg", exam-avg) |
|
#---------------------------------------------- |
# sort by exam averages |
sort-by( |
build-column(gradebook, "avg", exam-avg), |
"avg", false) |
1.1 Summary of Table Operations
As a summary, here are the functions shown in these examples (you won’t find them in the Pyret documentation, as they have only recently been added to the language):
filter-by :: (t :: Table, (test :: Row->Boolean)) -> Table |
sort-by :: (t :: Table, col :: String, ascending :: Boolean) -> Table |
build-column :: (T :: Table, col :: String, |
builder :: (Row -> Value)) -> Table |
the notation row["colname"] extracts the value stored in the named column in the given row.
Here is a complete list of the table operations.
What are the key takeaways from this segment?
Key idea in CS: Once data are made up of smaller pieces of data, we want to organize the data to make it easier to maintain and process. Tables are good for data about multiple entities, each of which has the same attributes.
Key idea in CS: Tables, just like images and numbers, have operations that let you manipulate. combine, and compute over them. As with computations on numbers strings, or images, you should reflect the structure of the computation in your code.
2 Back to Looking up Grades
Let’s get back to where we started the lecture, trying to lookup a specific grade for a specific student. Let’s try to rewrite that function to use the gradebook table instead of the if-statements. We’ll call it lookup-grade2.
Here’s the start of the function.
fun lookup-grade2(student :: String, asgn :: String) -> Number: |
doc: "Return grade of a given student on a given exam" |
... |
where |
lookup-grade2("Alina", "exam1") is 85 |
lookup-grade2("Nunu", "exam2") is 0 |
end |
How might this work internally? With what we’ve learned today, we could filter the table to only those rows for the named student (there should be only one!), then extract the desired exam from that single (first) row. Let’s set up the filter first:
fun has-name(r :: Row) -> Boolean: |
doc: "determine whether row has given student in the name column" |
r["name"] == student |
end |
|
fun lookup-grade2(student :: String, asgn :: String) -> Number: |
doc: "Return grade of a given student on a given exam" |
|
# not yet done -- just putting in the filter part for now |
filter-by(gradebook, has-name) |
where: |
lookup-grade2("Alina", "exam1") is 85 |
lookup-grade2("Nunu", "exam2") is 0 |
end |
If we try to run this, Pyret reports an error that student is unbound (highlighting the use of student in has-name). What’s the problem?
Remember the substitution rule we learned last week – when you call a function, Pyret substitutes the inputs for the variable names in the body of the function, then keeps evaluating. So Pyret sees the following sequence of steps:
lookup-grade2("Alina", "exam1") |
filter-by(gradebook, has-name) |
Under the hood, Pyret now runs has-name on each row of the table in turn. So Pyret tries to run
has-name(gradebook.row-n(0)) |
which turns into
gradebook.row-n(0)["name"] == student |
This is when Pyret reports an error that it doesn’t know what student is. Since student wasn’t an input to has-name, nothing got substituted for the student variable in the code.
Unfortunately, we can’t just ad student as an input to has-name, because filter-by only works with functions that take a row and produce a Boolean. So what do we do?
The solution is to put the has-name definition in a place where the substitution for student will have an impact. Since student only gets replaced in the body of lookup-grade2, we put the definition of has-name inside that of lookup-grade2.
fun lookup-grade2(student :: String, asgn :: String) -> Number: |
doc: "Return grade of a given student on a given exam" |
|
fun has-name(r :: Row) -> Boolean: |
doc: "determine whether row has given student in the name column" |
r["name"] == student |
end |
|
# not yet done -- just putting in the filter part for now |
filter-by(gradebook, has-name) |
where: |
lookup-grade2("Alina", "exam1") is 85 |
lookup-grade2("Nunu", "exam2") is 0 |
end |
This version will work. The code still errors since lookup-grade2 is supposed to return a Boolean and instead returns a table (at the moment), but the filter-by now works correctly.
To finish the function, we now need to extract the asgn grade from the first row of the table. This uses material from the start of the lecture. We build off the filter-by as follows:
filter-by(gradebook, has-name).row-n(0)[asgn] |
Note that asgn is not in quotation marks here. Doing that (i.e., writing ["asgn"]) would tell Pyret to look for a column named "asgn", whereas we want the column provided as an input to the function (i.e., "exam1" or "exam2").
This leaves the final function as:
fun lookup-grade2(student :: String, asgn :: String) -> Number: |
doc: "Return grade of a given student on a given exam" |
|
fun has-name(r :: Row) -> Boolean: |
doc: "determine whether row has given student in the name column" |
r["name"] == student |
end |
|
filter-by(gradebook, has-name).row-n(0)[asgn] |
where: |
lookup-grade2("Alina", "exam1") is 85 |
lookup-grade2("Nunu", "exam2") is 0 |
end |
Wow – this example pulled together a lot of ideas from the course so far in one function. The takeaways here are:
If you need to write a filter-by that uses an input of a function, nest the filtering condition function inside the function that takes the input.
If a column name to reference is provided as an input, don’t use quotation marks around the input variable name in the row-lookup expression.