Class summary: Introduction to Tables
Copyright (c) 2017 Kathi Fisler
Try a new problem today: managing a gradebook. I want to write programs to help me do various computations over students’ grades on exams (average scores on an exam, average scores across exams, improvement, histograms of scores, etc). Let’s start with two students:
Allie got 85 on exam1 and 90 on exam2 Carl got 75 on exam1 and 60 on exam2
Let’s say I want to have a function that lets me lookup the grade that a specific student got on a specific exam. How might we write that code with what we have now?
fun lookup-grade(student :: String, asgn :: String) -> Number: |
doc: "Return grade of a given student on a given exam" |
if (student == "Allie"): |
if (asgn == "exam1"): |
85 |
else if (asgn == "exam2"): |
90 |
else: |
raise("No such assignment") |
end |
else if (student == "Carl"): |
if (asgn == "exam1"): |
75 |
else if (asgn == "exam2"): |
60 |
else: |
raise("No such assignment") |
end |
else: raise("No such student") |
end |
end |
(the raise operator is how we have Pyret programs generate errors in the case of unexpected or invalid inputs.)
This code should bother us for several reasons:
It won’t scale to many assignments or large numbers of students.
This function only looks up grades – if we want other gradebook operations (like averaging grades), will we have to repeat all of this data in another function?
The code treats the pieces of data about an individual student as separate pieces of information, rather than as a collection of related information.
Key idea in CS: As a general rule of thumb, we want ways to keep related data together. We also want to keep data separate from computations that process the data (so we can reuse the data).
As we discussed this setup, some students raised the idea of spreadsheets. Spreadsheets are good for grouping data into rows and columns – with a spreadsheet, we could keep all of a student’s data together in a row, while keeping all data about a particular assignment together in a column. If only we had a spreadsheet!
Actually, Pyret has a data type that gives us the structure of a spreadsheet (but without all the built-in formulas). It’s called a Table.
1 Tabular Data
Here’s our gradebook written in a Pyret table:
include tables |
include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX") |
|
gradebook = table: name, SNC, exam1, exam2 |
row: "Allie", false, 85, 90 |
row: "Carl", false, 75, 60 |
row: "Elan", true, 95, 63 |
row: "Lavon", false, 87, 88 |
row: "Nunu", true, 70, 0 |
end |
What computations might you want to do with this table?
compute course grades
get histogram of performance on each exam
look at delta from first exam to second exam
check whether SNC or letter-grade students did better on exam2
get names of students who did poorly on the first exam
etc
To do these analyses, need to be able to do operations on tables. What sort of operations do you need?
filter out some rows (to look at only low grades)
re-order the rows (to see high or low scores first)
perform computations based on particular some columns (i.e., SNC and exam2)
add a column with the overall course grade
Our first task was to learn how to do these computations in Pyret. Here are examples of these computations:
#---------------------------------------------- |
# order the rows by descending values on exam1 |
# false means sort descending; true would sort ascending |
sort-by(gradebook, "exam1", false) |
|
#---------------------------------------------- |
# keep only those rows in which the SNC column contains true |
# to do this, we need a function that takes a row and produces |
# a boolean indicating whether to keep the row. |
# |
# the notation r["SNC"] extracts the SNC value from the given row. |
# you don't know what rows are yet, but all you need for now is the |
# ability to look up values in their columns. |
|
fun taking-snc(r :: Row) -> Boolean: |
doc: "get the value in the given row's SNC column" |
r["SNC"] |
end |
|
# filter-by produces a table with the rows from the given gradebook |
# for which the given function returns true. |
filter-by(gradebook, taking-snc) |
|
#---------------------------------------------- |
# keep only those rows in which the SNC column contains false |
fun not-taking-snc(r :: Row) -> Boolean: |
not(r["SNC"]) |
end |
|
filter-by(gradebook, not-taking-snc) |
|
#---------------------------------------------- |
# keep those students whose grades dropped from exam1 to exam2 |
# this shows how to do a computation over two values in a row |
|
fun exam2-lower(r :: Row) -> Boolean: |
r["exam1"] > r["exam2"] |
end |
|
filter-by(gradebook, exam2-lower) |
|
# extract the first row from a table |
gradebook.row-n(0) |
# this . notation means "dig into" the table. It's what we'll use |
# when we have to pull info out of a table. |
|
# extract the exam1 value from the first row of the table |
gradebook.row-n(0)["exam1"] |
|
Some notes:
We not yet writing where blocks for these functions because we haven’t shown you how to create rows. Today’s goal is just to get you some intuition about table operations.
Why are column names in strings when they appear to be given as names in the table notation? In other words, why can’t we write r[SNC] without the quotation marks?
If Pyret tries to run r[SNC], it will look for a definition of the form SNC = <expression> in the file. No such definition exists, so Pyret will report an error. We therefore turn column names into strings and let the table/row operations (filter-by, sort-by, etc) find the columns with the given strings as names.
1.1 Summary of Table Operations So Far
As a summary, here are the functions shown in these examples (you won’t find them in the Pyret documentation – you need to use our documentation of table operations instead:
filter-by :: (t :: Table, (test :: Row->Boolean)) -> Table |
sort-by :: (t :: Table, col :: String, ascending :: Boolean) -> Table |
the notation row["colname"] extracts the value stored in the named column in the given row
What is the key takeaway from this segment?
Key idea in CS: Once data are made up of smaller pieces of data, we want to organize the data to make it easier to maintain and process. Tables are good for data about multiple entities, each of which has the same attributes.