Class summary: Introduction to Tables
Copyright (c) 2017 Kathi Fisler
The section in PAPL on tabular data uses a different notation for table operations than we present in this lecture, so reading that section may well be more confusing than helpful.
1 Tabular Data
When you have a collection of data that report the same attributes about a group of entities, a data table (or just table) can be a good way to organize the data.
Start with a gradebook table:
include shared-gdrive("table-functions.arr", "14jG4wvAMhjJue1-EmY-u9hX4UwmPHCO8")
include tables

gradebook = table: name, SNC, exam1, exam2
  row: "Alina", false, 85, 90
  row: "Carl", false, 75, 60
  row: "Elan", true, 95, 63
  row: "Lavon", false, 87, 88
  row: "Nunu", true, 70, 0
end
(Note: At Brown, "SNC" means taking a class pass/fail rather than for a letter grade.)
What computations might you want to do with this table?
compute course grades
get histogram of performance on each exam
look at delta from first exam to second exam
check whether SNC or letter-grade students did better on exam2
get names of students who did poorly on the first exam
etc
To do these analyses, we need to be able to perform operations on tables. What sorts of operations do we need?
filter out some rows (to look at only low grades)
re-order the rows (to see high or low scores first)
perform computations based on particular columns (e.g., SNC and exam2)
add a column with the overall course grade
Our first task was to learn how to do these computations in Pyret. Here are examples of these computations:
#----------------------------------------------
# order the rows by descending values on exam1
sort-by(gradebook, "exam1", false)

#----------------------------------------------
# keep only those rows in which the SNC column contains true
fun taking-snc(r :: Row) -> Boolean:
  r["SNC"]
end

filter-by(gradebook, taking-snc)

#----------------------------------------------
# keep only those rows in which the SNC column contains false
fun not-taking-snc(r :: Row) -> Boolean:
  not(r["SNC"])
end

filter-by(gradebook, not-taking-snc)

#----------------------------------------------
# keep those students whose grades dropped from exam1 to exam2
fun exam2-lower(r :: Row) -> Boolean:
  r["exam1"] > r["exam2"]
end

filter-by(gradebook, exam2-lower)

#----------------------------------------------
# add a column with the average of the exam grades
fun exam-avg(r :: Row) -> Number:
  (r["exam1"] + r["exam2"]) / 2
end

build-column(gradebook, "avg", exam-avg)

#----------------------------------------------
# sort by exam averages
sort-by(
  build-column(gradebook, "avg", exam-avg),
  "avg", false)
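Two of the questions listed earlier (the change from the first exam to the second, and which students did poorly on the first exam) can be sketched with the same operations. This is only an illustration: the helper names exam-delta and low-exam1 and the cutoff of 80 are assumptions made for this example, not part of the lecture.

```pyret
# Assumes the include lines and the gradebook table defined earlier in these notes.

#----------------------------------------------
# add a column with the change from exam1 to exam2
# (a negative value means the grade dropped)
fun exam-delta(r :: Row) -> Number:
  r["exam2"] - r["exam1"]
end

build-column(gradebook, "delta", exam-delta)

#----------------------------------------------
# keep only those rows with a low exam1 score
# (the cutoff of 80 is an arbitrary choice for illustration)
fun low-exam1(r :: Row) -> Boolean:
  r["exam1"] < 80
end

filter-by(gradebook, low-exam1)
```

With the gradebook above, the filter keeps the rows for Carl (75) and Nunu (70).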
1.1 Summary of Table Operations
As a summary, here are the functions shown in these examples (you won’t find them in the Pyret documentation, as they have only recently been added to the language):
filter-by :: (t :: Table, test :: (Row -> Boolean)) -> Table
sort-by :: (t :: Table, col :: String, ascending :: Boolean) -> Table
build-column :: (t :: Table, col :: String,
                 builder :: (Row -> Value)) -> Table
The notation row["colname"] extracts the value stored in the named column of the given row.
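Because each of these functions takes a Table and returns a Table, the result of one can be passed to another. The following is a minimal sketch of that idea, assuming the functions behave as in the signatures above; the names is-letter-grade and letter-grade are made up for this example.

```pyret
# Assumes the include lines and the gradebook table defined earlier in these notes.

# true for students taking the course for a letter grade
fun is-letter-grade(r :: Row) -> Boolean:
  not(r["SNC"])
end

# filter-by produces a Table, so its result is a valid first argument to sort-by
letter-grade = filter-by(gradebook, is-letter-grade)

# passing true for the third argument asks for ascending order (lowest exam2 first)
sort-by(letter-grade, "exam2", true)
```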
What are the key takeaways from this segment?
Key idea in CS: Once data are made up of smaller pieces of data, we want to organize the data to make it easier to maintain and process. Tables are good for data about multiple entities, each of which has the same attributes.
Key idea in CS: Tables, just like images and numbers, have operations that let you manipulate, combine, and compute over them. As with computations on numbers, strings, or images, you should reflect the structure of the computation in your code.
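One way to reflect the structure of a computation in code is to give each step its own name, so that the code reads in the same order as the plan: add an average column, keep the low averages, then sort. The sketch below assumes the gradebook table and the table functions from these notes; the names exam-mean, low-avg, with-avg, struggling, and ranked, and the cutoff of 70, are hypothetical choices for illustration.

```pyret
# Assumes the include lines and the gradebook table defined earlier in these notes.

# average of the two exam scores
fun exam-mean(r :: Row) -> Number:
  (r["exam1"] + r["exam2"]) / 2
end

# true when the average is below the (arbitrary) cutoff of 70;
# expects a row from a table that already has an "avg" column
fun low-avg(r :: Row) -> Boolean:
  r["avg"] < 70
end

# step 1: add the average column
with-avg = build-column(gradebook, "avg", exam-mean)

# step 2: keep only the rows with a low average
struggling = filter-by(with-avg, low-avg)

# step 3: order those rows from lowest average to highest
ranked = sort-by(struggling, "avg", true)
```

With the gradebook above, only Carl (average 67.5) and Nunu (average 35) fall below the cutoff.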