Class summary: Introduction to Tables
Copyright (c) 2017 Kathi Fisler
This material goes with intro to tabular data from the textbook
The text uses a check construct that we have not covered yet – ignore the check statement for a day or two.
1 Tabular Data
Start with a gradebook table:
gradebook = table: name, SNC, exam1, exam2 |
row: "Alina", false, 85, 90 |
row: "Carl", false, 75, 60 |
row: "Elan", true, 95, 63 |
row: "Lavon", false, 87, 88 |
row: "Nunu", true, 70, 92 |
end |
What computations might you want to do with this table?
compute course grades
get histogram of performance on each exam
look at delta from first exam to second exam
check whether SNC or letter-grade students did better on exam2
get names of students who did poorly on the first exam
etc
To do these analyses, need to be able to do operations on tables. What sort of operations do you need?
filter out some rows (to look at only low grades)
re-order the rows (to see high or low scores first)
filter out some columns (to focus on SNC and exam2)
add a column with the overall course grade
Going to learn how to do these in Pyret. Let’s focus on the first three, then come back to the fourth a bit later.
2 Table Operations: Keeping and Ordering
# order the rows by descending values on exam1 |
order gradebook: exam1 descending end |
# keep only those rows in which the SNC column contains true |
sieve gradebook using SNC: |
SNC |
end |
# keep only those rows in which the SNC column contains false |
sieve gradebook using SNC: |
not(SNC) |
end |
# keep only the exam2 column |
select exam2 from gradebook end |
Notice that these operations return a table but leave the original gradebook table unchanged (this may look unusual to those with prior programming experience).
Each of these operations has a specific form (syntax) to follow (the all-caps words are where we customize the operation to a specific computation):
order TABLE: column CRITERION end |
|
sieve TABLE using COLUMN: |
CRITERION |
end |
|
sieve TABLE using COLUMN, COLUMN, ...: |
CRITERION |
end |
|
select COLUMN from TABLE end |
Programming Tip: When you first learn to program, getting all the punctuation and keywords in the right place can get frustrating. Be patient. Once you’ve written these kinds of expressions multiple times, your brain will start to internalize the patterns. You don’t get to that point just by reading code though – you have to write or type expressions using the operators. If you are finding this frustrating, write extra examples using our example tables to help train your brain.
3 Practice: Keeping and Ordering
What if we wanted a table of students taking the course SNC, sorted from high to low grades on exam2? First off, what table do we expect to get?
table: name, SNC, exam1, exam2 |
row: "Nunu", true, 70, 92 |
row: "Elan", true, 95, 63 |
end |
Programming Tip: Always think out a couple of concrete examples of what you want your program to produce BEFORE you write the program. This helps make sure that the question/goal is clear up front. Otherwise, if the answer "looks wrong", you won’t be sure whether the problem is in your code or your understanding of the question.
How would we achieve this in code?
# table of students taking the course SNC |
snc-only = sieve gradebook using SNC: |
SNC |
end |
|
# order SNC table by decreasing exam2 |
order snc-only: |
exam2 descending |
end |
What if we wanted a table of those students who got at least 85 on both exams? What table do we expect to get?
table: name, SNC, exam1, exam2 |
row: "Alina", false, 85, 90 |
row: "Lavon", false, 87, 88 |
end |
How would we achieve this in code?
# create table of students with at least 85 on both exams |
both-over-85 = sieve gradebook using exam1, exam2: |
(exam1 >= 85) and (exam2 >= 85) |
end |
Note that can use expressions of any complexity as the sieve criterion, but that expression can only use columns named in the using clause.
Load the weather table and practice writing the expressions mentioned in the file.
The intensity of peppers are rated using the Scoville scale, (which essentially measures the ratio of pepper extract to sugar water needed to cancel the heat of the pepper). We want to be able to look up the name of a pepper based on its scoville rating. Bell peppers have a rating of 0. Paprika ranges from 100-1000 units, jalapeno from 3,500-10,000 units, cayenne from 30,000-50,000, and habanero from 100,000-350,000.
Write this program two ways: once using if-statements and once using tables. What are the strengths and weaknesses of each approach?
Now that you’ve practiced how the operators should get used, let’s try finding mistakes in incorrect programs. Load the broken menu programs and work on correcting the errors. Try summarizing what the error was at a high level, to help you internalize what went wrong.
4 Adding Columns to Tables
Return to the gradebook table. To help in computing course grades, we’d like to add a column that averages the two exam scores.
grades-with-average = extend gradebook using exam1, exam2: |
exam-avg: (exam1 + exam2) / 2 |
end |
You don’t have to save the resulting table under a name (like grades-with-average), but doing so makes it easier to use the table in another computation later.
Go back to the weather table. Add a column named heat-alert that contains a boolean. The column should contain true when the temperature was at least 100 and there was no rainfall and false otherwise.
5 Transforming Tables
Sometimes, we simply want to alter the contents of cells, but without adding new columns. The transform operator takes care of this. For example, we might want to round up all of the exam averages to the nearest whole number.
transform grades-with-average using exam-avg: |
exam-avg: num-round(exam-avg) |
end |
If you ask to see grades-with-average in the interactions window now, will it show rounded grades or unrounded ones? Why? How do we get to see the table with the rounded averages again?
Did we have to extend and transform in two steps? Could we have rounded the average in the first place, at the time that we extended the table with the exam-avg column?
We could have done it in one step, by writing:
grades-with-average = extend gradebook using exam1, exam2: |
exam-avg: num-round((exam1 + exam2) / 2) |
end |
The difference between the approaches is that one gives you access to the table without the rounded data – which approach you take depends on whether you need that data. Alternatively, you may have only realized later that you wanted the numbers rounded; using transform lets you respond to changes that had to get made after the column was created.
Key idea in CS: In both programming and computer science in general, we often deal with changes in the problem we were trying to solve. This is a theme we will come back to again and again in this course.
6 Practice: Extending and Transforming
Transform the gradebook so that all students are taking the course for a grade, rather than as SNC.
Extend the gradebook with a column that records the highest exam grade that each student received.
Oops! There was a grading error on the first exam. Everyone should get 5 additional points on exam1. Write an expression to do that on the table that also records the highest grade.
Look at the table you produced – is the table correct? If not, where is there an error? If it is, where might there have been an error that you should watch out for?
Extend the table with a column that assigns a letter grade to each students. Students taking the course SNC get "pass" with an exam average of at least 65, and "NC" otherwise. Students taking the course for a grade get an "A" with an exam average of at least 87, a "B" with an exam average from 73 to 86, "C" with an exam average from 65 to 72, and "NC" otherwise.