Lecture setup: Reflecting on Tables
Copyright (c) 2017 Kathi Fisler
1 Data Design Problem – Ancestry Data
Imagine that we wanted to represent genealogy information for a medical research study. Specifically, we want to record each person's birth year, eye color, and biological parents. Here's a picture showing the relationships between people and their biological parents.
To capture this in code, we might create a table such as the following:
relatives = table: name, birthyear, eyecolor, mother, father
  row: "Anna", 1997, "blue", "Susan", "Charlie"
  row: "Susan", 1971, "blue", "Ellen", "Bill"
  row: "Charlie", 1972, "green", "NoInfo", "NoInfo"
  row: "Ellen", 1945, "brown", "Laura", "John"
  # ...
end
Assume we wanted to be able to answer questions such as the following:
How frequent is each eye color?
How many generations do we have information for?
What's the average age of mothers (or fathers) at the time of their child's birth?
Is one specific person an ancestor of another specific person?
Think about these four questions, and how you would answer each one given a relatives table like the one shown above. Which table operations would be useful? Which questions seem straightforward to answer? Which seem clumsy?
Record your answers in the Canvas pre-lecture survey for the next lecture.