Class summary: Intro to Ancestor Trees
Copyright (c) 2017 Kathi Fisler
1 Data Design Problem – Ancestry Data
Imagine that we wanted to represent genealogy information for a medical research study. Specifically, we want to record each person’s birth year, eye color, and biological parents. Here’s a picture showing the relationships between people and their biological parents.
To capture this in code, we might create a table such as the following:
family = table: name, birthyear, eyecolor, mother, father
  row: "Anna", 1997, "blue", "Susan", "Charlie"
  row: "Susan", 1971, "blue", "Ellen", "Bill"
  row: "Charlie", 1972, "green", "NoInfo", "NoInfo"
  row: "Ellen", 1945, "brown", "Laura", "John"
  ...
end
Assume we wanted to be able to answer questions such as the following:
How frequent is each eye color?
How many generations do we have information for?
What’s the average age of mothers (or fathers) at time of birth?
Is one specific person an ancestor of another specific person?
How should we capture this picture in data to be able to write programs to answer our questions?
1.1 Ancestry Trees as Tables
Let’s say I wanted to write a function to compute someone’s grandparents (at least, those grandparents known in the tree). Roughly, we’d break this down into two functions: one to get the parents of a specific person, and one to get the grandparents by getting the parents of the parents. Here’s the idea:
# For parents-of, we'll just sketch out the tasks since
# you've implemented all of these before
fun parents-of(anc-table :: Table, person :: String) -> List<String>:
  # filter the table to find the person
  # extract the name of the mother
  # extract the name of the father
  # make a list of those names
end

fun grandparents-of(anc-table :: Table, person :: String) -> List<String>:
  doc: "compute list of known grandparents in the table"
  # glue together lists of mother's parents and father's parents
  plist = parents-of(anc-table, person) # gives a list of two names
  parents-of(anc-table, plist.first) +
  parents-of(anc-table, plist.rest.first)
where:
  # per the table above, Anna's mother is Susan, whose parents are Ellen and Bill
  grandparents-of(family, "Anna") is [list: "Ellen", "Bill"]
  grandparents-of(family, "Laura") is [list:]
  grandparents-of(family, "Kathi") is [list:]
end
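The task list sketched for parents-of could be filled in roughly as follows. This is only one possible version, not necessarily the one written in class. It assumes that unknown parents are stored as the string "NoInfo" (as in Charlie's row) and leaves them out of the result; the name sample-family is made up for this example and holds just the four rows shown above.

```pyret
# Hypothetical sample data: only the four rows shown in the notes.
sample-family = table: name, birthyear, eyecolor, mother, father
  row: "Anna", 1997, "blue", "Susan", "Charlie"
  row: "Susan", 1971, "blue", "Ellen", "Bill"
  row: "Charlie", 1972, "green", "NoInfo", "NoInfo"
  row: "Ellen", 1945, "brown", "Laura", "John"
end

fun parents-of(anc-table :: Table, person :: String) -> List<String>:
  doc: "list the known parents of person; empty if person is not in the table"
  # filter the table to find the person
  matches = sieve anc-table using name: name == person end
  # extract the mother and father columns (each is a list; empty if no row matched)
  mothers = extract mother from matches end
  fathers = extract father from matches end
  # make one list of those names, dropping the "NoInfo" placeholder (an assumption)
  mothers.append(fathers).filter(lam(p): p <> "NoInfo" end)
where:
  parents-of(sample-family, "Anna") is [list: "Susan", "Charlie"]
  parents-of(sample-family, "Charlie") is [list:]
  parents-of(sample-family, "Kathi") is [list:]
end
```

Because the result only contains known parents, it can have zero, one, or two names, which matters for any function that calls it.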
This version of grandparents-of isn’t quite right, because plist might not contain two parents. We can fix this with an added if-expression:
fun grandparents-of(anc-table :: Table, name :: String) -> List<String>:
  doc: "compute list of known grandparents in the table"
  # glue together lists of mother's parents and father's parents
  plist = parents-of(anc-table, name) # gives a list of up to two names
  if plist.length() == 2:
    parents-of(anc-table, plist.first) + parents-of(anc-table, plist.rest.first)
  else if plist.length() == 1:
    parents-of(anc-table, plist.first)
  else: empty
  end
end
What if we now wanted to gather up all of someone’s ancestors? We discussed this in class and realized that, since we don’t know how many generations there are, we’d need to use recursion. This approach would also be expensive: every recursive call filters the table again, and each filter checks every row of the table.
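To make the cost concrete, here is a minimal sketch of what the recursive table-based version might look like. The names ancestors-of and sample-family are made up for this example, parents-of is one possible implementation of the sketch above, and the code assumes that unknown parents are stored as "NoInfo" and that nobody is listed as their own ancestor.

```pyret
# Hypothetical sample data: only the four rows shown in the notes.
sample-family = table: name, birthyear, eyecolor, mother, father
  row: "Anna", 1997, "blue", "Susan", "Charlie"
  row: "Susan", 1971, "blue", "Ellen", "Bill"
  row: "Charlie", 1972, "green", "NoInfo", "NoInfo"
  row: "Ellen", 1945, "brown", "Laura", "John"
end

fun parents-of(anc-table :: Table, person :: String) -> List<String>:
  doc: "list the known parents of person; empty if person is not in the table"
  matches = sieve anc-table using name: name == person end
  mothers = extract mother from matches end
  fathers = extract father from matches end
  mothers.append(fathers).filter(lam(p): p <> "NoInfo" end)
end

fun ancestors-of(anc-table :: Table, person :: String) -> List<String>:
  doc: "names of all known ancestors of person (assumes the data has no cycles)"
  parents = parents-of(anc-table, person)
  # start with the parents, then add each parent's own ancestors;
  # every recursive call filters the whole table again
  for fold(acc from parents, p from parents):
    acc.append(ancestors-of(anc-table, p))
  end
where:
  ancestors-of(sample-family, "Charlie") is [list:]
  ancestors-of(sample-family, "Ellen") is [list: "Laura", "John"]
  ancestors-of(sample-family, "Anna") is [list: "Susan", "Charlie", "Ellen", "Bill", "Laura", "John"]
end
```

Each name that turns up triggers one more call to parents-of, and therefore one more full pass over the table, even for people such as Laura who have no row of their own.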
Look back at the ancestry tree picture. We don’t do any complicated filtering there – we just follow the line in the picture immediately from a person to their mother or father. Can we get that idea in code instead? Yes, through datatypes.
1.2 Creating a Datatype for Ancestor Trees
For this approach, we want to create a datatype for Ancestor Trees that has a variant (constructor) for setting up a person. Look back at our picture – what information makes up a person? Their name, their mother, and their father (along with birthyear and eyecolor, which aren’t shown in the picture). This suggests the following datatype, which basically turns a row into a person value:
data AncTree:
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: ________,
      father :: ________
    )
end
For example, Anna’s row might look like:
anna-row = person("Anna", 1997, "blue", ???, ???)
What type do we put in the blanks? We did a quick brainstorm in class and came up with several ideas: person, List<person>, some new datatype, AncTree, String. Which should it be?
If we use a String, we’re back to the table row, and we don’t end up with a way to easily get from one person to another. We should therefore make this an AncTree.
As we worked out the rest of the Anna example, we realized we would need another option in the AncTree definition to capture people for whom we don’t know anything. Our final datatype looks like:
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree
    )
end

# partly completed
anna-row =
  person("Anna", 1997, "blue",
    person("Susan", 1971, "blue",
      person("Ellen", 1945, "brown", ...),
      ...),
    person("Charlie", 1972, "green", noInfo, noInfo))
We outlined what the new parents-of function would look like with this version:
fun parents-of-tree(tr :: AncTree) -> List<String>:
  cases (AncTree) tr:
    | noInfo => empty
    | person(n, y, e, m, f) => [list: m.name, f.name]
      # the person case is a bit more complicated if a parent is missing,
      # since noInfo has no name field
  end
end
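One possible way to handle a missing parent, offered as a sketch rather than the version developed in class, is a small helper that turns a tree into a list of zero or one names. The helper known-name and the example values charlie, susan-partial, and anna-partial are made up for this illustration; susan-partial deliberately leaves out Susan's parents so that no details are invented for them.

```pyret
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree
    )
end

fun known-name(tr :: AncTree) -> List<String>:
  doc: "a one-element list holding the person's name, or empty for noInfo"
  cases (AncTree) tr:
    | noInfo => empty
    | person(n, _, _, _, _) => [list: n]
  end
end

fun parents-of-tree(tr :: AncTree) -> List<String>:
  doc: "names of the known parents at the root of tr"
  cases (AncTree) tr:
    | noInfo => empty
    # a missing parent contributes an empty list instead of causing an error
    | person(_, _, _, m, f) => known-name(m).append(known-name(f))
  end
where:
  charlie = person("Charlie", 1972, "green", noInfo, noInfo)
  # truncated on purpose: Susan's own parents are omitted in this example
  susan-partial = person("Susan", 1971, "blue", noInfo, noInfo)
  anna-partial = person("Anna", 1997, "blue", susan-partial, charlie)
  parents-of-tree(noInfo) is empty
  parents-of-tree(charlie) is empty
  parents-of-tree(anna-partial) is [list: "Susan", "Charlie"]
  # hypothetical variation with only one known parent
  parents-of-tree(person("Anna", 1997, "blue", noInfo, charlie)) is [list: "Charlie"]
end
```

Note that no filtering is involved: the function reaches the mother and father directly through the fields of the person value.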
We left off here, with a plan to look more at ancestor trees in the next class.
2 Study Questions
Write the example for the entire tree in the diagram using the AncTree datatype (in other words, finish the definition of anna-row).
Why do we need the noInfo case in the AncTree definition?
If we had left the mother and father fields as strings in the AncTree definition, how would we create all of the information in the table using AncTree?