Class Summary: Processing Trees
Copyright (c) 2017 Kathi Fisler
In the previous lecture, we talked about how tables are a poor choice for capturing ancestor trees. Each person must reference two people (its biological parents). In a table, we can only capture references by names, which we then must look up by searching the rows of the table. But if we make our own data, we might be able to capture those connections explicitly.
Here again is the sample ancestor tree that we were trying to capture:
And here is the datatype that we came up with:
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree)
end
Today, we need to learn how to write programs that perform calculations over these trees.
If we wanted to capture our entire ancestor tree diagram, we could write it as follows:
anna-tree =
  person("Anna", 1997, "blue",
    person("Susan", 1971, "blue",
      person("Ellen", 1945, "brown",
        person("Laura", 1920, "blue", noInfo, noInfo),
        person("John", 1920, "green",
          noInfo,
          person("Robert", 1893, "brown", noInfo, noInfo))),
      person("Bill", 1946, "blue", noInfo, noInfo)),
    person("Charlie", 1972, "green", noInfo, noInfo))
We could also have named each person's data individually:
robert-tree = person("Robert", 1893, "brown", noInfo, noInfo)
laura-tree = person("Laura", 1920, "blue", noInfo, noInfo)
john-tree = person("John", 1920, "green", noInfo, robert-tree)
ellen-tree = person("Ellen", 1945, "brown", laura-tree, john-tree)
bill-tree = person("Bill", 1946, "blue", noInfo, noInfo)
susan-tree = person("Susan", 1971, "blue", ellen-tree, bill-tree)
charlie-tree = person("Charlie", 1972, "green", noInfo, noInfo)
anna-tree2 = person("Anna", 1997, "blue", susan-tree, charlie-tree)
The latter gives you pieces of the tree to use as other examples, but loses the structure that is visible in the indentation of the first version. You could get to pieces of the first version by digging into the data, such as writing anna-tree.mother.mother to get to the tree starting from "Ellen".
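As a quick sketch of that kind of digging, assuming the AncTree definition and the nested anna-tree exactly as written above, the check block below follows field accesses down the tree. The check block's label is made up for illustration. Asking noInfo for a field such as .name would raise an error, because noInfo has no fields.

```pyret
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree)
end

anna-tree =
  person("Anna", 1997, "blue",
    person("Susan", 1971, "blue",
      person("Ellen", 1945, "brown",
        person("Laura", 1920, "blue", noInfo, noInfo),
        person("John", 1920, "green",
          noInfo,
          person("Robert", 1893, "brown", noInfo, noInfo))),
      person("Bill", 1946, "blue", noInfo, noInfo)),
    person("Charlie", 1972, "green", noInfo, noInfo))

check "digging into anna-tree with field access":
  # Anna's mother is Susan, and Susan's mother is Ellen
  anna-tree.mother.name is "Susan"
  anna-tree.mother.mother.name is "Ellen"
  # Ellen's father is John, and John's father is Robert
  anna-tree.mother.mother.father.father.name is "Robert"
  # Charlie (Anna's father) has no recorded mother
  anna-tree.father.mother is noInfo
end
```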
1 Programs to Process Ancestor Trees
How would we write a function to determine whether anyone in the tree had a particular name? To be clear, we are trying to fill in the following code:
fun in-tree(at :: AncTree, name :: String) -> Boolean:
  doc: "determine whether name is in the tree"
  ...
end
How do we get started? Add some examples, remembering to check both cases of the AncTree definition:
fun in-tree(at :: AncTree, name :: String) -> Boolean:
  doc: "determine whether name is in the tree"
  ...
where:
  in-tree(anna-tree, "Anna") is true
  in-tree(anna-tree, "Ellen") is true
  in-tree(ellen-tree, "Anna") is false
  in-tree(noInfo, "Ellen") is false
end
What next? When we were working on lists, we talked about the template, a skeleton of code that we knew we could write based on the structure of the data. The template names the pieces of each kind of data, and makes recursive calls on pieces that have the same type. Here's the AncTree template, placed into in-tree:
fun in-tree(at :: AncTree, name :: String) -> Boolean:
  doc: "determine whether name is in the tree"
  cases (AncTree) at:  # comes from AncTree being data with cases
    | noInfo => ...
    | person(n, y, e, m, f) => ... in-tree(m, name) ... in-tree(f, name)
  end
where:
  in-tree(anna-tree, "Anna") is true
  in-tree(anna-tree, "Ellen") is true
  in-tree(ellen-tree, "Anna") is false
  in-tree(noInfo, "Ellen") is false
end
To finish the code, we need to think about how to fill in the ellipses.
When the tree is noInfo, it has no more people, so the answer should be false (as worked out in the examples).
When the tree is a person, there are three possibilities: we could be at a person with the name we’re looking for, or the name could be in the mother’s tree, or the name could be in the father’s tree.
We know how to check whether the person's name matches the one we are looking for. The recursive calls already ask about the name being in the mother's tree or the father's tree. We just need to combine those pieces into one Boolean answer. Since any of the three possibilities makes the answer true, we combine them with or.
Here’s the final code:
fun in-tree(at :: AncTree, name :: String) -> Boolean:
  doc: "determine whether name is in the tree"
  cases (AncTree) at:  # comes from AncTree being data with cases
    | noInfo => false
    | person(n, y, e, m, f) => (name == n) or in-tree(m, name) or in-tree(f, name)
      # n is the same as at.name
      # m is the same as at.mother
  end
where:
  in-tree(anna-tree, "Anna") is true
  in-tree(anna-tree, "Ellen") is true
  in-tree(ellen-tree, "Anna") is false
  in-tree(noInfo, "Ellen") is false
end
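The comments in the final code point out that the names bound by cases are the same as field accesses on at. A minimal sketch of that equivalence uses a hypothetical in-tree-fields that ignores the cases bindings (written as _) and reads at.name, at.mother, and at.father directly. It assumes the AncTree definition and the individually named trees from earlier. The cases version is usually easier to read, since it names each piece once.

```pyret
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree)
end

robert-tree = person("Robert", 1893, "brown", noInfo, noInfo)
laura-tree = person("Laura", 1920, "blue", noInfo, noInfo)
john-tree = person("John", 1920, "green", noInfo, robert-tree)
ellen-tree = person("Ellen", 1945, "brown", laura-tree, john-tree)

fun in-tree-fields(at :: AncTree, name :: String) -> Boolean:
  doc: "determine whether name is in the tree, using field access (hypothetical variant)"
  cases (AncTree) at:
    | noInfo => false
    | person(_, _, _, _, _) =>
      # at.name, at.mother, and at.father play the roles of n, m, and f
      (at.name == name) or in-tree-fields(at.mother, name) or in-tree-fields(at.father, name)
  end
where:
  in-tree-fields(ellen-tree, "Robert") is true
  in-tree-fields(ellen-tree, "Anna") is false
  in-tree-fields(noInfo, "Ellen") is false
end
```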
2 Summarizing How to Approach Tree Problems
We design tree programs using the same design recipe that we covered on lists:
1. Write the datatype for your tree, including a base/leaf case
2. Write examples of your trees for use in testing
3. Write the function name, parameters, and types (the fun line)
4. Write where checks for your code
5. Write the template, including the cases and recursive calls. Here's the template again for an ancestor tree, for an arbitrary function called treeF:
     # the extra parameter (name) and the return type (Boolean) depend on the problem
     fun treeF(name :: String, t :: AncTree) -> Boolean:
       cases (AncTree) t:
         | noInfo => ...
         | person(n, y, e, m, f) =>
           ... treeF(name, m) ... treeF(name, f)
       end
     end
6. Fill in the template with details specific to the problem
7. Test your code using your examples
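To see the recipe applied to a problem other than in-tree, here is a sketch of a hypothetical function count-born-before, which counts the people in a tree born strictly before a given year. The function name and the choice of a strict comparison are assumptions made for illustration; the trees come from the individually named examples earlier in these notes.

```pyret
# the datatype, with noInfo as the base case
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree)
end

# example trees for testing
robert-tree = person("Robert", 1893, "brown", noInfo, noInfo)
laura-tree = person("Laura", 1920, "blue", noInfo, noInfo)
john-tree = person("John", 1920, "green", noInfo, robert-tree)
ellen-tree = person("Ellen", 1945, "brown", laura-tree, john-tree)

# the fun line, the template, and the where checks, with the template filled in
fun count-born-before(at :: AncTree, year :: Number) -> Number:
  doc: "count the people in the tree born strictly before year (hypothetical example)"
  cases (AncTree) at:
    | noInfo => 0
    | person(n, y, e, m, f) =>
      # 1 if this person counts, 0 otherwise, plus the counts from both parents' trees
      counts-here = if y < year: 1 else: 0 end
      counts-here + count-born-before(m, year) + count-born-before(f, year)
  end
where:
  count-born-before(noInfo, 1950) is 0
  count-born-before(ellen-tree, 1950) is 4
  count-born-before(ellen-tree, 1921) is 3
  count-born-before(ellen-tree, 1920) is 1
end
```

The shape of the person branch is the template unchanged: only the way the current person and the two recursive results are combined (addition here, or in in-tree) is specific to the problem.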
For those of you going on to more CS classes, knowing how to write programs to process trees is essential. For those looking to focus on data science without a lot of programming, the main takeaway is that data sometimes need organizations other than tables to make computations more efficient.
3 Study Questions
Think of writing in-tree on a table (using filter-by) vs writing it on a tree. How many times might each approach compare the name being sought against a name in the table/tree?
Why do we need to use a recursive function to process the tree?
In what order will we check the names in the tree version?
4 Practice Problems
For practice, try problems such as:
How many blue-eyed people are in the tree?
How many people are in the tree?
How many generations are in the tree?
How many people have a given name in a tree?
How many people have names starting with "A"?
... and so on