Class summary: Introduction to Trees
Copyright (c) 2017 Kathi Fisler
This material is not yet in the textbook.
1 Data Structures for Family Trees
Here is a picture of a small family tree. The picture shows the mother and father of each person in the tree, when known:
Assume that we want to represent this picture in code, so we can ask questions about family trees (such as whether one person is an ancestor of another).
Propose a data structure for family trees in which people refer to their parents (ignore people referring to their children for now).
1.1 Family Trees as Tables
Many of you will see this and think of a table. Here’s a table that captures everyone other than Robert (we’ll come back to Robert later):
family = table: name :: String, mother :: String, father :: String
  row: "Anna", "Susan", "Charlie"
  row: "Susan", "Ellen", "Bill"
  row: "Bill", "Laura", "John"
end
Let’s say I wanted to write a function to compute someone’s grandparents (at least, those grandparents known in the tree):
fun grandparents(of-name :: String) -> List<String>:
  false # this is the wrong answer -- leaving a hole
where:
  grandparents("Anna") is [list: "Ellen", "Bill"]
  grandparents("Susan") is [list: "Laura", "John"]
  grandparents("Laura") is [list:]
  grandparents("Kathi") is [list:]
end
What would be involved in doing that computation? What subtasks would we identify, and what functions would we write?
Need to go from a name to the mother
Need to go from a name to the father
Let’s write one of these functions to see what it would look like:
import lists as L

fun get-mother(of-name :: String, from-family :: Table):
  person-row =
    sieve from-family using name:
      name == of-name
    end
  L.get(extract mother from person-row end, 0)
where:
  get-mother("Anna", family) is "Susan"
end
What happens if the person we asked for isn’t in the table (meaning that we don’t know their family history)? Right now, we get a Pyret error: the sieve finds no rows, so extract produces an empty list, and L.get fails because there is no element at position 0. We shouldn’t use L.get unless we know that we found a row for the named person. We could modify the code, but that would be premature.
As always, start with examples: what should the function produce if the named person doesn’t have a row in the table?
if we raise an error, we can’t use this function to get whichever grandparents are known (the raise would terminate the function)
if we use something like "unknown", we can’t tell the difference between a real name and this value (both are strings)
in practice, we want to return an answer of a _different type_, to avoid both problems. Here, we could return false (the boolean) to indicate that the person wasn’t found.
fun get-mother(of-name :: String, from-family :: Table):
  person-row =
    sieve from-family using name:
      name == of-name
    end
  # person-row is a table, so ask it (not the lists library) for its row count
  if person-row.length() > 0:
    L.get(extract mother from person-row end, 0)
  else:
    false # signals that of-name has no row in the table
  end
where:
  get-mother("Anna", family) is "Susan"
  get-mother("Fred", family) is false
end
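With get-mother now handling missing people, the subtasks listed earlier can be combined into a working grandparents. The sketch below is one possible approach, not the course’s official solution: get-father mirrors get-mother, and known-parents is a hypothetical helper that keeps only the lookups that produced a name (dropping any false results). It repeats the typed family table and get-mother so that it runs on its own.

```pyret
import lists as L

# The typed family table from the start of these notes
family = table: name :: String, mother :: String, father :: String
  row: "Anna", "Susan", "Charlie"
  row: "Susan", "Ellen", "Bill"
  row: "Bill", "Laura", "John"
end

fun get-mother(of-name :: String, from-family :: Table):
  doc: "mother's name, or false if of-name has no row"
  person-row = sieve from-family using name: name == of-name end
  if person-row.length() > 0:
    L.get(extract mother from person-row end, 0)
  else:
    false
  end
end

fun get-father(of-name :: String, from-family :: Table):
  doc: "father's name, or false if of-name has no row"
  person-row = sieve from-family using name: name == of-name end
  if person-row.length() > 0:
    L.get(extract father from person-row end, 0)
  else:
    false
  end
end

# Hypothetical helper: the parents of of-name that the table knows about.
# Lookups that returned false (person not found) are filtered out.
fun known-parents(of-name :: String, from-family :: Table) -> List<String>:
  L.filter(is-string,
    [list: get-mother(of-name, from-family), get-father(of-name, from-family)])
end

fun grandparents(of-name :: String) -> List<String>:
  doc: "names of of-name's grandparents that appear in family"
  # For each known parent, append that parent's known parents
  L.fold(lam(acc, parent): acc.append(known-parents(parent, family)) end,
    [list:], known-parents(of-name, family))
where:
  grandparents("Anna") is [list: "Ellen", "Bill"]
  grandparents("Susan") is [list: "Laura", "John"]
  grandparents("Kathi") is [list:]
end
```

Because a missing person produces false rather than an error, grandparents("Anna") still finds Ellen and Bill even though Charlie has no row of his own.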
What would we do if we wanted to include a person (such as John) for whom we knew the name of one parent, but not the other? What would we put into the table? We might have to put false, following our approach of using a different type to capture missing information (but then we have to leave off the column types, since the mother column would no longer hold only Strings).
family = table: name, mother, father
  row: "Anna", "Susan", "Charlie"
  row: "Susan", "Ellen", "Bill"
  row: "Bill", "Laura", "John"
  row: "John", false, "Robert" # this is the new row
end
This is the tables approach. Let’s try another approach that builds on data blocks instead. Later, we’ll contrast the approaches and see their strengths and weaknesses.
1.2 Creating a Data block for Trees
For this approach, we want to create a data block for Family Trees that has a variant (constructor) for setting up a person. Look back at our picture – what information makes up a person? Their name, their mother, and their father. That suggests the following pattern, which basically turns a row into a data block:
data FamTree:
  | person(
      name :: String,
      mother :: String,
      father :: String
    )
end
Try to build the family tree from the picture using this data definition:
anna-person = person("Anna", "Susan", "Charlie")
susan-person = person("Susan", "Ellen", "Bill")
Wait, this seems weird: we have one family (tree), but we’re setting up separate people? Do we maybe want a list of this information instead?
family-lst =
  [list:
    person("Anna", "Susan", "Charlie"),
    person("Susan", "Ellen", "Bill")
  ]
This is better (one piece of data for the entire family tree), but it still seems to be missing the "tree-ness" of the picture. In the picture, it is easy to get from Anna to her grandparents. Here, we have a list, and we have to search across the people to find the next generation. Could we do better?
Remember that we can make the mother and father be any type we would like; they don’t have to be Strings. In fact, when we look at the picture, what we see up the mother’s and father’s sides is itself an entire family tree. Wouldn’t this be better?
data FamTree2:
  | person2(
      name :: String,
      mother :: FamTree2,
      father :: FamTree2
    )
end
Try writing the family tree using this definition instead. Do the part starting just from Susan for now.
Hopefully you got this far, but there’s a question of what to put in place of the ellipses (the places where we don’t know which person goes there):
susan-as-tree =
  person2("Susan",
    person2("Ellen", ..., ...),
    person2("Bill",
      person2("Laura", ..., ...),
      person2("John", ..., ...))
  )
How do we fill in the ellipses? When we did tables, we used false for this. Let’s try that:
susan-as-tree =
  person2("Susan",
    person2("Ellen", false, false),
    person2("Bill",
      person2("Laura", false, false),
      person2("John", false, false))
  )
Oops, that didn’t work. Why not? Our data definition requires the mother and father to be FamTree2 values, but false isn’t a FamTree2. We could relax the type of mother and father to allow either a FamTree2 or a Boolean, but there’s actually a better approach. We were only using false because we needed some kind of data that we could distinguish from a real person. We can get the same effect by adding another variant of family tree, one corresponding to an "empty" tree (a tree with no people):
data FamTree:
  | unknown()
  | person(
      name :: String,
      mother :: FamTree,
      father :: FamTree
    )
end
Now, we can finish our example:
susan-tree =
  person("Susan",
    person("Ellen", unknown(), unknown()),
    person("Bill",
      person("Laura", unknown(), unknown()),
      person("John", unknown(), unknown()))
  )
Or we can build up the entire family:
the-family =
  person("Anna",
    susan-tree,
    person("Charlie", unknown(), unknown()))
How would we find Susan’s mother?
susan-tree.mother
This gives the entire person structure. What if I want her name?
susan-tree.mother.name
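Chained field accesses only work while every step is a person: Ellen’s mother is unknown(), so susan-tree.mother.mother.name raises an error, because unknown() has no name field. A minimal sketch of checking first: Pyret generates an is-person (and is-unknown) predicate for each variant of a data definition, and has-known-mother below is a hypothetical helper built on it.

```pyret
data FamTree:
  | unknown()
  | person(
      name :: String,
      mother :: FamTree,
      father :: FamTree
    )
end

# Hypothetical helper: is the mother at the root of this tree known?
fun has-known-mother(ft :: FamTree) -> Boolean:
  cases (FamTree) ft:
    | unknown() => false
    | person(name, mother, father) => is-person(mother)
  end
where:
  has-known-mother(unknown()) is false
  has-known-mother(person("Ellen", unknown(), unknown())) is false
  has-known-mother(
    person("Susan", person("Ellen", unknown(), unknown()), unknown())) is true
end
```

A program can then test has-known-mother before reaching for .mother.name, instead of letting the access fail.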
We still need to come back to the discussion comparing tables and trees, but first, let’s write some programs over trees.
2 Programming Over Trees
Write in-family, which takes a name and a FamTree and determines whether there is a person in the tree with that name. Don’t forget to write examples!
fun in-family(a-name :: String, ft :: FamTree) -> Boolean:
  doc: "determine whether family has a person with the given name"
  cases (FamTree) ft:
    | unknown() => false
    | person(name, mother, father) =>
      (name == a-name) or
      in-family(a-name, mother) or
      in-family(a-name, father)
  end
where:
  in-family("Bill", unknown()) is false
  in-family("Zoe", unknown()) is false
  in-family("Susan", the-family) is true
  in-family("Zoe", the-family) is false
  in-family("John", the-family) is true
end
2.1 The Trees Template
Did you dive in and try writing in-family from scratch? Remember that when we did lists, we had the notion of a template that captured how we traverse (that is, walk along) the entire data structure. The template split the data into its cases, then made a recursive call on the rest of the list (which was also a list).
We can use that same approach here, developing a template for trees. In the tree case, however, there are recursive calls on each of the mother and the father. Here is the template for a family tree:
fun ft-func(ft :: FamTree) -> ???:
  cases (FamTree) ft:
    | unknown() => ...
    | person(name, mother, father) =>
      ... name
      ... ft-func(mother)
      ... ft-func(father)
  end
end
Think about starting from this template as you try the next example.
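For comparison, here is the template filled in for a different question (a hypothetical count-people, not one of the exercises). Each "..." is replaced: the unknown() case contributes no people, and a person contributes one plus whatever the two recursive calls find.

```pyret
data FamTree:
  | unknown()
  | person(
      name :: String,
      mother :: FamTree,
      father :: FamTree
    )
end

# Hypothetical example: how many known people are in the tree?
fun count-people(ft :: FamTree) -> Number:
  cases (FamTree) ft:
    | unknown() => 0
    | person(name, mother, father) =>
      # this person, plus everyone up the mother's and father's sides
      1 + count-people(mother) + count-people(father)
  end
where:
  count-people(unknown()) is 0
  count-people(person("Charlie", unknown(), unknown())) is 1
end
```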
2.2 Another Example
Write count-gens (short for count-generations), which takes a FamTree and determines the maximum number of generations along any branch of the tree. Don’t forget to write examples!
fun count-gens(ft :: FamTree) -> Number:
  doc: "produce number of generations in longest branch of the tree"
  cases (FamTree) ft:
    | unknown() => 0
    | person(name, mother, father) =>
      1 + num-max(count-gens(mother), count-gens(father))
  end
where:
  count-gens(unknown()) is 0
  count-gens(the-family) is 4
end
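The same template also works when the answer is a list instead of a number. As a sketch, the hypothetical all-names below gathers every name in the tree, listing each person before their mother’s side and then their father’s side (one ordering among several reasonable ones).

```pyret
data FamTree:
  | unknown()
  | person(
      name :: String,
      mother :: FamTree,
      father :: FamTree
    )
end

# Hypothetical example: every known name in the tree, as a list
fun all-names(ft :: FamTree) -> List<String>:
  cases (FamTree) ft:
    | unknown() => [list:]
    | person(name, mother, father) =>
      # this person's name, then the mother's side, then the father's side
      link(name, all-names(mother).append(all-names(father)))
  end
where:
  all-names(unknown()) is [list:]
  all-names(person("Bill",
      person("Laura", unknown(), unknown()),
      person("John", unknown(), unknown())))
    is [list: "Bill", "Laura", "John"]
end
```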
3 Tables Versus Trees
Let’s get back to the discussion about tables vs trees – what are the benefits of each?
Trees:
allow direct access to parents, rather than needing another table lookup to find parents
better support multiple people with the same name in the family (each person is a separate value, not just a name to look up)
structure captures generations naturally
Tables:
capture siblings easily (siblings are simply rows with the same parents)
feel more like a database of data on people
There are clearly tradeoffs here. In computer science, trees are often used instead of tables because of the direct access to parents (and, more generally, because they capture the structure of the underlying data).
4 More Practice
Open the file of exercises (posted to the schedule page) and work on whichever set of exercises fits your level and interest.
If you finish those, extend the data block so that a person also has a birth year and an eye color. Think of some programs that you could write now that you have this information as well.
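One possible shape for that extension, as a sketch only: the field names birthyear and eye, the function count-eye, and the years and eye colors in the examples are all made up here rather than taken from the exercise file.

```pyret
data FamTree:
  | unknown()
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: FamTree,
      father :: FamTree
    )
end

# Hypothetical example program: how many people in the tree
# have the given eye color?
fun count-eye(color :: String, ft :: FamTree) -> Number:
  cases (FamTree) ft:
    | unknown() => 0
    | person(name, birthyear, eye, mother, father) =>
      this-one = if eye == color: 1 else: 0 end
      this-one + count-eye(color, mother) + count-eye(color, father)
  end
where:
  count-eye("blue", unknown()) is 0
  count-eye("blue",
    person("Anna", 1997, "blue",
      person("Susan", 1970, "brown", unknown(), unknown()),
      person("Charlie", 1968, "blue", unknown(), unknown()))) is 2
end
```

The new fields only change the person variant and the names bound in each cases branch; the shape of the recursion stays the same as in the template.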