Class summary: Introduction to Trees
1 Data Structures for Family Trees
1.1 Family Trees as Tables
1.2 Creating a Data block for Trees
2 Programming Over Trees
2.1 The Trees Template
2.2 Another Example
3 Tables Versus Trees
4 More Practice

Copyright (c) 2017 Kathi Fisler

This material is not yet in the textbook.

1 Data Structures for Family Trees

Here is a picture of a small family tree. The picture shows the mother and father of each person in the tree, when known:

Assume that we want to represent this picture in code, so we can ask questions about family trees (such as whether one person is an ancestor of another).

Propose a data structure for family trees in which people refer to their parents (ignore people referring to their children for now).

1.1 Family Trees as Tables

Many of you will see this and think of a table. Here’s a table that captures everyone other than Robert (we’ll come back to Robert later):

  family = table: name :: String, mother :: String, father :: String
    row: "Anna", "Susan", "Charlie"
    row: "Susan", "Ellen", "Bill"
    row: "Bill", "Laura", "John"
  end

Let’s say I wanted to write a function to compute someone’s grandparents (at least, those grandparents known in the tree):

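  fun grandparents(of-name :: String) -> List<String>:
    empty # this is the wrong answer -- leaving a hole
  where:
    # Anna's mother is Susan, whose parents are Ellen and Bill;
    # the table does not record the parents of Anna's father, Charlie
    grandparents("Anna") is [list: "Ellen", "Bill"]
    grandparents("Laura") is [list:]
    grandparents("Kathi") is [list:]
  end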

What would be involved in doing that computation? What subtasks would we identify/what functions would we write?

Let’s write one of these functions to see what it would look like:

  import lists as L

  fun get-mother(of-name :: String, from-family :: Table):
    person-row =
      sieve from-family using name:
        name == of-name
      end
    L.get(extract mother from person-row end, 0)
  where:
    get-mother("Anna", family) is "Susan"
  end

What happens if the person we asked for isn’t in the table (meaning that we don’t know their family history)? Right now, we get a Pyret error. The error arises because we shouldn’t try to use L.get unless we know that we found a row for the named person. We could modify the code, but that would be premature.

As always, start with examples: what should the function produce if the named person doesn’t have a row in the table?

  fun get-mother(of-name :: String, from-family :: Table):
    person-row =
      sieve from-family using name:
        name == of-name
      end
    # person-row is a Table, not a List, so ask the table for its row count
    if person-row.length() > 0:
      L.get(extract mother from person-row end, 0)
    else:
      false
    end
  where:
    get-mother("Anna", family) is "Susan"
    get-mother("Fred", family) is false
  end
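
Returning to the subtasks for grandparents: one possible decomposition is to look up a person's known parents, then look up the known parents of each of those. The following is only a sketch under that assumption, not the solution we will settle on. The names family-tbl, known-parents, and grandparents-of are made up for this sketch; family-tbl repeats the rows of family so that the code runs on its own.

```pyret
# Same rows as the `family` table in the notes, under a hypothetical name.
family-tbl = table: name :: String, mother :: String, father :: String
  row: "Anna", "Susan", "Charlie"
  row: "Susan", "Ellen", "Bill"
  row: "Bill", "Laura", "John"
end

fun known-parents(of-name :: String) -> List<String>:
  doc: "produce the recorded mother and father of of-name (empty if there is no row)"
  person-row =
    sieve family-tbl using name:
      name == of-name
    end
  # each extract produces an empty list when no row matched
  mothers = extract mother from person-row end
  fathers = extract father from person-row end
  mothers.append(fathers)
end

fun grandparents-of(of-name :: String) -> List<String>:
  doc: "produce the known parents of each known parent of of-name"
  for fold(found from empty, parent-name from known-parents(of-name)):
    found.append(known-parents(parent-name))
  end
where:
  grandparents-of("Anna") is [list: "Ellen", "Bill"]
  grandparents-of("Susan") is [list: "Laura", "John"]
  grandparents-of("Kathi") is empty
end
```

Because known-parents produces an empty list for anyone without a row, this version never needs L.get and so avoids the error discussed above. Notice, though, that every step up a generation is another search through the whole table.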

What would we do if we wanted to include a person (such as John) for whom we knew the name of one parent, but not the other? What would we put into the table? We might have to put false, following our approach of using a distinct type to capture missing information (but then we have to leave off the column types, since a column would no longer contain only Strings):

  family = table: name, mother, father
    row: "Anna", "Susan", "Charlie"
    row: "Susan", "Ellen", "Bill"
    row: "Bill", "Laura", "John"
    row: "John", false, "Robert" # this is the new row
  end

This is the tables approach. Let’s try another approach that builds on data blocks instead. Later, we’ll contrast the approaches and see their strengths and weaknesses.

1.2 Creating a Data block for Trees

For this approach, we want to create a data block for Family Trees that has a variant (constructor) for setting up a person. Look back at our picture – what information makes up a person? Their name, their mother, and their father. That suggests the following pattern, which basically turns a row into a data block:

  data FamTree:
    | person(
        name :: String,
        mother :: String,
        father :: String
      )
  end

Try to build the family tree from the picture using this data:

  anna-person = person("Anna", "Susan", "Charlie")
  susan-person = person("Susan", "Ellen", "Bill")

Wait – this seems weird – we have one family (tree), but we’re setting up separate people? Do we maybe want a list of this information instead?

  family-lst =
    [list:
      person("Anna", "Susan", "Charlie"),
      person("Susan", "Ellen", "Bill")
    ]

This is better (one piece of data for the entire family tree), but it still seems to be missing the "tree-ness" of the picture. Note that in the picture, it is easy to get from Anna to her grandparents. Here, there’s this list and we have to look across the people to find the next generation. Could we do better?
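
To make "look across the people" concrete, here is a rough sketch of finding someone's maternal grandmother in the list version. It repeats the data under hypothetical names (PersonRec, person-rec, family-recs) so that it runs on its own, and it returns false for missing information, as we did with tables. The function maternal-grandmother is likewise made up for this sketch.

```pyret
import lists as L

# PersonRec, person-rec, and family-recs are hypothetical names, chosen so this
# sketch does not clash with the definitions in the notes.
data PersonRec:
  | person-rec(name :: String, mother :: String, father :: String)
end

family-recs =
  [list:
    person-rec("Anna", "Susan", "Charlie"),
    person-rec("Susan", "Ellen", "Bill")]

fun maternal-grandmother(of-name :: String, people :: List<PersonRec>):
  doc: "produce the name of of-name's mother's mother, or false if either lookup fails"
  # first search: find the record for the named person
  found-child = L.find(lam(p): p.name == of-name end, people)
  cases (Option) found-child:
    | none => false
    | some(child) =>
      # second search: find the record for that person's mother
      found-mother = L.find(lam(p): p.name == child.mother end, people)
      cases (Option) found-mother:
        | none => false
        | some(mom) => mom.mother
      end
  end
where:
  maternal-grandmother("Anna", family-recs) is "Ellen"
  maternal-grandmother("Susan", family-recs) is false
  maternal-grandmother("Zoe", family-recs) is false
end
```

Each generation costs another search of the whole list by name, which is the overhead the picture does not have.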

Remember that we can make the mother and father be any type we would like. They don’t have to be Strings. In fact when we look at the picture, what we see up the mother and father sides is an entire family tree. Wouldn’t this then be better?

  data FamTree2:
    | person2(
        name :: String,
        mother :: FamTree2,
        father :: FamTree2
      )
  end

Try writing the family tree using this definition instead. Do the part starting just from Susan for now.

Hopefully, you got as far as the following, but there’s a question of what to put in the ellipses (the places where we don’t know which person belongs there):

  susan-as-tree =
    person2("Susan",
      person2("Ellen", ..., ...),
      person2("Bill",
        person2("Laura", ..., ...),
        person2("John", ..., ...)))

How do we fill in the ellipses? When we did tables, we used false for this. Let’s try that:

  susan-as-tree =
    person2("Susan",
      person2("Ellen", false, false),
      person2("Bill",
        person2("Laura", false, false),
        person2("John", false, false)))

Oops – that didn’t work. Why not? Our data block requires the mother and father to be family trees, but false isn’t a family tree. Maybe we could relax the type of mother/father to allow either a family tree or a Boolean, but there’s actually a better approach. We were only using false because we needed some kind of data that we could distinguish from a real name. We can get the same effect by adding another variant of family tree, one corresponding to an "empty" tree (or a tree with no people):

  data FamTree:
    | unknown()
    | person(
        name :: String,
        mother :: FamTree,
        father :: FamTree
      )
  end

Now, we can finish our example:

  susan-tree =
    person("Susan",
      person("Ellen", unknown(), unknown()),
      person("Bill",
        person("Laura", unknown(), unknown()),
        person("John", unknown(), unknown())))

Or we can build up the entire family:

  the-family =
    person("Anna",
      susan-tree,
      person("Charlie", unknown(), unknown()))

How would we find Susan’s mother?

  susan-tree.mother

This gives the entire person structure. What if I want her name?

  susan-tree.mother.name
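
One caution about chaining field accesses: an unknown() tree has no name field, so asking for the name of a parent we don't know about raises an error. A possible way to guard against that, sketched here with a made-up helper called name-or-unknown and an arbitrary placeholder string, is to use cases on the tree before touching its fields. The data block and susan-tree are repeated so that the sketch runs on its own.

```pyret
data FamTree:
  | unknown()
  | person(name :: String, mother :: FamTree, father :: FamTree)
end

susan-tree =
  person("Susan",
    person("Ellen", unknown(), unknown()),
    person("Bill",
      person("Laura", unknown(), unknown()),
      person("John", unknown(), unknown())))

# name-or-unknown is a hypothetical helper; "(unknown)" is an arbitrary placeholder
fun name-or-unknown(ft :: FamTree) -> String:
  doc: "produce the name at the root of ft, or a placeholder if ft is unknown"
  cases (FamTree) ft:
    | unknown() => "(unknown)"
    | person(name, mother, father) => name
  end
where:
  name-or-unknown(susan-tree.mother) is "Ellen"
  name-or-unknown(susan-tree.father.father) is "John"
  name-or-unknown(susan-tree.mother.mother) is "(unknown)"
end
```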

We still need to come back to the discussion comparing tables and trees, but first, let’s write some programs over trees.

2 Programming Over Trees

Write in-family, which takes a name and a FamTree and determines whether there is a person in the tree with that name. Don’t forget to write examples!

  fun in-family(a-name :: String, ft :: FamTree) -> Boolean:
    doc: "determine whether family has a person with the given name"
    cases (FamTree) ft:
      | unknown() => false
      | person(name, mother, father) =>
        (name == a-name) or
        in-family(a-name, mother) or
        in-family(a-name, father)
    end
  where:
    in-family("Bill", unknown()) is false
    in-family("Zoe", unknown()) is false
    in-family("Susan", the-family) is true
    in-family("Zoe", the-family) is false
    in-family("John", the-family) is true
  end

2.1 The Trees Template

Did you dive in and try writing in-family from scratch? Remember that when we did lists, we had the notion of a template that captured how we traverse (that is, walk along) the entire data structure. The template expanded the data structure into cases, then made a recursive call on the rest of the list (which was also a list).

We can use that same approach here, developing a template for trees. In the tree case, however, there are recursive calls on each of the mother and the father. Here is the template for a family tree:

  fun ft-func(ft :: FamTree) -> ???:
    cases (FamTree) ft:
      | unknown() => ...
      | person(name, mother, father) =>
        ... name
        ... ft-func(mother)
        ... ft-func(father)
    end
  end

Think about starting from this template as you try the next example.

2.2 Another Example

Write count-gens, which takes a FamTree and determines the maximum number of generations up any branch of the tree. Don’t forget to write examples!

  fun count-gens(ft :: FamTree) -> Number:
    doc: "produce number of generations in longest branch of the tree"
    cases (FamTree) ft:
      | unknown() => 0
      | person(name, mother, father) =>
        1 + num-max(count-gens(mother), count-gens(father))
    end
  where:
    count-gens(unknown()) is 0
    count-gens(the-family) is 4
  end
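
The same template supports functions that produce other kinds of answers. As one more illustration, here is a sketch of a function that collects every name in a tree. The function all-names is made up for this example; the data block and trees are repeated so that the code runs on its own. The empty list plays the role that 0 played in count-gens, and the two recursive results are combined with append rather than num-max.

```pyret
data FamTree:
  | unknown()
  | person(name :: String, mother :: FamTree, father :: FamTree)
end

susan-tree =
  person("Susan",
    person("Ellen", unknown(), unknown()),
    person("Bill",
      person("Laura", unknown(), unknown()),
      person("John", unknown(), unknown())))

the-family =
  person("Anna",
    susan-tree,
    person("Charlie", unknown(), unknown()))

# all-names is a hypothetical example, not one of the posted exercises
fun all-names(ft :: FamTree) -> List<String>:
  doc: "produce the names of everyone in ft, each person before their ancestors"
  cases (FamTree) ft:
    | unknown() => empty
    | person(name, mother, father) =>
      # the person's own name, then the mother's side, then the father's side
      link(name, all-names(mother).append(all-names(father)))
  end
where:
  all-names(unknown()) is empty
  all-names(the-family) is
    [list: "Anna", "Susan", "Ellen", "Bill", "Laura", "John", "Charlie"]
end
```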

3 Tables Versus Trees

Let’s get back to the discussion about tables vs trees – what are the benefits of each?

Trees:
  • give direct access to a person’s parents

  • capture the structure of the underlying data

Tables:
  • capture siblings easily

  • feel more like a database of data on people

There are clearly tradeoffs here. In computer science, trees are often used instead of tables because of the direct access to parents (and, more generally, because they capture the structure of the underlying data).
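
To see why siblings are easy with tables, consider the following sketch. The table family-with-sibling and the function children-of are made up for this example, and the "Tom" row is invented purely to give Susan a sibling (the picture has none). Finding siblings is then a single sieve over the parent columns.

```pyret
# Hypothetical table: the "Tom" row is invented here to give Susan a sibling.
family-with-sibling = table: name :: String, mother :: String, father :: String
  row: "Anna", "Susan", "Charlie"
  row: "Susan", "Ellen", "Bill"
  row: "Tom", "Ellen", "Bill"
  row: "Bill", "Laura", "John"
end

fun children-of(mother-name :: String, father-name :: String) -> List<String>:
  doc: "produce the names of everyone recorded with the given mother and father"
  same-parents =
    sieve family-with-sibling using mother, father:
      (mother == mother-name) and (father == father-name)
    end
  extract name from same-parents end
where:
  children-of("Ellen", "Bill") is [list: "Susan", "Tom"]
  children-of("Laura", "John") is [list: "Bill"]
  children-of("Zoe", "Max") is empty
end
```

In our FamTree data, by contrast, people refer only to their parents, so nothing in the tree connects Susan to a sibling unless that sibling also appears as someone's parent.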

4 More Practice

Open the file of exercises (posted to the schedule page) and work on whichever set of exercises fits your level and interest.

If you finish those, extend the data block so that a person also has a birth year and an eye color. Think of some programs that you could write now that you have this information as well.