Class summary: Introduction to Trees
1 Data Structures for Family Trees
1.1 Family Trees as Tables
1.2 Creating a Data block for Trees
2 Programming Over Trees
2.1 The Trees Template
2.2 Another Example
3 Binary Trees
4 Tables Versus Trees
5 More Practice

Copyright (c) 2017 Kathi Fisler

This material is not in the textbook.

1 Data Structures for Family Trees

Imagine that we wanted to represent genealogy information (information about people’s biological parents and genetic traits). Here’s a picture showing the relationships between people and their parents (a "family tree").

To capture this in code, we might create a table such as the following:

  family = table: name, birthyear, eyecolor, mother, father

    row: "Anna", 1997, "blue", "Susan", "Charlie"

    row: "Susan", 1971, "blue", "Ellen", "Bill"

    row: "Charlie", 1972, "green", "NoInfo", "NoInfo"

    row: "Ellen", 1945, "brown", "Laura", "John"

    ...

  end

Assume we wanted to be able to answer questions such as the following:

1.1 Family Trees as Tables

Let’s say we wanted to write a function to compute someone’s grandparents (at least, those grandparents known in the tree):

  fun grandparents(of-name :: String) -> List<String>:

    ...

  where:

    grandparents("Anna") is [list: "Ellen", "Bill"]

    grandparents("Laura") is [list:]

    grandparents("Kathi") is [list:]

  end

What would be involved in doing that computation? What subtasks would we identify/what functions would we write?

Let’s write one of these functions to see what it would look like:

  import lists as L

  

  fun get-mother(of-name :: String, from-family :: Table):

    person-row =

      filter-by(from-family,

        lam(r :: Row): r["name"] == of-name end).row-n(0)

    person-row["mother"]

  where:

    get-mother("Anna", family) is "Susan"

  end

What happens if the person we asked for isn’t in the table (meaning that we don’t know their family history)? Right now, we get a Pyret error. The error arises because we shouldn’t call row-n(0) unless we know that the filter found a row for the named person. We could modify the code, but that would be premature.

As always, start with examples: what should the function produce if the named person doesn’t have a row in the table?

  fun get-mother2(of-name :: String, from-family :: Table):

    person-table =

      filter-by(from-family, lam(r :: Row): r["name"] == of-name end)

    if person-table.length() > 0:

      person-table.row-n(0)["mother"]

    else:

      false

    end

  where:

    get-mother2("Anna", family) is "Susan"

    get-mother2("Fred", family) is false

  end

If you imagine chaining together calls to get-mother in order to find ancestors (and doing the same on the father’s side), you can quickly see that we would end up doing a lot of table filtering, which seems inefficient.

Look back at the family tree picture. We don’t do any complicated filtering there – we just follow the line in the picture immediately from a person to their mother or father. Can we get that idea in code instead? Yes, through data blocks.

1.2 Creating a Data block for Trees

For this approach, we want to create a data block for Family Trees that has a variant (constructor) for setting up a person. Look back at our picture – what information makes up a person? Their name, their mother, and their father. That suggests the following pattern, which basically turns a row into a data block:

  data FamTree:

    | person(

        name :: String,

        mother :: String,

        father :: String

        )

  end

Try to build the family tree from the picture using this data:

  anna-person = person("Anna", "Susan", "Charlie")

  susan-person = person("Susan", "Ellen", "Bill")

Wait – this seems weird – we have one family (tree), but we’re setting up separate people? Do we maybe want a list of this information instead?

  family-lst =

    [list:

      person("Anna", "Susan", "Charlie"),

      person("Susan", "Ellen", "Bill")

    ]

This is better (one piece of data for the entire family tree), but it still seems to be missing the "tree-ness" of the picture. Note that in the picture, it is easy to get from Anna to her grandparents. Here, there’s just a list, and we have to search across the people to find the next generation. Could we do better?

Remember that we can make the mother and father be any type we would like. They don’t have to be Strings. In fact when we look at the picture, what we see up the mother and father sides is an entire family tree. Wouldn’t this then be better?

  data FamTree2:

    | person2(

        name :: String,

        mother :: FamTree2,

        father :: FamTree2

        )

  end

Try writing the family tree using this definition instead. Do the part starting just from Susan for now.

Hopefully, you got this far, but there’s a question of what to put in the ellipses (the cases in which we don’t know what person goes there):

  susan-as-tree =

    person2("Susan",

      person2("Ellen", ..., ...),

      person2("Bill",

        person2("Laura", ..., ...),

        person2("John", ..., ...))

      )

How do we fill in the ellipses? Could we use something like false?

  susan-as-tree =

    person2("Susan",

      person2("Ellen", false, false),

      person2("Bill",

        person2("Laura", false, false),

        person2("John", false, false))

      )

Oops – that didn’t work. Why not? Our data block requires the mother and father to be FamTree2 values, but false isn’t a FamTree2. Maybe we could relax the type of mother/father to allow FamTree2 or Boolean, but there’s actually a better approach. We were only using false because we needed some kind of data that we could distinguish from a real person. We can get the same effect by adding another variant of family tree, one corresponding to an "empty" tree (or a tree with no people):

  data FamTree:

    | unknown

    | person(

        name :: String,

        mother :: FamTree,

        father :: FamTree

        )

  end

Now, we can finish our example:

  susan-tree =

    person("Susan",

      person("Ellen", unknown, unknown),

      person("Bill",

        person("Laura", unknown, unknown),

        person("John", unknown, unknown))

      )

Or we can build up the entire family:

  the-family =

    person("Anna",

      susan-tree,

      person("Charlie", unknown, unknown))

How would we find Susan’s mother?

  susan-tree.mother

This gives the entire person structure. What if I want her name?

  susan-tree.mother.name
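Following two links in a row gets us to grandparents, which brings back the grandparents question from the start of these notes. Here is a minimal sketch of a tree version, assuming the FamTree definition and sample trees above (repeated so the block runs on its own). The names known-parent-names and grandparents-of are made up for illustration. The sketch uses cases rather than plain field access because asking unknown for its mother would be an error.

```pyret
# FamTree and the sample trees are repeated from above so this runs on its own
data FamTree:
  | unknown
  | person(name :: String, mother :: FamTree, father :: FamTree)
end

susan-tree =
  person("Susan",
    person("Ellen", unknown, unknown),
    person("Bill",
      person("Laura", unknown, unknown),
      person("John", unknown, unknown)))

the-family =
  person("Anna", susan-tree, person("Charlie", unknown, unknown))

# hypothetical helper: names of the root person's parents, skipping unknown ones
fun known-parent-names(ft :: FamTree) -> List<String>:
  cases (FamTree) ft:
    | unknown => empty
    | person(_, mother, father) =>
      [list: mother, father].filter(is-person).map(lam(p): p.name end)
  end
end

# hypothetical tree version of grandparents: follow two links up each side
fun grandparents-of(ft :: FamTree) -> List<String>:
  cases (FamTree) ft:
    | unknown => empty
    | person(_, mother, father) =>
      known-parent-names(mother).append(known-parent-names(father))
  end
where:
  grandparents-of(the-family) is [list: "Ellen", "Bill"]
  grandparents-of(susan-tree) is [list: "Laura", "John"]
  grandparents-of(unknown) is empty
end
```

Notice that nothing here searches for a person by name: each step just follows the mother or father field.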

We still need to come back to the discussion comparing tables and trees, but first, let’s write some programs over trees.

2 Programming Over Trees

Write count-gens, which takes a FamTree and determines the maximum number of generations up any branch of the tree. Don’t forget to write examples!

It might be easier to try to think out what computations would have to occur to build up the answer here. Start with

  count-gens(Anna)

What should the answer be? Anna contributes a generation. The number of generations in her family must be based on the number of generations in each of her mother’s and father’s families. Do we add all of those up? No, we don’t want to count her mother and father as separate generations. Perhaps we should use max to keep the largest number of generations from her parents. This is the sequence of steps, written informally (informal because we are using the names of the people to refer to their trees without having defined those trees separately):

  > count-gens(Anna)

  > 1 + max(count-gens(Susan), count-gens(Charlie))

  > 1 + max(1 + max(count-gens(Ellen), count-gens(Bill)),

            count-gens(Charlie))

  > ...

Having seen this, let’s turn it into code:

  fun count-gens(ft :: FamTree) -> Number:

    doc: "produce number of generations in longest branch of the tree"

    cases (FamTree) ft:

      | unknown => 0

      | person(name, mother, father) =>

        1 + num-max(count-gens(mother), count-gens(father))

    end

  where:

    count-gens(unknown) is 0

    count-gens(the-family) is 4

  end

2.1 The Trees Template

Did you dive in and try writing count-gens from scratch? Remember that when we did lists, we had the notion of a template that captured how we traverse (aka, walk along) the entire data structure. The template expanded the data structure into cases, then made a recursive call on the rest of the list (which was also a list).

We can use that same approach here, developing a template for trees. In the tree case, however, there are recursive calls on each of the mother and the father. Here is the template for a family tree:

  fun ft-func(ft :: FamTree) -> ???:

    cases (FamTree) ft:

      | unknown => ...

      | person(name, mother, father) =>

        ... name

        ... ft-func(mother)

        ... ft-func(father)

    end

  end

Think about starting from this template as you try the next example.

2.2 Another Example

Write in-family, which takes a name and a FamTree and determines whether there is a person in the tree with that name. Don’t forget to write examples!

  fun in-family(a-name :: String, ft :: FamTree) -> Boolean:

    doc: "determine whether family has a person with the given name"

    cases (FamTree) ft:

      | unknown => false

      | person(name, mother, father) =>

        (name == a-name) or

        in-family(a-name, mother) or

        in-family(a-name, father)

    end

  where:

    in-family("Bill", unknown) is false

    in-family("Zoe", unknown) is false

    in-family("Susan", the-family) is true

    in-family("Zoe", the-family) is false

    in-family("John", the-family) is true

  end
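As one more instance of the template, here is a sketch of a function that counts the known people in a tree. The name count-people is made up for illustration, and the FamTree definition and the-family are repeated from above so the block runs on its own. Compare it with count-gens: the shape is the same, but here we really do add up the results from both parents instead of taking the max.

```pyret
# FamTree and the-family are repeated from above so this runs on its own
data FamTree:
  | unknown
  | person(name :: String, mother :: FamTree, father :: FamTree)
end

the-family =
  person("Anna",
    person("Susan",
      person("Ellen", unknown, unknown),
      person("Bill",
        person("Laura", unknown, unknown),
        person("John", unknown, unknown))),
    person("Charlie", unknown, unknown))

# hypothetical example function built from the template
fun count-people(ft :: FamTree) -> Number:
  doc: "count the known people in the tree"
  cases (FamTree) ft:
    | unknown => 0
    | person(_, mother, father) =>
      # this person, plus everyone on each parent's side
      1 + count-people(mother) + count-people(father)
  end
where:
  count-people(unknown) is 0
  count-people(person("Charlie", unknown, unknown)) is 1
  count-people(the-family) is 7
end
```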

3 Binary Trees

What we have worked out here as family trees is actually a common data structure in CS called a binary tree (binary because each position in the tree refers to two other positions). If you go on in CS, you will see binary trees in many different contexts. Here, we are just pointing out the common term for what we have built.
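To see the connection, here is a sketch of what a general binary tree of numbers might look like. The names BinTree, leaf, node, and tree-height are assumptions chosen for illustration, not something defined in these notes. The data block has the same shape as FamTree (an empty variant, plus a variant with a value and two subtrees), and tree-height has the same shape as count-gens.

```pyret
# hypothetical binary tree of numbers, shaped like FamTree
data BinTree:
  | leaf
  | node(value :: Number, left :: BinTree, right :: BinTree)
end

fun tree-height(bt :: BinTree) -> Number:
  doc: "number of nodes along the longest branch of the tree"
  cases (BinTree) bt:
    | leaf => 0
    | node(_, left, right) =>
      1 + num-max(tree-height(left), tree-height(right))
  end
where:
  tree-height(leaf) is 0
  tree-height(node(5, leaf, leaf)) is 1
  tree-height(node(5, node(3, leaf, leaf), leaf)) is 2
end
```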

4 Tables Versus Trees

Let’s get back to the discussion about tables vs trees – what are the benefits of each?

Trees:
  • give direct access to a person’s parents

  • capture the structure of the underlying data

Tables:
  • capture siblings easily

  • feels more like a database of data on people

There are clearly tradeoffs here. In Computer Science, trees are often used instead of tables because of the direct access to parents (and, more generally, because they capture the structure of the underlying data).

5 More Practice

If you finish those, extend the data block so that a person also has a birth year and an eye color. Think of some programs that you could write now that you have this information as well.
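Here is one possible sketch of the extended data block, along with one program that uses the new information. It assumes the field names birthyear and eyecolor (borrowed from the table columns at the start of these notes). The names small-family and count-eye-color are made up for illustration. The sample tree includes only the four people whose birth years and eye colors appear in that table; everyone else is left as unknown rather than guessed.

```pyret
# one possible extension of FamTree; field names follow the table columns
data FamTree:
  | unknown
  | person(
      name :: String,
      birthyear :: Number,
      eyecolor :: String,
      mother :: FamTree,
      father :: FamTree)
end

# only people whose details appear in the table are included
small-family =
  person("Anna", 1997, "blue",
    person("Susan", 1971, "blue",
      person("Ellen", 1945, "brown", unknown, unknown),
      unknown),
    person("Charlie", 1972, "green", unknown, unknown))

# hypothetical example program over the extended tree
fun count-eye-color(ft :: FamTree, a-color :: String) -> Number:
  doc: "count the known people in the tree with the given eye color"
  cases (FamTree) ft:
    | unknown => 0
    | person(_, _, eyecolor, mother, father) =>
      (if eyecolor == a-color: 1 else: 0 end) +
      count-eye-color(mother, a-color) +
      count-eye-color(father, a-color)
  end
where:
  count-eye-color(unknown, "blue") is 0
  count-eye-color(small-family, "blue") is 2
  count-eye-color(small-family, "brown") is 1
end
```

The function still follows the trees template; the only new work is the check on the person at the root.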