Class Summary: Processing Trees
1 Programs to Process Ancestor Trees
2 Summarizing How to Approach Tree Problems
3 Study Questions
4 Practice Problems


Copyright (c) 2017 Kathi Fisler

In the previous lecture, we talked about how tables are a poor choice for capturing ancestor trees. Each person must reference two other people (their biological parents). In a table, we can only capture those references by name, and we must then look each name up by searching the rows of the table. But if we define our own datatype, we can capture those connections explicitly.

Here again is the sample ancestor tree that we were trying to capture:

And here is the datatype that we came up with:

  data AncTree:
    | noInfo
    | person(
        name :: String,
        birthyear :: Number,
        eye :: String,
        mother :: AncTree,
        father :: AncTree)
  end

Today, we need to learn how to write programs that perform calculations over these trees.

Before we can write and test such programs, we need example trees built with the person and noInfo constructors.

If we wanted to capture our entire ancestor tree diagram, we could write it as follows:

  anna-tree =
    person("Anna", 1997, "blue",
      person("Susan", 1971, "blue",
        person("Ellen", 1945, "brown",
          person("Laura", 1920, "blue", noInfo, noInfo),
          person("John", 1920, "green",
            noInfo,
            person("Robert", 1893, "brown", noInfo, noInfo))),
        person("Bill", 1946, "blue", noInfo, noInfo)),
      person("Charlie", 1972, "green", noInfo, noInfo))

We could also have named each person's tree individually:

  robert-tree = person("Robert", 1893, "brown", noInfo, noInfo)
  laura-tree = person("Laura", 1920, "blue", noInfo, noInfo)
  john-tree = person("John", 1920, "green", noInfo, robert-tree)
  ellen-tree = person("Ellen", 1945, "brown", laura-tree, john-tree)
  bill-tree = person("Bill", 1946, "blue", noInfo, noInfo)
  susan-tree = person("Susan", 1971, "blue", ellen-tree, bill-tree)
  charlie-tree = person("Charlie", 1972, "green", noInfo, noInfo)
  anna-tree2 = person("Anna", 1997, "blue", susan-tree, charlie-tree)

The latter gives you pieces of the tree to use as other examples, but loses the structure that is visible in the indentation of the first version. You could get to pieces of the first version by digging into the data, such as writing anna-tree.mother.mother to get to the tree starting from "Ellen".
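Digging into a tree with field access can be sketched as follows. This is a minimal, standalone example: it repeats the datatype and the named trees from above so that it runs on its own (paste it into a fresh file, since Pyret does not allow the same name to be defined twice). The check block and its label are additions for illustration, not part of the original notes.

```pyret
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree)
end

robert-tree = person("Robert", 1893, "brown", noInfo, noInfo)
laura-tree = person("Laura", 1920, "blue", noInfo, noInfo)
john-tree = person("John", 1920, "green", noInfo, robert-tree)
ellen-tree = person("Ellen", 1945, "brown", laura-tree, john-tree)
bill-tree = person("Bill", 1946, "blue", noInfo, noInfo)
susan-tree = person("Susan", 1971, "blue", ellen-tree, bill-tree)
charlie-tree = person("Charlie", 1972, "green", noInfo, noInfo)
anna-tree2 = person("Anna", 1997, "blue", susan-tree, charlie-tree)

check "digging into the tree with field access":
  # Anna's mother's mother is the tree starting from Ellen
  anna-tree2.mother.mother is ellen-tree
  anna-tree2.mother.mother.name is "Ellen"
  # Follow the chain further: Ellen's father's father is Robert
  anna-tree2.mother.mother.father.father.name is "Robert"
  # Charlie's parents are unknown, so this field holds noInfo
  anna-tree2.father.mother is noInfo
end
```

Note that the fields exist only on person values: an expression such as anna-tree2.father.mother.name would fail at run time, because noInfo has no name field.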

1 Programs to Process Ancestor Trees

How would we write a function to determine whether anyone in the tree had a particular name? To be clear, we are trying to fill in the following code:

  fun in-tree(at :: AncTree, name :: String) -> Boolean:
    doc: "determine whether name is in the tree"
    ...
  end

How do we get started? Add some examples, remembering to check both cases of the AncTree definition:

  fun in-tree(at :: AncTree, name :: String) -> Boolean:
    doc: "determine whether name is in the tree"
    ...
  where:
    in-tree(anna-tree, "Anna") is true
    in-tree(anna-tree, "Ellen") is true
    in-tree(ellen-tree, "Anna") is false
    in-tree(noInfo, "Ellen") is false
  end

What next? When we were working on lists, we talked about the template, a skeleton of code that we knew we could write based on the structure of the data. The template names the pieces of each kind of data, and makes recursive calls on pieces that have the same type. Here’s the template over the AncTree filled in:

  fun in-tree(at :: AncTree, name :: String) -> Boolean:
    doc: "determine whether name is in the tree"
    cases (AncTree) at:     # comes from AncTree being data with cases
      | noInfo => ...
      | person(n, y, e, m, f) => ... in-tree(m, name) ... in-tree(f, name)
    end
  where:
    in-tree(anna-tree, "Anna") is true
    in-tree(anna-tree, "Ellen") is true
    in-tree(ellen-tree, "Anna") is false
    in-tree(noInfo, "Ellen") is false
  end

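To finish the code, we need to think about how to fill in the ellipses. In the noInfo case there is nobody in the tree, so the answer is false. In the person case, the name is in the tree if it matches this person's name, or if it appears in the mother's tree or in the father's tree; the two recursive calls in the template answer those last two questions.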

Here’s the final code:

  fun in-tree(at :: AncTree, name :: String) -> Boolean:
    doc: "determine whether name is in the tree"
    cases (AncTree) at:     # comes from AncTree being data with cases
      | noInfo => false
      | person(n, y, e, m, f) => (name == n) or in-tree(m, name) or in-tree(f, name)
        # n is the same as at.name
        # m is the same as at.mother
    end
  where:
    in-tree(anna-tree, "Anna") is true
    in-tree(anna-tree, "Ellen") is true
    in-tree(ellen-tree, "Anna") is false
    in-tree(noInfo, "Ellen") is false
  end
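To see how the person case combines its three parts, it can help to unfold one call by hand. The sketch below is standalone: it repeats the datatype and in-tree, and uses only the small tree for John and Robert from the examples above. The check block is an illustration added here, not part of the original notes.

```pyret
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree)
end

robert-tree = person("Robert", 1893, "brown", noInfo, noInfo)
john-tree = person("John", 1920, "green", noInfo, robert-tree)

fun in-tree(at :: AncTree, name :: String) -> Boolean:
  doc: "determine whether name is in the tree"
  cases (AncTree) at:
    | noInfo => false
    | person(n, y, e, m, f) => (name == n) or in-tree(m, name) or in-tree(f, name)
  end
end

check "unfolding in-tree one level at a time":
  # At John: compare the name, then ask the mother's tree and the father's tree
  in-tree(john-tree, "Robert") is (("Robert" == "John") or in-tree(noInfo, "Robert") or in-tree(robert-tree, "Robert"))
  # John's mother is unknown, so that branch contributes false
  in-tree(noInfo, "Robert") is false
  # John's father is Robert, so that branch contributes true
  in-tree(robert-tree, "Robert") is true
  # false or false or true
  in-tree(john-tree, "Robert") is true
end
```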

2 Summarizing How to Approach Tree Problems

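We design tree programs using the same design recipe that we covered on lists. In this lecture, that meant: write the data definition, make examples of the data, write the function header with its doc string, write examples in a where block that cover every case of the data definition, write the template, and finally fill in the template.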
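As a sketch of reusing the recipe on a different problem, here is a function that counts the blue-eyed people in a tree. The function name count-blue-eyes and its examples are hypothetical, introduced only for illustration; the datatype and the named trees are the ones from earlier in these notes, repeated so the example runs on its own in a fresh file.

```pyret
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree)
end

robert-tree = person("Robert", 1893, "brown", noInfo, noInfo)
laura-tree = person("Laura", 1920, "blue", noInfo, noInfo)
john-tree = person("John", 1920, "green", noInfo, robert-tree)
ellen-tree = person("Ellen", 1945, "brown", laura-tree, john-tree)
bill-tree = person("Bill", 1946, "blue", noInfo, noInfo)
susan-tree = person("Susan", 1971, "blue", ellen-tree, bill-tree)

# Hypothetical example function: same template as in-tree,
# but the pieces are combined with + instead of or
fun count-blue-eyes(at :: AncTree) -> Number:
  doc: "count the people in the tree whose eye color is blue"
  cases (AncTree) at:
    | noInfo => 0
    | person(n, y, e, m, f) =>
      (if e == "blue": 1 else: 0 end) + count-blue-eyes(m) + count-blue-eyes(f)
  end
where:
  count-blue-eyes(noInfo) is 0
  count-blue-eyes(bill-tree) is 1
  count-blue-eyes(ellen-tree) is 1
  count-blue-eyes(susan-tree) is 3
end
```

The shape is the same as in-tree: one answer for noInfo, and for person a combination of this person's own contribution with the results of the recursive calls on mother and father. Only the base value (0 instead of false) and the combining operator (+ instead of or) change.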

For those of you going on to more CS classes, knowing how to write programs to process trees is essential. For those looking to focus on data science without a lot of programming, the main takeaway is that data sometimes need organizations other than tables to make computations more efficient.

3 Study Questions

4 Practice Problems

For practice, try problems such as