Class summary: Intro to Ancestor Trees

Copyright (c) 2017 Kathi Fisler

1 Data Design Problem – Ancestry Data

Imagine that we wanted to represent genealogy information for purposes of a medical research study. Specifically, we want to record people's birthyear, eye color, and biological parents. Here's a picture showing the relationships between people and their biological parents.

To capture this in code, we might create a table such as the following:

  family = table: name, birthyear, eyecolor, mother, father

    row: "Anna", 1997, "blue", "Susan", "Charlie"

    row: "Susan", 1971, "blue", "Ellen", "Bill"

    row: "Charlie", 1972, "green", "NoInfo", "NoInfo"

    row: "Ellen", 1945, "brown", "Laura", "John"

    ...

  end

Assume we wanted to be able to answer questions such as the following:

How should we capture this picture in data to be able to write programs to answer our questions?

1.1 Ancestry Trees as Tables

Let’s say I wanted to write a function to compute someone’s grandparents (at least, those grandparents known in the tree). Roughly, we’d break this down into two functions: one to get the parents of a specific person, and one to get the grandparents by getting the parents of the parents. Here’s the idea:

  # For parents-of, we'll just sketch out the tasks since

  # you've implemented all of these before

  fun parents-of(anc-table :: Table, person :: String) -> List<String>:

    # filter the table to find the person

    # extract the name of the mother

    # extract the name of the father

    # make a list of those names

  end

  

  fun grandparents-of(anc-table :: Table, person :: String) -> List<String>:

    doc: "compute list of known grandparents in the table"

    # glue together lists of mother's parents and father's parents

    plist = parents-of(anc-table, person) # gives a list of two names

    parents-of(anc-table, plist.first) +

      parents-of(anc-table, plist.rest.first)

  where:

    grandparents-of(family, "Anna") is [list: "Ellen", "Bill"]

    grandparents-of(family, "Laura") is [list:]

    grandparents-of(family, "Kathi") is [list:]

  end
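The parents-of sketch above could be filled in along the following lines. This is a minimal sketch, not necessarily the version from class: it assumes the string "NoInfo" marks an unknown parent, and that unknown parents (and people who are not in the table) are simply left out of the result, which matches the "known grandparents" wording of grandparents-of. Only the four complete rows of family are included, so the block runs on its own.

```pyret
# The four complete rows of the family table (the "..." rows are omitted)
family = table: name, birthyear, eyecolor, mother, father
  row: "Anna", 1997, "blue", "Susan", "Charlie"
  row: "Susan", 1971, "blue", "Ellen", "Bill"
  row: "Charlie", 1972, "green", "NoInfo", "NoInfo"
  row: "Ellen", 1945, "brown", "Laura", "John"
end

fun parents-of(anc-table :: Table, person :: String) -> List<String>:
  doc: "known parents of person; empty if person is not in the table"
  # filter the table to find the person
  matches = anc-table.filter(lam(r): r["name"] == person end)
  if matches.length() == 0:
    empty
  else:
    found = matches.row-n(0)
    # extract the mother's and father's names, dropping unknown ("NoInfo") ones
    [list: found["mother"], found["father"]].filter(lam(n): n <> "NoInfo" end)
  end
where:
  parents-of(family, "Anna") is [list: "Susan", "Charlie"]
  parents-of(family, "Charlie") is [list:]
  parents-of(family, "Kathi") is [list:]
end
```

Dropping "NoInfo" inside parents-of keeps that convention out of every caller. The trade-off is that callers must cope with lists of fewer than two names.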

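The grandparents-of function isn't quite right: plist might not contain two names (for example, when a parent is unknown or the person isn't in the table), and then plist.first or plist.rest.first raises an error. We can fix this with an added if-expression: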

  fun grandparents-of(anc-table :: Table, name :: String) -> List<String>:

    doc: "compute list of known grandparents in the table"

    # glue together lists of mother's parents and father's parents

    plist = parents-of(anc-table, name) # gives a list of two names

    if plist.length() == 2:

      parents-of(anc-table, plist.first) + parents-of(anc-table, plist.rest.first)

    else if plist.length() == 1:

      parents-of(anc-table, plist.first)

    else: empty

    end

  end

What if we now wanted to gather up all of someone's ancestors? We discussed this in class and realized that since we don't know how many generations there are, we'd need to use recursion. This approach would also be expensive, since we'd end up filtering the table over and over, and each filter checks every row of the table.
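A recursive, table-based version might look like the following minimal sketch. The name all-ancestors-of is hypothetical, and parents-of is reproduced from one possible implementation that returns only known parents (assuming "NoInfo" marks an unknown parent), so the block runs on its own. Every recursive call runs parents-of, which filters the whole table, so the table is scanned once for each person visited.

```pyret
# The four complete rows of the family table (the "..." rows are omitted)
family = table: name, birthyear, eyecolor, mother, father
  row: "Anna", 1997, "blue", "Susan", "Charlie"
  row: "Susan", 1971, "blue", "Ellen", "Bill"
  row: "Charlie", 1972, "green", "NoInfo", "NoInfo"
  row: "Ellen", 1945, "brown", "Laura", "John"
end

fun parents-of(anc-table :: Table, person :: String) -> List<String>:
  doc: "known parents of person; empty if person is not in the table"
  matches = anc-table.filter(lam(r): r["name"] == person end)
  if matches.length() == 0:
    empty
  else:
    found = matches.row-n(0)
    [list: found["mother"], found["father"]].filter(lam(n): n <> "NoInfo" end)
  end
end

# Hypothetical name: gathers every known ancestor, however many generations
fun all-ancestors-of(anc-table :: Table, person :: String) -> List<String>:
  doc: "known parents first, then each parent's ancestors in turn"
  parents = parents-of(anc-table, person)  # one full-table filter per call
  for fold(acc from parents, p from parents):
    acc.append(all-ancestors-of(anc-table, p))
  end
where:
  all-ancestors-of(family, "Kathi") is [list:]
end
```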

Look back at the ancestry tree picture. We don’t do any complicated filtering there – we just follow the line in the picture immediately from a person to their mother or father. Can we get that idea in code instead? Yes, through datatypes.

1.2 Creating a Datatype for Ancestor Trees

For this approach, we want to create a datatype for Ancestor Trees that has a variant (constructor) for setting up a person. Look back at our picture – what information makes up a person? Their name, their mother, and their father (along with birthyear and eyecolor, which aren’t shown in the picture). This suggests the following datatype, which basically turns a row into a person value:

  data AncTree:

    | person(

        name :: String,

        birthyear :: Number,

        eye :: String,

        mother :: ________,

        father :: ________

        )

  end

For example, Anna's row might look like:

  anna-row = person("Anna", 1997, "blue", ???, ???)

What type do we put in the blanks? We did a quick brainstorm in class and came up with several ideas: person, List<person>, some new datatype, AncTree, or String. Which should it be?

If we use a String, we're back to the table row: to get from a person to a parent, we'd have to look up the parent's name all over again. (person isn't an option either, since it is a constructor, not a type.) We should therefore make these fields AncTrees.

As we worked out the rest of the Anna example, we realized we would need another option in the AncTree definition to capture people for whom we don’t know anything. Our final datatype looks like:

  data AncTree:

    | noInfo

    | person(

        name :: String,

        birthyear :: Number,

        eye :: String,

        mother :: AncTree,

        father :: AncTree

        )

  end

  

  # partly completed

  anna-row =

    person("Anna", 1997, "blue",

      person("Susan", 1971, "blue",

        person("Ellen", 1945, "brown", ...),

        ...),

      person("Charlie", 1972, "green", noInfo, noInfo))
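To see how the datatype replaces filtering, here is a small runnable sketch that builds a trimmed version of Anna's tree and follows links with field access. To keep it self-contained, it uses noInfo in place of Bill, Laura, and John, whose rows are elided in the table. That is a simplification for illustration, not a claim that their information is unknown.

```pyret
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree)
end

# Trimmed tree: noInfo stands in for people whose rows are elided in the table
ellen = person("Ellen", 1945, "brown", noInfo, noInfo)
susan = person("Susan", 1971, "blue", ellen, noInfo)
charlie = person("Charlie", 1972, "green", noInfo, noInfo)
anna = person("Anna", 1997, "blue", susan, charlie)

check:
  # each step from a person to a parent is a single field access
  anna.mother.name is "Susan"
  anna.mother.mother.name is "Ellen"
  anna.father.eye is "green"
  is-noInfo(anna.father.mother) is true
end
```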

We outlined what the new parents-of function would look like with this version:

  fun parents-of-tree(tr :: AncTree) -> List<String>:

    cases (AncTree) tr:

      | noInfo => empty

      | person(n, y, e, m, f) => [list: m.name, f.name]

        # the person case is more complicated if a parent is missing: m.name fails when m is noInfo

    end

  end
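One way to handle the missing-parent case is to turn each parent into a list that is empty for noInfo, then append the two lists. This is a minimal sketch, not necessarily where the class ended up. The helper name-if-known is a hypothetical name, and the example trees use noInfo in place of people whose rows are elided in the table, as a simplification.

```pyret
data AncTree:
  | noInfo
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: AncTree,
      father :: AncTree)
end

# Hypothetical helper: a one-element list with the name, or empty for noInfo
fun name-if-known(tr :: AncTree) -> List<String>:
  cases (AncTree) tr:
    | noInfo => empty
    | person(n, _, _, _, _) => [list: n]
  end
end

fun parents-of-tree(tr :: AncTree) -> List<String>:
  doc: "names of the known parents of the person at the root of tr"
  cases (AncTree) tr:
    | noInfo => empty
    | person(_, _, _, m, f) => name-if-known(m).append(name-if-known(f))
  end
end

# Trimmed example trees (noInfo stands in for elided rows such as Bill's)
ellen = person("Ellen", 1945, "brown", noInfo, noInfo)
susan = person("Susan", 1971, "blue", ellen, noInfo)
charlie = person("Charlie", 1972, "green", noInfo, noInfo)
anna = person("Anna", 1997, "blue", susan, charlie)
```

Unlike the table version, no step here searches for anything: each parent is already stored in the person value.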

We left off here, with a plan to look more at ancestor trees in the next class.

2 Study Questions