Class summary: Datatypes, Efficiency, and Ancestor Trees

1 Organizing Game States

We reviewed the use of multi-case datatypes and how they might relate to defining the state of a videogame. Imagine that you were building a game that featured a robot and an alien. Which of the following two datatypes makes more sense?

data GameState1:
  | robot(pos :: Posn)
  | alien(pos :: Posn)
end

data GameState2:
  | state(robot-pos :: Posn, alien-pos :: Posn)
end

Briefly, the first has you choose between the robot and the alien (a value is one or the other), whereas the second has both a robot and an alien in the game. See the lecture capture for more discussion of this.
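To make the difference concrete, here is a small sketch that builds a value of each type. Posn is not defined in these notes, so the two-field definition below is an assumption made only for illustration, as are the names only-robot and both-state.

```pyret
# Assumed definition of Posn (not given in these notes)
data Posn:
  | posn(x :: Number, y :: Number)
end

data GameState1:
  | robot(pos :: Posn)
  | alien(pos :: Posn)
end

data GameState2:
  | state(robot-pos :: Posn, alien-pos :: Posn)
end

# A GameState1 value is EITHER a robot OR an alien
only-robot = robot(posn(0, 0))

# A GameState2 value always holds BOTH positions
both-state = state(posn(0, 0), posn(5, 3))

check:
  is-robot(only-robot) is true
  is-alien(only-robot) is false
  both-state.robot-pos.x is 0
  both-state.alien-pos.x is 5
end
```

A single GameState1 value can never describe both characters at once, which is why the second datatype fits a game that features both.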

2 Efficiency of Merging Tables

We reviewed the terms constant, linear, and quadratic as they apply to efficiency, including a diagram showing what each of these looks like as a graph of the size of an input versus the number of underlying computations that get performed.
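As a rough illustration of the three terms, consider three functions on a list of n numbers. These functions are made up for these notes and were not part of the lecture.

```pyret
# Constant: looks only at the first element, no matter how long the list is
fun first-or-zero(l :: List<Number>) -> Number:
  cases (List) l:
    | empty => 0
    | link(f, _) => f
  end
where:
  first-or-zero([list: 4, 5, 6]) is 4
  first-or-zero(empty) is 0
end

# Linear: visits each element exactly once
fun total(l :: List<Number>) -> Number:
  cases (List) l:
    | empty => 0
    | link(f, r) => f + total(r)
  end
where:
  total([list: 4, 5, 6]) is 15
end

# Quadratic (worst case): for each element, scans the rest of the list
fun has-duplicate(l :: List<Number>) -> Boolean:
  cases (List) l:
    | empty => false
    | link(f, r) => r.member(f) or has-duplicate(r)
  end
where:
  has-duplicate([list: 1, 2, 3]) is false
  has-duplicate([list: 1, 2, 1]) is true
end
```

Doubling the length of the list leaves the work in first-or-zero unchanged, roughly doubles the work in total, and roughly quadruples the worst-case work in has-duplicate.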

We then returned to the table merging example from the first project, to contrast the efficiency of the approach you took in the project with an approach based on treating the columns as lists. The punchline here is that how you represent your data has a big impact on how efficiently a computation can be performed.

See the lecture capture for more detail.

3 Data Design Problem – Ancestry Data

Imagine that we wanted to represent genealogy information for a medical research study. Specifically, we want to record each person’s birth year, eye color, and biological parents. Here’s a picture showing the relationships between people and their biological parents.

To capture this in code, we might create a table such as the following:

family = table: name, birthyear, eyecolor, mother, father
  row: "Anna", 1997, "blue", "Susan", "Charlie"
  row: "Susan", 1971, "blue", "Ellen", "Bill"
  row: "Charlie", 1972, "green", "NoInfo", "NoInfo"
  row: "Ellen", 1945, "brown", "Laura", "John"
  ...
end

Assume we wanted to be able to answer questions such as the following:

How frequent is each eye color?
How many generations do we have information for?
What’s the average age of mothers (or fathers) at time of birth?
Is one specific person an ancestor of another specific person?
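As a sketch of what answering the first question could look like with the table representation, the function below counts the rows with a given eye color. It includes only the four rows shown above, and it uses Pyret's built-in .filter and .length table methods so that it does not depend on a course library. The name count-eye-color is made up for this example.

```pyret
family = table: name, birthyear, eyecolor, mother, father
  row: "Anna", 1997, "blue", "Susan", "Charlie"
  row: "Susan", 1971, "blue", "Ellen", "Bill"
  row: "Charlie", 1972, "green", "NoInfo", "NoInfo"
  row: "Ellen", 1945, "brown", "Laura", "John"
end

# Count how many rows have the given eye color
fun count-eye-color(color :: String, from-table :: Table) -> Number:
  from-table.filter(lam(r): r["eyecolor"] == color end).length()
where:
  count-eye-color("blue", family) is 2
  count-eye-color("green", family) is 1
  count-eye-color("hazel", family) is 0
end
```

Questions about a single column, like this one, fit the table well. The questions about generations and ancestors are the ones that require following parent links from row to row.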

How should we capture this picture in data to be able to write programs to answer our questions?

3.1 Ancestry Trees as Tables

These notes have more detail than we did in class. In class we talked through the problem of finding ancestors loosely. Here, the underlying code is presented in more detail.

Let’s say I wanted to write a function to compute someone’s grandparents (at least, those grandparents known in the tree):

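fun grandparents(of-name :: String) -> List<String>:
  ...
where:
  # In the table above, Anna's mother is Susan, and Susan's parents are Ellen and Bill;
  # nothing is known about the parents of Anna's father, Charlie
  grandparents("Anna") is [list: "Ellen", "Bill"]
  grandparents("Laura") is [list:]
  grandparents("Kathi") is [list:]
end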

What would be involved in doing that computation? What subtasks would we identify/what functions would we write?

Need to go from a name to the mother
Need to go from a name to the father
Repeat previous two steps on mother and father

Let’s write one of these functions to see what it would look like:

fun get-mother(of-name :: String, from-table :: Table):
  # keep only the rows for the named person, then take the first one
  person-row =
    filter-by(from-table,
      lam(r :: Row): r["name"] == of-name end).row-n(0)
  person-row["mother"]
where:
  get-mother("Anna", family) is "Susan"
end

What happens if the person we asked for isn’t in the table (meaning that we don’t know their family history)? Right now, we get a Pyret error. The error arises because we shouldn’t try to use row-n(0) unless we know that we found a row for the named person. We could modify the code, but that would be premature.

As always, start with examples: what should the function produce if the named person doesn’t have a row in the table?

If we raise an error, we can’t use this function to get whichever grandparents are known (the raise would terminate the computation).
If we use something like "unknown", we can’t tell the difference between a real name and this value (both are strings).
In practice, we want to return an answer of a different type, to avoid both problems. Here, we could return false (the boolean) to indicate that the person wasn’t found.

fun get-mother2(of-name :: String, from-table :: Table):
  person-table =
    filter-by(from-table, lam(r :: Row): r["name"] == of-name end)
  if person-table.length() > 0:
    person-table.row-n(0)["mother"]
  else:
    false
  end
where:
  get-mother2("Anna", family) is "Susan"
  get-mother2("Fred", family) is false
end

If you imagine chaining together calls to get-mother in order to find ancestors (and doing the same on the father’s side), you can quickly see that we end up doing a lot of table filtering, which seems inefficient: every step from a person to a parent scans the table again.
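To see the repeated filtering concretely, here is one possible sketch of a grandparents computation over the table. It is an illustration rather than the code from class: the helper names (drop-no-info, known-parents, parents-of-all, known-grandparents) are made up, it uses only the four rows shown earlier, it treats "NoInfo" as meaning the parent is unknown, and it uses Pyret's built-in .filter table method in place of filter-by. To keep the chaining short, the helper returns a list of known parents (empty when the person has no row) instead of false.

```pyret
family = table: name, birthyear, eyecolor, mother, father
  row: "Anna", 1997, "blue", "Susan", "Charlie"
  row: "Susan", 1971, "blue", "Ellen", "Bill"
  row: "Charlie", 1972, "green", "NoInfo", "NoInfo"
  row: "Ellen", 1945, "brown", "Laura", "John"
end

# Remove the "NoInfo" placeholder from a list of names
fun drop-no-info(names :: List<String>) -> List<String>:
  names.filter(lam(n): n <> "NoInfo" end)
end

# One full pass over the table per call
fun known-parents(of-name :: String, from-table :: Table) -> List<String>:
  matches = from-table.filter(lam(r): r["name"] == of-name end)
  if matches.length() > 0:
    drop-no-info([list: matches.row-n(0)["mother"], matches.row-n(0)["father"]])
  else:
    empty
  end
end

# Collect the known parents of every name in the list
fun parents-of-all(names :: List<String>, from-table :: Table) -> List<String>:
  cases (List) names:
    | empty => empty
    | link(f, r) =>
      known-parents(f, from-table).append(parents-of-all(r, from-table))
  end
end

fun known-grandparents(of-name :: String, from-table :: Table) -> List<String>:
  parents-of-all(known-parents(of-name, from-table), from-table)
where:
  known-grandparents("Anna", family) is [list: "Ellen", "Bill"]
  known-grandparents("Laura", family) is [list:]
end
```

Finding Anna's grandparents filters the whole table three times: once for Anna, once for Susan, and once for Charlie. Each extra generation adds another round of full-table filters.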

Look back at the ancestry tree picture. We don’t do any complicated filtering there – we just follow the line in the picture immediately from a person to their mother or father. Can we get that idea in code instead? Yes, through datatypes.

3.2 Creating a Datatype for Ancestor Trees

For this approach, we want to create a datatype for Ancestor Trees that has a variant (constructor) for setting up a person. Look back at our picture – what information makes up a person? Their name, their mother, and their father (along with birthyear and eyecolor, which aren’t shown in the picture). This suggests the following datatype, which basically turns a row into a person value:

data AncTree:
  | person(
      name :: String,
      birthyear :: Number,
      eye :: String,
      mother :: ________,
      father :: ________
    )
end

What type do we put in the blanks? We did a quick brainstorm after class and came up with several ideas: person, List<person>, some new datatype, AncTree, String. Which should it be?

Tune in next class ...
