Performance and Tree intro

Program performance

We’ve talked about making programs more readable and concise. We have not yet talked much about how to make programs fast. This won’t be a focus of the course (and you generally won’t be graded on program performance), but it’s worth seeing some examples of how to reason informally about program performance.

Consider these two versions of a program to find the characters in a text (as on HW 6).

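import lists as L  # provides L.filter, L.distinct, L.member

# A string counts as a character name if it is entirely upper case.
fun is-character(str :: String) -> Boolean:
  string-to-upper(str) == str
end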

fun find-charactersO(the-text :: List<String>) -> List<String>:
  L.distinct(L.filter(is-character, the-text))
end

fun find-charactersR(the-text :: List<String>) -> List<String>:
  cases (List) the-text:
    | empty => empty
    | link(fst, rst) => 
      if is-character(fst):
        L.distinct(link(fst, find-charactersR(rst)))
      else:
        L.distinct(find-charactersR(rst))
      end
  end
end

Do you think one of them is faster? Why?

find-charactersO seems like it should be faster, since it calls distinct only once instead of once per element of the list.

How much faster is it? How long does each one take? In order to answer these questions, we’ll need to know the definitions of filter and distinct:

fun filter(f :: Function, lst :: List) -> List:
  cases(List) lst:
    | empty => empty
    | link(fst, rst) =>
      if f(fst):
        link(fst, filter(f, rst))
      else:
        filter(f, rst)
      end
  end
end

fun distinct(lst :: List) -> List:
  cases (List) lst:
    | empty => empty
    | link(fst, rst) =>
      if L.member(rst, fst):
        distinct(rst)
      else:
        link(fst, distinct(rst))
      end
  end
end

Let’s take a look at filter. It looks once at every element of the list, so it runs in an amount of time proportional to the size of the list. We call filter’s running time linear: for each element of the list, filter does a constant number of operations (assuming each call to f takes constant time), so its runtime grows linearly with the size of its input.
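One informal way to check the linear claim is to count how often the predicate runs. The following is a minimal sketch, not part of the course code: the counter calls and the wrapper counted-is-character are hypothetical names introduced here, and the sketch uses the library's L.filter rather than the definition shown above.

```pyret
import lists as L

# Hypothetical counter: records how many times the predicate is called.
var calls = 0

# Same test as is-character, but it bumps the counter on every call.
fun counted-is-character(str :: String) -> Boolean block:
  calls := calls + 1
  string-to-upper(str) == str
end

check "filter calls its predicate once per element":
  result = L.filter(counted-is-character, [list: "HAMLET", "to", "be", "OPHELIA"])
  result is [list: "HAMLET", "OPHELIA"]
  # four elements in the input, so four predicate calls
  calls is 4
end
```

Doubling the length of the input list should double the final value of calls, which is what linear growth means here.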

How about distinct? At first glance, distinct looks like filter: it does some work for every element of the list. But distinct also calls member. How long does each call to member take?

member might have to look at every remaining element of the list. For instance, it scans the whole rest of the list for a character that appears only once, since it never finds a match. So let’s say member’s running time is linear. Since distinct calls member once for every element of the list, its running time is quadratic: it’s proportional to the square of the list’s length.
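To make the quadratic claim concrete, we can add up how many elements member may have to inspect when every element of the list is different. This is a rough sketch; member-steps is a hypothetical helper introduced only for this estimate, and it ignores constant factors.

```pyret
# Hypothetical helper: total number of elements member may inspect when
# distinct runs on a list of n elements that are all different.
fun member-steps(n :: Number) -> Number:
  if n <= 0:
    0
  else:
    # on a list of n elements, member scans the n - 1 elements in the rest
    (n - 1) + member-steps(n - 1)
  end
where:
  member-steps(1) is 0
  member-steps(4) is 6
  member-steps(10) is 45
  # matches the closed form n * (n - 1) / 2, which grows like n squared
  member-steps(100) is (100 * 99) / 2
end
```

Making the list ten times longer (from 10 to 100 elements) raises the estimate from 45 to 4950, roughly a hundredfold increase.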

So: what’s the running time of find-charactersO? We’re running a linear operation (filter), then a quadratic operation (distinct). So our running time is quadratic (as the list gets large, the quadratic term dominates the linear term).

How about find-charactersR? Here, we’re running distinct, a quadratic operation, once for every element of the list. So in the worst case our running time is cubic!
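The gap between quadratic and cubic is easier to feel with numbers. The sketch below gives rough upper-bound step estimates for the two versions; steps-o and steps-r are hypothetical names, and the formulas are simplifications that ignore constant factors.

```pyret
# Rough estimate for find-charactersO on a list of n elements:
# one linear filter pass plus one quadratic distinct pass.
fun steps-o(n :: Number) -> Number:
  n + (n * n)
end

# Rough estimate for find-charactersR: one quadratic distinct call at
# each level of the recursion, on a list of at most k elements.
fun steps-r(n :: Number) -> Number:
  if n <= 0:
    0
  else:
    (n * n) + steps-r(n - 1)
  end
end

check:
  steps-o(10) is 110
  steps-r(10) is 385
  steps-o(100) is 10100
  steps-r(100) is 338350
end
```

At 10 elements the two estimates differ by a factor of about 3.5; at 100 elements the factor is over 30, and it keeps growing with the list.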

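FYI, here’s a more efficient recursive version. It calls member (linear) once per element instead of calling distinct (quadratic) once per element, so it is quadratic rather than cubic: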

fun find-charactersR2(the-text :: List<String>) -> List<String>:
  cases (List) the-text:
    | empty => empty
    | link(fst, rst) => 
      if is-character(fst) and not(L.member(rst, fst)):
        link(fst, find-charactersR2(rst))
      else:
        find-charactersR2(rst)
      end
  end
end
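A quick way to gain confidence in this version is to test it against the filter-then-distinct version on a small input. The sketch below repeats the definitions so that it runs on its own; the sample list is made up for illustration.

```pyret
import lists as L

fun is-character(str :: String) -> Boolean:
  string-to-upper(str) == str
end

fun find-charactersR2(the-text :: List<String>) -> List<String>:
  cases (List) the-text:
    | empty => empty
    | link(fst, rst) =>
      if is-character(fst) and not(L.member(rst, fst)):
        link(fst, find-charactersR2(rst))
      else:
        find-charactersR2(rst)
      end
  end
end

check "agrees with the filter-then-distinct version":
  # made-up sample: "HAMLET" appears twice
  sample = [list: "HAMLET", "to", "OPHELIA", "be", "HAMLET"]
  find-charactersR2(sample) is [list: "OPHELIA", "HAMLET"]
  find-charactersR2(sample) is L.distinct(L.filter(is-character, sample))
  find-charactersR2(empty) is empty
end
```

Like the distinct definition shown earlier, this version keeps the last occurrence of each repeated name, which is why "OPHELIA" comes before "HAMLET" in the result.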

Ancestry data

Imagine we’re doing a genealogy project: we’re looking at eye color heritability. Our data are from the 18th-century House of Hanover; specifically, the family tree of King George III of the United Kingdom. Here’s a chunk of the tree:

[Figure: georgetree.png, a portion of George III’s family tree]

How would we represent these data? We could use a table:

name         eye-color   mother        father
"George"     "green"     "Auguste"     "Frederick"
"Auguste"    "green"     "Magdalena"   "Friedrich II"
"Frederick"  "brown"     ""            ""

Let’s say we want to write a function to get someone’s grandparents. How would we do it?

We’d have to first get parents, then get grandparents. Each would involve filtering the table based on name, and dealing with empty data (e.g., if a parent field is empty then we can’t get the grandparent). We could do it, but it would be unpleasant.
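To see why the table approach gets awkward, here is one possible sketch of it. The helper names parents and grandparents are hypothetical, the table holds only the three rows shown above, and the code assumes an empty string means the parent is not recorded.

```pyret
import lists as L

ancestors = table: name, eye-color, mother, father
  row: "George", "green", "Auguste", "Frederick"
  row: "Auguste", "green", "Magdalena", "Friedrich II"
  row: "Frederick", "brown", "", ""
end

# Hypothetical helper: the recorded parents of the named person.
fun parents(t :: Table, person-name :: String) -> List<String>:
  matches = sieve t using name: name == person-name end
  if matches.length() == 0:
    # the person has no row of their own in the table
    empty
  else:
    r = matches.row-n(0)
    # drop empty strings, which stand for missing parents
    [list: r["mother"], r["father"]].filter(lam(p): p <> "" end)
  end
end

# Hypothetical helper: look up each parent, then collect their parents.
fun grandparents(t :: Table, person-name :: String) -> List<String>:
  for L.fold(acc from empty, p from parents(t, person-name)):
    acc.append(parents(t, p))
  end
end

check:
  parents(ancestors, "George") is [list: "Auguste", "Frederick"]
  parents(ancestors, "Frederick") is empty
  grandparents(ancestors, "George") is [list: "Magdalena", "Friedrich II"]
end
```

Every step up the tree costs another search through the whole table, and every lookup has to handle both missing rows and empty fields.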

Could we use a datatype for this?

data AncTree:
  | person(name :: String, eye-color :: String, mother :: ???, father :: ???)
end

What should the types of mother and father be? We could use String, but that leaves us with the same problem we had before: we’d have to search for a person with the right name. How about this?

data AncTree:
  | person(name :: String, eye-color :: String, mother :: AncTree, father :: AncTree)
end
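One thing to notice about this definition is that every person needs a mother and a father who are themselves AncTrees, so we need some way to stop. The sketch below is one possible approach and an assumption on our part, not something these notes define: it adds a hypothetical no-info variant for an unrecorded parent.

```pyret
data AncTree:
  | no-info  # hypothetical variant: this parent is not recorded
  | person(name :: String, eye-color :: String, mother :: AncTree, father :: AncTree)
end

# Data from the table above. Auguste's parents are named there, but their
# eye colors are not given, so they are left as no-info here.
frederick = person("Frederick", "brown", no-info, no-info)
auguste = person("Auguste", "green", no-info, no-info)
george = person("George", "green", auguste, frederick)

check:
  # following a field reaches the parent directly, with no search by name
  george.mother.name is "Auguste"
  george.father.eye-color is "brown"
  is-no-info(george.father.mother) is true
end
```

With this shape, getting from a person to a parent is a field access rather than a table search.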

We’ll talk more about trees on Wednesday and Friday.