Programming and Programming Languages
1 Introduction
1.1 Our Philosophy
Many people would regard this as being two books in one. One book is an introduction to programming, teaching you basic concepts of organizing data and the programs that operate over them, ending in the investigation of universally useful algorithms. The other book is an introduction to programming languages: a study, from one level up, of the media by which we structure these data and programs.
Obviously, these are not unrelated topics. We learn programming through one or more languages, and the programs we write then become natural subjects of study to understand languages at large. Nevertheless, these are considered sufficiently different topics that they are approached separately. This is how we approached them, too. (The one noble exception to this separation is the best computer science book ever written, The Structure and Interpretation of Computer Programs.)
We have come to realize that this separation is neither meaningful nor helpful. The topics are deeply intertwined and, by accepting that interleaving, the result is likely to be a much better book. This is my experiment with that format.
1.2 Predictability as a Theme
There are many ways to organize the study of programming and programming languages. My central theme is the concept of predictability.
Programs are typically static: they live on the moral equivalent of paper, unmoving and unchanging. But when we run a program, it produces a complex, dynamic behavior that yields utility, pleasure, and (sometimes) frustration. Everyone who writes programs ultimately cares, whether they realize it or not, about predicting that complex dynamic behavior from the static text of the program.
Predictability has a bad rap. Under the guise of “program reasoning”, it came to be viewed as both noble and mind-numbingly boring. It is certainly noble, but we will try to present it in a way that will hopefully seem utterly natural, indeed entirely obvious (because we believe it is). Hopefully you’ll come away from this study reasonably convinced about the central place of predictability in your own work, and of its value as a metric for programming language design.
1.3 The Structure of This Book
Unlike some other textbooks, this one does not follow a top-down narrative. Rather it has the flow of a conversation, with backtracking. We will often build up programs incrementally, just as a pair of programmers would. We will include mistakes, not because we don’t know better, but because this is the best way for you to learn. Including mistakes makes it impossible for you to read passively: you must instead engage with the material, because you can never be sure of the veracity of what you’re reading.
At the end, you’ll always get to the right answer. However, this non-linear path is more frustrating in the short term (you will often be tempted to say, “Just tell me the answer, already!”), and it makes the book a poor reference guide (you can’t open up to a random page and be sure what it says is correct). But that feeling of frustration is the sensation of learning. We don’t know of a way around it.
At various points you will encounter this:
This is an exercise. Do try it.
This is a traditional textbook exercise. It’s something you need to do on your own. If you’re using this book as part of a course, this may very well have been assigned as homework. In contrast, you will also find exercise-like questions that look like this:
There’s an activity here! Do you see it?
When you get to one of these, stop. Read, think, and formulate an answer before you proceed. You must do this because this is actually an exercise, but the answer is already in the book, usually just past the question. If you simply read on, you will see the answer without having worked it out for yourself, and lose the benefit of the exercise.
1.4 The Language of This Book
This book uses a new programming language called Pyret. Pyret is the outgrowth of our deep experience programming in and designing functional, object-oriented, and scripting languages, as well as their type systems, program analyses, and development environments.
The language’s syntax is inspired by Python. (Unlike Python, Pyret will enforce indentation rather than interpret it: that is, indentation will simply become another syntax well-formedness criterion. But that hasn’t been implemented yet.) It fills a niche missing in computer science education: a simple language that sheds the strange corner cases (of which there are many) of Python while adding important features that Python lacks for learning programming (such as algebraic datatypes, optional annotations on variables, design decisions that better enable the construction of development environments, and strong support for testing). Beginning programmers can rest in the knowledge they are being cared for, while programmers with past acquaintance of the language menagerie, from serpents to dromedaries, should find Pyret familiar and comfortable.
2 Programming in Pyret
Programs exist to compute answers, which we will call values. These values will represent all sorts of interesting phenomena: they may be a frame from the next hit movie, a list of Web pages matching your query, a confirmation code for an upcoming trip, or a prediction of tomorrow’s temperature. Ultimately, they’re all values and our goal here will be to learn how to write programs that compute them.
Here, we’ll introduce values and ways of constructing them, and then move on to the problem of starting from a blank file and building up to a program that creates the answers we want. While there’s no silver-bullet answer to the problem, the Design Recipe is a methodology for writing programs, described in the book How to Design Programs (HtDP), that tackles this problem for a number of programming patterns. (You can find that book, which uses a variant of Scheme rather than Pyret, for free at http://htdp.org/.) After addressing some basics of values and expressions, we’ll explore using the Design Recipe to program in Pyret.
2.1 Values
The simplest programs are constant values, which evaluate to themselves. Here are two such programs, a number and a string:

3

"Hello, Tara!"
Okay, so we’ve seen numbers and strings. There are many more kinds of values in Pyret, including images. We’ll get to them soon.
2.2 Expressions
Obviously, we’re not very interested in writing programs whose answers we already know. Therefore, let’s start the first real step of our journey: writing expressions that take one or more steps to produce an answer.
What is this program’s value? (Yes, you must put spaces around +.)

1 + 2
"will." + "i." + "am"
2.3 Recipe for Primitive Values
The first step in the design recipe is to analyze the values the program will work with, both as its input and as its answers. The recipe works through the problem starting from the definition of the input and finishing with the implementation. The easiest recipe to start with is the one for primitive values, like the numbers and strings we just saw.
For our first example, we’ll take something simple enough that we could understand it in just about any context: calculating hourly wage with overtime.
To keep things simple, we’ll assume a constant hourly rate of $10/hour, so the only parameter is the number of hours worked. Then we proceed through several steps.
Understand the Data Definition
The incoming hours worked are Numbers, and there are two interesting kinds of numbers: those greater than 40 and those less than or equal to 40.
Function Header and Purpose
Next, to get started with the function, we write down the name of the function, the types of the arguments, and the type of value that will be returned. We also write a short English sentence, using the doc: keyword, describing the function:
fun hours-to-wages(hours :: Number) -> Number:
  doc: "Compute total wage from hours, accounting for overtime"
end
This says that hours-to-wages will consume a Number and return another Number. The description informs the reader that the interpretations of the numbers for argument and return are hours and wages, respectively.
Write Examples of Calling the Function (Test Cases)
Next, we write a few test cases. This helps us think through what the implementation will look like, lets us know of any problematic cases that might come up, and gives us something to run to confirm that our implementation does what we expect when we’re done.
These examples of calls go in a where: block at the end of the function, and the simplest ones use is to check that the output is an expected value:
fun hours-to-wages(hours :: Number) -> Number:
  doc: "Compute total wage from hours, accounting for overtime, at $10/hr base"
where:
  hours-to-wages(40) is 400
  hours-to-wages(40.5) is 407.5
  hours-to-wages(41) is 415
  hours-to-wages(0) is 0
  hours-to-wages(45) is 475
  hours-to-wages(20) is 200
end
Examples should cover at least all the different cases mentioned in the data definition, and should also test common extremes and base cases. In this case, for example, it’s useful to note that the 40th hour doesn’t count towards overtime, but the 41st does.
Implement the Body
Now all that’s left is to fill a Pyret expression into the body of the function that computes the correct answer based on the inputs.
In the case of this running example, the data definition has a condition in it: whether or not the hours have exceeded 40. Conditional data usually implies a conditional expression in code. That means an if expression to test which case of the data definition the function is handling. Then, the bodies of the branches can be filled in with the appropriate answers.
fun hours-to-wages(hours :: Number) -> Number:
  doc: "Compute total wage from hours, accounting for overtime, at $10/hr base"
  if hours < 41:
    hours * 10
  else if hours >= 41:
    (40 * 10) + ((hours - 40) * (10 * 1.5))
  end
where:
  hours-to-wages(40) is 400
  hours-to-wages(40.5) is 407.5
  hours-to-wages(41) is 415
  hours-to-wages(0) is 0
  hours-to-wages(45) is 475
  hours-to-wages(20) is 200
end
Run the Tests, and Fix Mistakes
Now we can click Run and get feedback on whether there are any mistakes in the definition we wrote. If any tests fail, we have to figure out whether we misunderstood the test or mis-implemented the program. It can require many rounds of back-and-forth before settling on a good set of tests and an implementation that agrees with them, doing precisely what we intend.
Do Now! Run this example and read the output. Did all the tests pass?
In this case, there’s a mismatch between the implementation and the tests: the second test fails. When this happens, we have to ask: did we get the test wrong or the implementation? In this case, it happens to be the implementation; the first condition shouldn’t include the case where hours is between 40 and 41, which should be handled by the second case.
Since we got something wrong around a boundary condition, it’s probably a good idea to add one more test to make sure we didn’t screw up in the other direction. This is the correct implementation, with a new test to double-check things:
fun hours-to-wages(hours :: Number) -> Number:
  doc: "Compute total wage from hours, accounting for overtime, at $10/hr base"
  if hours <= 40:  # this line changed
    hours * 10
  else if hours > 40:  # this line changed
    (40 * 10) + ((hours - 40) * (10 * 1.5))
  end
where:
  hours-to-wages(40) is 400
  hours-to-wages(39.5) is 395  # this test was added
  hours-to-wages(40.5) is 407.5  # this test is correct now
  hours-to-wages(41) is 415
  hours-to-wages(0) is 0
  hours-to-wages(45) is 475
  hours-to-wages(20) is 200
end
2.3.1 Abstracting Common Parts
The hours-to-wages function always assumes an hourly rate of $10/hour. We can change it to accommodate a different hourly rate, say $20/hour, by changing the constant 10 where it appears representing the hourly rate:
fun hours-to-wages-20(hours :: Number) -> Number:
  doc: "Compute total wage from hours, accounting for overtime, at $20/hr base"
  if hours <= 40:
    hours * 20
  else if hours > 40:
    (40 * 20) + ((hours - 40) * (20 * 1.5))
  end
end
We could make another copy of the function for $30/hour workers, and so on. However, it’s also possible, and quite straightforward, to change the function to work for any hourly wage. We note the shared parts across the implementation and lift them out, adding a new parameter to the function.
fun hours-to-wages-at-rate(rate :: Number, hours :: Number) -> Number:
  doc: "Compute total wage from hours, accounting for overtime, at the given rate"
  if hours <= 40:
    hours * rate
  else if hours > 40:
    (40 * rate) + ((hours - 40) * (rate * 1.5))
  end
end
Note that we’ll take the convention of adding new parameters at the beginning of the argument list. We simply add the new parameter (with an appropriate annotation), and replace all instances of the constant with it.
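To see the abstracted function at work, here are a few sample calls in a check block (these examples are ours, not from the original text, but the expected answers follow directly from the definition above):

check:
  hours-to-wages-at-rate(10, 45) is 475
  hours-to-wages-at-rate(20, 45) is 950
  hours-to-wages-at-rate(10, 40) is 400
end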
Exercise: Write a function called has-overtime that takes a number of hours and returns true if the number of hours is greater than 40 and false otherwise.

Exercise: Working negative hours is nonsense. Write a version of hours-to-wages that uses the raise function to throw an error if fewer than 0 hours are reported. Use the raises form to test for it (read about raises in the Pyret documentation).

Exercise: Write a function called hours-to-wages-ot that takes a number of hours, an hourly rate, and an overtime threshold, and produces the total pay. Any hours worked beyond the overtime threshold should be credited at 1.5 times the normal rate of pay, as before.
2.4 Evaluating by Reducing Expressions
There’s an important note about Pyret at work in this example: there are no return statements in Pyret. The function body (the if expression here) evaluates when the function is called, and the result of the body’s evaluation is the result of the function call.
We can show this evaluation visually, with an example of a call:
hours-to-wages(45)

=> if 45 <= 40:
     45 * 10
   else if 45 > 40:
     (40 * 10) + ((45 - 40) * 15)
   end

=> if false:
     45 * 10
   else if 45 > 40:
     (40 * 10) + ((45 - 40) * 15)
   end

=> if false:
     45 * 10
   else if true:
     (40 * 10) + ((45 - 40) * 15)
   end

=> (40 * 10) + ((45 - 40) * 15)

=> 400 + (5 * 15)

=> 475
This style of reduction is the best way to think about the evaluation of Pyret function calls and expressions. The whole body of the function takes steps that simplify it, much like simplifying expressions in algebra. We’ll use this reduction-style example later in the notes, and you can use it yourself if you want to try and work through the evaluation of a Pyret function by hand (or in your head).
Exercise: Write out the evaluation steps for

hours-to-wages-at-rate(15, 45)

for the definition of hours-to-wages-at-rate from Abstracting Common Parts.
When you look at a test case in Pyret, like
hours-to-wages(45) is 475
the same idea of algebraic simplification applies. This will reduce to
475 is 475
which is clearly true, and causes it to be registered as a passing test. When a check or where block runs, it evaluates each is test and compares it to the expected answer in this same way.
2.5 Recipe for Simple Data
This definition of primitive versus compound data has a little bit of a fuzzy border: we could think of strings as lists of characters, for instance. Usually the kind of program we’re writing makes it clear what data we are treating as “primitive” values and what data we are breaking down further.
Lots of interesting programs work over far more than primitive data like Numbers and Strings. They combine pieces of primitive data into richer datatypes, and the recipe has a natural extension to dealing with those cases.
Our example will be calculating the distance from the origin of a 2-dimensional point, and we’ll adapt the initial recipe as needed for it.
Understand the Data Definition and Write Examples
A 2D point has an x and y coordinate, which are both numbers. We’ll use a Pyret data definition to represent points that has both of these fields:
data Point:
  | point(x :: Number, y :: Number)
end
When working with more complicated data, it’s useful to write down examples of the data itself along with the definition. These examples can be useful for thinking through what kinds of values can be represented by the data definition, and also serve as useful inputs to tests later. Some examples of points are:
point(0, 0)
point(3, 4)
point(-3, -4)
Write Down the Function Header, Contract, and Purpose
This step is basically unchanged from primitive data, just adapted for the new definition. We’re calculating the distance from the origin to a given point, so the function will consume a point as input and produce a Number as output. There aren’t any extra arguments:
fun distance-to-origin(p :: Point) -> Number:
  doc: "Produces the distance from the origin to the given point."
end
Write Example Calls to the Function
Next, we write a few test cases. This helps us think through what the implementation will look like, lets us know of any problematic cases that might come up, and gives us something to run to confirm that our implementation does what we expect when we’re done. Our examples of the data definition from earlier can come in handy here as test inputs.
fun distance-to-origin(p :: Point) -> Number:
  doc: "Produces the distance from the origin to the given point."
where:
  distance-to-origin(point(0, 0)) is 0
  distance-to-origin(point(3, 4)) is 5
  distance-to-origin(point(-3, -4)) is 5
end
Write Down a Template for the Data Definition
This is a new step beyond what’s needed for primitive data. When values get more structure, it’s useful to have a structure to program them with. Functions that work over Points will need access to the x and y fields. For breaking apart instances of data defined by data, Pyret provides an expression called cases. Here’s what a cases expression looks like over one of our example Points:
cases (Point) point(0, 1):
  | point(x, y) => x + y
end
=> 0 + 1
You may be wondering why this is called cases when there’s only one case here. A more general version will be shown soon. The cases expression matches the constructor of the given value (in this case, point(0, 1)) with the case (a case is everything after a |) that has the same constructor name. Then, the fields of the value are substituted for the names in the case pattern (in this case, x and y), and the whole cases expression reduces to the body of that case.
The annotation in parentheses, (Point), checks that the value you provide is correct, and dictates which cases make sense to use: only those from the data definition. Whenever we work with Points (or other compound data), a cases expression is a good way to get at the data inside it in a principled way. Most functions over Points will have this structure:
#|
fun point-function(p :: Point) -> ???:
  cases (Point) p:
    | point(x, y) =>
      ... x ...
      ... y ...
  end
end
|#
That is, we don’t know what it will produce, but the function will consume a point, use cases to inspect the point value, and then do something with the x and y fields. The precise operation isn’t clear until we look at the particular function we’re trying to implement, but one or both of those values will almost certainly be involved.
So, the next step is to copy the template into the definition we’re building:
fun distance-to-origin(p :: Point) -> Number:
  doc: "Produces the distance from the origin to the given point."
  cases (Point) p:
    | point(x, y) =>
      ... x ...
      ... y ...
  end
where:
  distance-to-origin(point(0, 0)) is 0
  distance-to-origin(point(3, 4)) is 5
  distance-to-origin(point(-3, -4)) is 5
end
Now to fill in the function body, we just need to figure out what the ... parts should be.
Fill in the Function Body
With a solid understanding of the function, we can now fill in the body with an implementation that gives the answers our tests expect. Here, we just use the standard definition of distance, computed by Pyret’s num-sqrt function and the * and + operators.
fun distance-to-origin(p :: Point) -> Number:
  doc: "Produces the distance from the origin to the given point."
  cases (Point) p:
    | point(x, y) =>
      num-sqrt((x * x) + (y * x))
  end
where:
  distance-to-origin(point(0, 0)) is 0
  distance-to-origin(point(3, 4)) is 5
  distance-to-origin(point(-3, -4)) is 5
end
Run the Tests and Fix Mistakes
As before, we run the tests to figure out any mistakes we may have made, both in our understanding of the examples and in the implementation itself.
Do Now! Run the example above and fix any mistakes.
2.5.1 Functions Over Mixed, Related Data
Data definitions can define more than one kind of constructor: Point just has one (point). We could extend the definition to handle polar points:
data Point:
  | point(x :: Number, y :: Number)
  | polar(theta :: Number, magnitude :: Number)
end
Now, the use of cases is a little more obvious, in distinguishing between these multiple constructors, which we often refer to as variants:
cases (Point) polar(0, 5):
  | point(x, y) => x
  | polar(theta, magnitude) => magnitude
end
=> 5
cases (Point) point(1, 2):
  | point(x, y) => x
  | polar(theta, magnitude) => magnitude
end
=> 1
2.5.2 More Refined Comparisons
Sometimes, a direct comparison via is isn’t enough for testing. We saw raises in the last section for testing errors. However, when doing some computations, especially involving math with approximations, we want to ask a different question. For example, consider these tests for distance-to-origin:
check:
  distance-to-origin(point(1, 1)) is ???
end
What can we check here? Typing this into the REPL, we can find that the answer prints as 1.4142135623730951. That’s an approximation of the real answer, which Pyret cannot represent exactly. But it’s hard to know that this precise answer, to this decimal place, and no more, is the one we should expect up front, and thinking through the answers is supposed to be the first thing we do!
Since we know we’re getting an approximation, we can really only check that the answer is roughly correct, not exactly correct. If we can check that the answer to distance-to-origin(point(1, 1)) is around, say, 1.41, and can do the same for some similar cases, that’s probably good enough for many applications, and for our purposes here. If we were calculating orbital dynamics, we might demand higher precision, but note that we’d still need to pick a cutoff! Testing for inexact results is a necessary task.
Let’s first define what we mean by “around” with one of the most precise ways we can, a function:
fun around(actual :: Number, expected :: Number) -> Boolean:
  doc: "Return whether actual is within 0.01 of expected"
  num-abs(actual - expected) <= 0.01  # <= rather than <, so the 0.01 boundary cases below pass
where:
  around(5, 5.01) is true
  around(5.01, 5) is true
  around(5.02, 5) is false
  around(num-sqrt(2), 1.41) is true
end
The is form now helps us out. There is special syntax for supplying a user-defined function to use to compare the two values, instead of just checking if they are equal:
check:
  5 is%(around) 5.01
  num-sqrt(2) is%(around) 1.41
  distance-to-origin(point(1, 1)) is%(around) 1.41
end
Adding %(something) after is changes the behavior of is. Normally, it would compare the left and right values for equality. If something is provided with %, however, it instead passes the left and right values to the provided function (in this example, around). If the provided function produces true, the test passes; if it produces false, the test fails. This gives us the control we need to test functions with predictable approximate results.
Exercise: Extend the definition of distance-to-origin to include polar points.

Exercise: Use the design recipe to write x-component and y-component, which return the x and y Cartesian parts of the point (which you would need, for example, if you were plotting them on a graph). This might save you a Google search: polar conversions. Read about num-sin and the other functions you’ll need in the Pyret number documentation.

Exercise: Write a data definition called Pay for pay types that includes both hourly employees, whose pay type includes an hourly rate, and salaried employees, whose pay type includes a total salary for the year. Use the design recipe to write a function called expected-weekly-wages that takes a Pay and returns the expected weekly salary: the expected weekly salary for an hourly employee assumes they work 40 hours, and the expected weekly salary for a salaried employee is 1/52 of their salary.
2.6 Recipe for Recursive Data
Sometimes, a data definition has a piece that refers back to itself. For example, a linked list of numbers:
data NumList:
  | nl-empty
  | nl-link(first :: Number, rest :: NumList)
end
Moving on to defining examples, we can talk about empty lists:
nl-empty
We can represent short lists, like a sequence of two 4’s:
nl-link(4, nl-link(4, nl-empty))
Since these are created with constructors from data, we can use cases with them:
cases (NumList) nl-empty:
  | nl-empty => "empty!"
  | nl-link(first, rest) => "not empty"
end
=> "empty!"
cases (NumList) nl-link(1, nl-link(2, nl-empty)):
  | nl-empty => "empty!"
  | nl-link(first, rest) => first
end
=> 1
This describes a qualitatively different kind of data than something like a Point, which is only made up of other kinds of simple data. In fact, in contrast to something like a Point, a NumList can be unboundedly large, or arbitrarily sized. Given a NumList, there is an easy way to make a new, larger one: just use nl-link. So, we need to consider larger lists:
nl-link(1, nl-link(2, nl-link(3, nl-link(4, nl-link(5, nl-link(6, nl-link(7, nl-link(8, nl-empty))))))))
Let’s try to write a function contains-3, which returns true if the NumList contains the value 3, and false otherwise.
First, our header:
fun contains-3(nl :: NumList) -> Boolean:
  doc: "Produces true if the list contains 3, false otherwise"
end
Next, some tests:
fun contains-3(nl :: NumList) -> Boolean:
  doc: "Produces true if the list contains 3, false otherwise"
where:
  contains-3(nl-empty) is false
  contains-3(nl-link(3, nl-empty)) is true
  contains-3(nl-link(1, nl-link(3, nl-empty))) is true
  contains-3(nl-link(1, nl-link(2, nl-link(3, nl-link(4, nl-empty))))) is true
  contains-3(nl-link(1, nl-link(2, nl-link(5, nl-link(4, nl-empty))))) is false
end
Next, we need to fill in the body with the template for a function over NumLists. We can start with the analogous template using cases we had before:
fun contains-3(nl :: NumList) -> Boolean:
  doc: "Produces true if the list contains 3, false otherwise"
  cases (NumList) nl:
    | nl-empty => ...
    | nl-link(first, rest) =>
      ... first ...
      ... rest ...
  end
end
An empty list doesn’t contain the number 3, surely, so the answer must be false in the nl-empty case. In the nl-link case, if the first element is 3, we’ve successfully answered the question. That only leaves the case where the argument is an nl-link and the first element does not equal 3:
fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        # handle rest here
      end
  end
end
Since we know rest is a NumList (based on the data definition), we can use a cases expression to work with it. This is sort of like filling in a part of the template again:
fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        cases (NumList) rest:
          | nl-empty => ...
          | nl-link(first-of-rest, rest-of-rest) =>
            ... first-of-rest ...
            ... rest-of-rest ...
        end
      end
  end
end
If the rest was empty, then we haven’t found 3 (just like when we checked the original argument, nl). If the rest was a nl-link, then we need to check if the first thing in the rest of the list is 3 or not:
fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        cases (NumList) rest:
          | nl-empty => false
          | nl-link(first-of-rest, rest-of-rest) =>
            if first-of-rest == 3:
              true
            else:
              # fill in here ...
            end
        end
      end
  end
end
Since rest-of-rest is a NumList, we can fill in a cases for it again:
fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        cases (NumList) rest:
          | nl-empty => false
          | nl-link(first-of-rest, rest-of-rest) =>
            if first-of-rest == 3:
              true
            else:
              cases (NumList) rest-of-rest:
                | nl-empty => ...
                | nl-link(first-of-rest-of-rest, rest-of-rest-of-rest) =>
                  ... first-of-rest-of-rest ...
                  ... rest-of-rest-of-rest ...
              end
            end
        end
      end
  end
end
See where this is going? Not anywhere good. We can copy this cases expression as many times as we want, but we can never answer the question for a list that is just one element longer than the number of times we copy the code.
So what to do? We tried this approach of using another copy of cases based on the observation that rest is a NumList, and cases provides a meaningful way to break apart a NumList; in fact, it’s what the recipe seems to lead to naturally.
Let’s go back to the step where the problem began, after filling in the template with the first check for 3:
fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        # what to do with rest?
      end
  end
end
We need a way to compute whether or not the value 3 is contained in rest. Looking back at the data definition, we see that rest is a perfectly valid NumList, simply by the definition of nl-link. And we have a function (or, most of one) whose job is to figure out if a NumList contains 3 or not: contains-3. That ought to be something we can call with rest as an argument, and get back the value we want:
fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        contains-3(rest)
      end
  end
end
And lo and behold, all of the tests defined above pass. It’s useful to step through what’s happening when this function is called. Let’s look at an example:
contains-3(nl-link(1, nl-link(3, nl-empty)))
First, we substitute the argument value in place of nl everywhere it appears; that’s just the usual rule for function calls.
=> cases (NumList) nl-link(1, nl-link(3, nl-empty)):
     | nl-empty => false
     | nl-link(first, rest) =>
       if first == 3:
         true
       else:
         contains-3(rest)
       end
   end
Next, we find the case that matches the constructor nl-link, and substitute the corresponding pieces of the nl-link value for the first and rest identifiers.
=> if 1 == 3: true else: contains-3(nl-link(3, nl-empty)) end
Since 1 isn’t 3, the comparison evaluates to false, and this whole expression evaluates to the contents of the else branch.
=> if false: true else: contains-3(nl-link(3, nl-empty)) end

=> contains-3(nl-link(3, nl-empty))
This is another function call, so we substitute the value nl-link(3, nl-empty), which was the rest field of the original input, into the body of contains-3 this time.
=> cases (NumList) nl-link(3, nl-empty):
     | nl-empty => false
     | nl-link(first, rest) =>
       if first == 3:
         true
       else:
         contains-3(rest)
       end
   end
Again, we substitute into the nl-link branch.
=> if 3 == 3: true else: contains-3(nl-empty) end
This time, since 3 is 3, we take the first branch of the if expression, and the whole original call evaluates to true.
=> if true: true else: contains-3(nl-empty) end

=> true
An interesting feature of this trace through the evaluation is that we reached the expression contains-3(nl-link(3, nl-empty)), which is a normal call to contains-3; it could even be a test case on its own. The implementation works by doing something (checking for equality with 3) with the non-recursive parts of the datum, and combining that result with the result of the same operation (contains-3) on the recursive part of the datum. This idea of doing recursion with the same function on self-recursive parts of the datatype lets us extend our template to handle recursive positions.
The simple design recipe dictated this as the template:
#|
fun num-list-fun(nl :: NumList) -> ???:
  cases (NumList) nl:
    | nl-empty => ...
    | nl-link(first, rest) =>
      ... first ...
      ... rest ...
  end
end
|#
However, this template doesn’t give much guidance with what to do with the rest field. We will extend the template with the following rule: each self-recursive position in the data definition becomes a self-recursive function call in the template. So the new template looks like:
#|
fun num-list-fun(nl :: NumList) -> ???:
  cases (NumList) nl:
    | nl-empty => ...
    | nl-link(first, rest) =>
      ... first ...
      ... num-list-fun(rest) ...
  end
end
|#
To handle recursive data, the design recipe just needs to be modified to have this extended template. When you see a recursive data definition (of which there will be many when programming in Pyret), you should naturally start thinking about what the recursive calls will return and how to combine their results with the other, non-recursive pieces of the datatype.
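To see the extended template in action on a function that is not one of the exercises below, here is a small example of ours (not from the original text) that counts the links in a NumList; the recursive call num-list-length(rest) answers the question for the rest of the list, and we combine its result with 1 for the current link:

fun num-list-length(nl :: NumList) -> Number:
  doc: "Count how many numbers are in the list"
  cases (NumList) nl:
    | nl-empty => 0
    | nl-link(first, rest) => 1 + num-list-length(rest)
  end
where:
  num-list-length(nl-empty) is 0
  num-list-length(nl-link(4, nl-link(4, nl-empty))) is 2
end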
Exercise: Use the design recipe to write a function contains-n that takes a NumList and a Number, and returns whether that number is in the NumList.

Exercise: Use the design recipe to write a function sum that takes a NumList, and returns the sum of all the numbers in it. The sum of the empty list is 0.

Exercise: Use the design recipe to write a function remove-3 that takes a NumList, and returns a new NumList with any 3’s removed. The remaining elements should all be in the list in the same order they were in the input.

Exercise: Write a data definition called NumListList that represents a list of NumLists, and use the design recipe to write a function sum-of-lists that takes a NumListList and produces a NumList containing the sums of the sub-lists.
2.7 Parameterizing Data Definitions
If we wanted to define a list of Strings in the same style we defined a list of numbers, we might write:
data StringList:
  | sl-empty
  | sl-link(first :: String, rest :: StringList)
end
Similarly, the exercises from Recipe for Recursive Data had you define a representation of lists of NumLists. It would be straightforward, if tedious, to do the same for lists of Booleans, lists of Pay objects, and so on.
Each of these forms has the same structure:
data ??List:
  | xx-empty
  | xx-link(first :: ??, rest :: ??List)
end
Where the ?? and xx are filled in with application-specific names like Num and nl, or String and sl. It would be useful to not require a new definition for each kind of data; repeating ourselves is generally good to avoid in programming.
Pyret has special syntax for this kind of data definition:
data AList<a>:
  | a-empty
  | a-link(first :: a, rest :: AList<a>)
end
We say that the header, data AList<a>, creates a data definition that is parametric or polymorphic over a [to learn more about this, see Parametric Polymorphism]. At the time of defining AList, we have not yet decided what kind of elements it will hold, and defer that decision until we create a particular AList whose contents we know. At that point, we can instantiate different kinds of lists by using <> to fill in an appropriate annotation for the contents of a particular AList:
ls :: AList<String> = a-link("a", a-link("b", a-link("c", a-empty)))
lls :: AList<AList<String>> =
  a-link(a-link("a", a-empty),
    a-link(a-link("b", a-link("c", a-empty)), a-empty))
ln :: AList<Number> = a-link(1, a-link(2, a-empty))
lp :: AList<Point> = a-link(point(1, 2), a-link(polar(3, 45), a-empty))
We can call types like AList<Number> instantiations of the polymorphic AList type. We can rewrite the NumList functions we defined before in terms of ALists, like contains-3:
fun contains-3(nl :: AList<Number>) -> Boolean:
  doc: "Returns true if 3 is in the list, false otherwise"
  cases (AList<Number>) nl:
    | a-empty => false
    | a-link(first, rest) =>
      if first == 3:
        true
      else:
        contains-3(rest)
      end
  end
where:
  contains-3(a-empty) is false
  contains-3(a-link(3, a-empty)) is true
  contains-3(a-link(1, a-link(3, a-empty))) is true
  contains-3(a-link(1, a-link(2, a-link(3, a-link(4, a-empty))))) is true
  contains-3(a-link(1, a-link(2, a-link(5, a-link(4, a-empty))))) is false
end
The only difference is the use of AList<Number> instead of NumList in the header and the cases expression (and the renaming of nl- to a-).
Let’s consider another example: calculating the length of a list of numbers. We start with the header as always:
fun length(nl :: AList<Number>) -> Number:
  doc: "Returns the number of elements in the list."
end
Next we add examples:
fun length(nl :: AList<Number>) -> Number:
  doc: "Returns the number of elements in the list."
where:
  length(a-empty) is 0
  length(a-link(8, a-empty)) is 1
  length(a-link(22, a-link(43, a-empty))) is 2
end
Next, we add the template:
fun length(nl :: AList<Number>) -> Number:
  doc: "Returns the number of elements in the list."
  cases (AList<Number>) nl:
    | a-empty => ...
    | a-link(first, rest) =>
      ... first ...
      ... length(rest) ...
  end
where:
  length(a-empty) is 0
  length(a-link(8, a-empty)) is 1
  length(a-link(22, a-link(43, a-empty))) is 2
end
The length of an empty list is 0, and the length of a link is one larger than the length of the rest of the list:
fun length(nl :: AList<Number>) -> Number:
  doc: "Returns the number of elements in the list."
  cases (AList<Number>) nl:
    | a-empty => 0
    | a-link(first, rest) => 1 + length(rest)
  end
where:
  length(a-empty) is 0
  length(a-link(8, a-empty)) is 1
  length(a-link(22, a-link(43, a-empty))) is 2
end
This should be a pretty easy exercise to understand by now. What would happen if we wanted to write the same definition for AList<String>? It would look like this:
fun length(sl :: AList<String>) -> Number:
  doc: "Returns the number of elements in the list."
  cases (AList<String>) sl:
    | a-empty => 0
    | a-link(first, rest) => 1 + length(rest)
  end
end
What about an AList<AList<Point>>?
fun length(pll :: AList<AList<Point>>) -> Number:
  doc: "Returns the number of elements in the list."
  cases (AList<AList<Point>>) pll:
    | a-empty => 0
    | a-link(first, rest) => 1 + length(rest)
  end
end
Aside from the annotations and the name of the argument, these functions are exactly the same! Mirroring the way we unified the various ??List data definitions, we can lift out the common part of the type:
fun length<a>(al :: AList<a>) -> Number:
  doc: "Returns the number of elements in the list."
  cases (AList<a>) al:
    | a-empty => 0
    | a-link(first, rest) => 1 + length(rest)
  end
end
This works (and makes sense) for all the different kinds of examples:
check:
  length(a-empty) is 0
  length(a-link(8, a-empty)) is 1
  length(a-link(22, a-link(43, a-empty))) is 2
  length(a-empty) is 0
  length(a-link("a", a-empty)) is 1
  length(a-link("a", a-link(43, a-empty))) is 2
  length(a-empty) is 0
  length(a-link(a-link(point(1, 2), a-empty), a-empty)) is 1
end
There’s (much) more to talk about in actually type-checking parametric types later in Parametric Polymorphism. We call length a parametric or polymorphic function, just as we called AList a parametric data definition. It means that the function can work over all possible instantiations of AList. We’ll talk about more instances of interesting functions like this in Abstracting Common Parts of List Functions.
2.7.1 List and the list Abbreviation
Lists in this style are so useful that they are built into Pyret. The definition in Pyret’s library is:
data List<a>:
  | empty
  | link(first :: a, rest :: List<a>)
end
The names List, link, and empty are all available anywhere in Pyret (to avoid confusion, Pyret will also stop any other programs from redefining them).
Since lists are constructed often, and writing nested link and empty calls is tedious, Pyret supports a special syntax for constructing lists more concisely:
check:
  [list: 1, 2] is link(1, link(2, empty))
  [list:] is empty
  [list: [list:], [list: 1]] is link(empty, link(link(1, empty), empty))
end
We’ll switch to using this syntax for lists for the rest of the notes and examples, and to using the built-in List with problem-appropriate instantiations.
2.8 Abstracting Common Parts of List Functions
In Abstracting Common Parts, we saw how to pull out common parts of functions over primitive data. What happens if we apply the same rules to functions over lists?
Let’s consider two functions over List<Number>s, sqr-nums and abs-nums:
fun sqr-nums(nl :: List<Number>) -> List<Number>:
  doc: "Return a list with the squares of the given numbers"
  cases (List<Number>) nl:
    | empty => empty
    | link(first, rest) => link(num-sqr(first), sqr-nums(rest))
  end
end

fun abs-nums(nl :: List<Number>) -> List<Number>:
  doc: "Return a list with the absolute values of the given numbers"
  cases (List<Number>) nl:
    | empty => empty
    | link(first, rest) => link(num-abs(first), abs-nums(rest))
  end
end
We can identify the common parts: the num-sqr and num-abs functions appear in the same spot. Following the rules of extraction, we add a new parameter name at the beginning of the parameter list, replace the common part with that identifier in the function body, and pass the corresponding argument at the beginning of the argument list of each call.
fun do-to-nums(operation :: ???, nl :: List<Number>) -> List<Number>:
  doc: "Return a new list made of the elements of nl with operation done to them"
  cases (List<Number>) nl:
    | empty => empty
    | link(first, rest) =>
      # replaced num-abs/num-sqr with operation in the line below, and added it
      # in the recursive call to do-to-nums, according to the rules of
      # extracting common parts
      link(operation(first), do-to-nums(operation, rest))
  end
end
What about the annotation for operation? What goes there?
To describe operation, we’re going to introduce a new kind of annotation, the function or arrow annotation. Its syntax is
(Ann1, Ann2, ..., AnnN -> AnnR)
where Ann1 through AnnN are annotations representing arguments to a function, and AnnR is an annotation representing the return type of a function. So we can write the annotation for num-sqr and num-abs as:
(Number -> Number)
That is, they are both functions that consume and produce Numbers. We can use one to describe the operation argument of do-to-nums:
fun do-to-nums(operation :: (Number -> Number), nl :: List<Number>) -> List<Number>:
  doc: "Return a list containing operation applied to each element of nl"
  cases (List<Number>) nl:
    | empty => empty
    | link(first, rest) => link(operation(first), do-to-nums(operation, rest))
  end
where:
  do-to-nums(num-sqr, [list:]) is [list:]
  do-to-nums(num-sqr, [list: 1, 2]) is [list: 1, 4]
  do-to-nums(num-abs, [list:]) is [list:]
  do-to-nums(num-abs, [list: -1, 1]) is [list: 1, 1]
end
In the examples for do-to-nums, we see that, following the rules from extracting common parts, we provide num-abs and num-sqr as arguments. We can run the program and its examples and see that it works.
In Pyret (and many other languages), all functions are values, just like 5 or point(1, 2) are values. This goes for user-defined functions, too, so we can use do-to-nums with, say, an hours-to-wages function:
fun hours-to-wages(hours :: Number) -> Number:
  if hours <= 40:
    hours * 10
  else if hours > 40:
    (40 * 10) + ((hours - 40) * (10 * 1.5))
  end
end

check:
  do-to-nums(hours-to-wages, [list: 40, 39, 41]) is [list: 400, 390, 415]
end
It’s useful to see how this impacts the evaluation of do-to-nums for a small example:
do-to-nums(hours-to-wages, link(41, link(39, empty)))
We aren’t calling hours-to-wages yet, so no substitution happens. The function value (which we’ll represent with its name) is simply passed in as a value to the body of do-to-nums.
=> cases (List<Number>) link(41, link(39, empty)):
     | empty => empty
     | link(first, rest) =>
       link(hours-to-wages(first), do-to-nums(hours-to-wages, rest))
   end
=> link(hours-to-wages(41), do-to-nums(hours-to-wages, link(39, empty)))
The body of hours-to-wages is evaluated next, with 41 substituted for hours.
=> link(
     if 41 <= 40:
       41 * 10
     else if 41 > 40:
       (40 * 10) + ((41 - 40) * (10 * 1.5))
     end,
     do-to-nums(hours-to-wages, link(39, empty)))
Now we evaluate the if and perform arithmetic.
=> link( 415, do-to-nums(hours-to-wages, link(39, empty)))
Now we have another call to do-to-nums. Note that we are still processing the argument list of the call to link. We’re done processing the first element, but we’re processing the rest in a place where it will become the rest field of the new list that’s being created.
=> link(
     415,
     cases (List<Number>) link(39, empty):
       | empty => empty
       | link(first, rest) =>
         link(hours-to-wages(first), do-to-nums(hours-to-wages, rest))
     end)
=> link( 415, link( hours-to-wages(39), do-to-nums(hours-to-wages, empty)))
The hours-to-wages call again happens, this time on 39.
=> link(
     415,
     link(
       if 39 <= 40:
         39 * 10
       else if 39 > 40:
         (40 * 10) + ((39 - 40) * (10 * 1.5))
       end,
       do-to-nums(hours-to-wages, empty)))
More arithmetic.
=> link( 415, link( 390, do-to-nums(hours-to-wages, empty)))
And to finish, we get to the base case of empty, and the list traversal is completed.
=> link(
     415,
     link(
       390,
       cases (List<Number>) empty:
         | empty => empty
         | link(first, rest) =>
           link(hours-to-wages(first), do-to-nums(hours-to-wages, rest))
       end))
=> link( 415, link( 390, empty))
When we pass functions as values, we will continue to represent them with just their name in evaluation sequences. One thing to keep in mind is that a function’s body is not evaluated until the function is called, and when the call occurs is independent of when the function is passed as an argument. In fact, this must be the case: consider hours-to-wages above. Its body cannot be evaluated without supplying a value for hours, which happens only when the function is called. You can learn more about how functions are handled later [Adding Functions to the Language], but for now, other than this rule, you can treat functions like any other value we’ve seen so far.
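To make this concrete, here is a small check block of ours (not from the text) that treats hours-to-wages as a value: we bind it to a new name without calling it, and only the later calls cause its body to be evaluated:

check:
  pay = hours-to-wages   # no parentheses: we name the function value, we do not call it
  pay(45) is 475
  do-to-nums(pay, [list: 40, 20]) is [list: 400, 200]
end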
Functions as values are also called first-class functions, and functions that consume or return other functions are called higher-order functions. Lots of languages have first-class and higher-order functions, from Python and JavaScript to OCaml and Haskell, and they have numerous uses, many of which we’ll study. (For some elegant and perhaps surprising ones, see [Functions as Data].)
The do-to-nums function is a pretty nice abstraction: we can now apply a function to all the elements of a list of numbers, and get back a new list with the outputs.
But we can do one better. Let’s look at what do-to-strings, a version of do-to-nums for Strings, would look like:
fun do-to-strings(operation :: (String -> String), sl :: List<String>) -> List<String>:
  doc: "Return a list containing operation applied to each element of sl"
  cases (List<String>) sl:
    | empty => empty
    | link(first, rest) => link(operation(first), do-to-strings(operation, rest))
  end
end
Like when we saw similar versions of length for different types, do-to-strings and do-to-nums only differ in the declared types: String in place of Number everywhere. This suggests that we can use a parametric type again:
fun do-to-a<a>(operation :: (a -> a), al :: List<a>) -> List<a>:
If we fill in the definition of do-to-a, we could redefine:
fun do-to-nums(operation :: (Number -> Number), nl :: List<Number>) -> List<Number>:
  do-to-a(operation, nl)
end

fun do-to-strs(operation :: (String -> String), sl :: List<String>) -> List<String>:
  do-to-a(operation, sl)
end
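For reference, the filled-in body of do-to-a follows exactly the same pattern as do-to-nums; here is a sketch (the original text leaves this step implicit):

fun do-to-a<a>(operation :: (a -> a), al :: List<a>) -> List<a>:
  doc: "Return a list containing operation applied to each element of al"
  cases (List<a>) al:
    | empty => empty
    | link(first, rest) => link(operation(first), do-to-a(operation, rest))
  end
end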
But before we declare ourselves done, there’s one last part of this definition worth generalizing. Do you see it? What if we wanted to write nums-to-strings, which takes a list of Numbers and produces a list of Strings:
fun nums-to-strings(nums :: List<Number>) -> List<String>: ... end
Can we use do-to-a to do it? The definition would look like:
fun nums-to-strings(nl :: List<Number>) -> List<String>:
  do-to-a(num-tostring, nl)
end
But this violates the contract of do-to-a, because do-to-a must return a list of the same type it consumes; in this example, it would produce a list of a different type. (We could actually run this version of nums-to-strings without error, because Pyret doesn’t currently check that we don’t mis-use parametric types. That doesn’t change the fact that we would be violating the stated contract of do-to-a, though.) So there’s one final change to make to do-to-a to make it handle this last case:
fun turn-as-into-bs<a, b>(a-to-b :: (a -> b), al :: List<a>) -> List<b>:
  doc: "Return a list of the b's returned from a-to-b on each a in the input"
  cases (List<a>) al:
    | empty => empty
    | link(first, rest) => link(a-to-b(first), turn-as-into-bs(a-to-b, rest))
  end
end
This new function, turn-as-into-bs, is parameterized over two types, the type of the elements of the input list (a), and the type of the elements of the output list (b). With this final tweak, all of these examples now make sense with the same function:
check:
  turn-as-into-bs(num-sqr, [list:]) is [list:]
  turn-as-into-bs(num-sqr, [list: 1, 2]) is [list: 1, 4]
  turn-as-into-bs(num-abs, [list:]) is [list:]
  turn-as-into-bs(num-abs, [list: -1, 1]) is [list: 1, 1]
  turn-as-into-bs(string-toupper, [list: "a", "b", "c"]) is [list: "A", "B", "C"]
  turn-as-into-bs(string-tolower, [list: "A", "B", "C"]) is [list: "a", "b", "c"]
  turn-as-into-bs(num-tostring, [list: 1, 2]) is [list: "1", "2"]
end
The function turn-as-into-bs is typically called map, and is a major building block in programming with functions, just like the for loop is a major building block in C++ and Java. It’s important enough that map is a default library function in many programming languages with higher-order functions.
2.9 Functions that Return Functions
We saw in Abstracting Common Parts of List Functions that the turn-as-into-bs function, more commonly known as map, covered a number of use cases for transforming lists. Recall that we can use it, for instance, to calculate the hourly wage for a list of numbers representing hours:
# This function is just as before
fun hours-to-wages(hours :: Number) -> Number:
  doc: "Compute total wage from hours, accounting for overtime, at $10/hr base"
  if hours <= 40:
    hours * 10
  else if hours > 40:
    (40 * 10) + ((hours - 40) * (10 * 1.5))
  end
end

# Now we could use
check:
  map(hours-to-wages, [list: 30, 40]) is [list: 300, 400]
end
It’s easy to define versions of this for hourly rates of 20, or 30, and use those with map. But we already saw in Abstracting Common Parts that it’s a good idea not to write many copies of essentially the same function when we can abstract out the common parts.
But, if we have a modified version of hours-to-wages that takes the hourly rate as an argument:
fun hours-to-wages-rated(hours :: Number, rate :: Number) -> Number:
How do we use this with map? Let’s say we have two lists of hours:
at-rate-10 = [list: 35, 40, 30]
at-rate-20 = [list: 25, 45, 40]
How can we use map to calculate the first list’s wages at a rate of $10/hr, and the second list’s wages at $20/hr? If we just try calling it directly:
check:
  map(hours-to-wages-rated, at-rate-10) is [list: 350, 400, 300]
  map(hours-to-wages-rated, at-rate-20) is [list: 500, 950, 800]
end
We get an arity mismatch error from inside the map function. Why? Recall that what the map function does is apply the function given to it as its first argument to each list element. So it tries to apply, in the first example:
hours-to-wages-rated(35)
This is clearly an error, since hours-to-wages-rated’s contract requires two arguments, and only one is provided here. What would solve the problem is a function that takes an hourly rate, and returns a function like hours-to-wages that only takes one argument, but keeps track of the given hourly rate.
Let’s first think about what the header to such a function would look like:
fun make-rated-hours-to-wages(rate :: Number) -> (Number -> Number):
That’s the type-based description of what we just said: make-rated-hours-to-wages takes a Number and returns a function that takes a Number and produces one. We can write some examples of what it would look like to use such a function:
fun make-rated-hours-to-wages(rate :: Number) -> (Number -> Number):
  doc: "Create a function that computes wages at the given rate"
where:
  make-rated-hours-to-wages(10)(40) is 400
  make-rated-hours-to-wages(10)(35) is 350
  make-rated-hours-to-wages(20)(35) is 700
end
Note how the tests look different from anything we’ve seen: There is a function invocation followed by another, because make-rated-hours-to-wages returns a function, which we immediately call. Note also that this matches the return type in the contract, which specifies that the function should return a (Number -> Number) function. As further examples, the returned functions ought to be of the right shape to use with map:
check:
  map(make-rated-hours-to-wages(10), at-rate-10) is [list: 350, 400, 300]
  map(make-rated-hours-to-wages(20), at-rate-20) is [list: 500, 950, 800]
end
So how do we fill in the body of make-rated-hours-to-wages? There are a few ways; here, we’re going to add to our inventory of Pyret concepts. In Pyret, we can use the lam keyword to create a function value directly, anywhere an expression is allowed. (lam is short for lambda, a traditional name for anonymous functions that comes from the λ-calculus.) A use of lam looks like a use of fun but without the name. The resulting value is just like a function created with fun: it can be called, passed to map, bound to identifiers, and so on:
check:
  f = lam(x :: Number) -> Number: x + 1 end
  f(1) is 2
  f(2) is 3
  map(lam(x): x + 1 end, [list: 1, 2, 3]) is [list: 2, 3, 4]
  map(f, [list: 1, 2, 3]) is [list: 2, 3, 4]
end
So, we can make the body of make-rated-hours-to-wages a single lam expression that takes in a number of hours and calls hours-to-wages-rated with both arguments.
fun make-rated-hours-to-wages(rate :: Number) -> (Number -> Number):
  doc: "Create a function that computes wages at the given rate"
  lam(hours :: Number) -> Number:
    hours-to-wages-rated(hours, rate)
  end
end
Let’s step through a substitution, which has an interesting new consequence:
make-rated-hours-to-wages(10)(30)

=> lam(hours :: Number) -> Number:
     hours-to-wages-rated(hours, 10)
   end(30)

=> hours-to-wages-rated(30, 10)

=> 300 # ... after some arithmetic
In a call to map, the lam value is not called immediately, and is passed through. As a reminder, here is the definition of map:
fun map<a, b>(a-to-b :: (a -> b), al :: List<a>) -> List<b>:
  doc: "Return a list of the b's returned from a-to-b on each a in the input"
  cases (List<a>) al:
    | empty => empty
    | link(first, rest) => link(a-to-b(first), map(a-to-b, rest))
  end
end
Let’s look at a call to map with an anonymous function created from make-rated-hours-to-wages:
map(make-rated-hours-to-wages(10), [list: 35, 40, 30])
=> map(lam(hours): hours-to-wages-rated(hours, 10) end, [list: 35, 40, 30])

=> cases (List<Number>) [list: 35, 40, 30]:
     | empty => empty
     | link(first, rest) =>
       link((lam(hours): hours-to-wages-rated(hours, 10) end(first)),
         map(lam(hours): hours-to-wages-rated(hours, 10) end, rest))
   end
=> link(lam(hours): hours-to-wages-rated(hours, 10) end(35),
     map(lam(hours): hours-to-wages-rated(hours, 10) end, [list: 40, 30]))

=> link(hours-to-wages-rated(35, 10),
     map(lam(hours): hours-to-wages-rated(hours, 10) end, [list: 40, 30]))
=> link(350, # skipping arithmetic again
     map(lam(hours): hours-to-wages-rated(hours, 10) end, [list: 40, 30]))

=> link(350,
     cases (List<Number>) [list: 40, 30]:
       | empty => empty
       | link(first, rest) =>
         link((lam(hours): hours-to-wages-rated(hours, 10) end(first)),
           map(lam(hours): hours-to-wages-rated(hours, 10) end, rest))
     end)
Anonymous functions are a useful tool when working with functions like map, because they provide a way to manage multi-argument functions, and also can create functions to use in map without creating verbose new fun declarations.
As far as definitions go, we call lam expressions anonymous function expressions. When an anonymous function expression’s body has a reference to an identifier that isn’t one of its arguments, we say that it closes over that identifier. Similarly, when a value is substituted for a closed over identifier, we say that the value has been closed over. The resulting function that is closed over other values is called a closure.
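As a small illustration of this vocabulary (an example of ours, using a hypothetical make-adder function), the lam below closes over n, and each value returned by make-adder is a closure that remembers the n it was created with:

fun make-adder(n :: Number) -> (Number -> Number):
  doc: "Return a function that adds n to its argument"
  lam(m :: Number) -> Number: m + n end
end

check:
  add5 = make-adder(5)   # this closure has closed over 5
  add5(3) is 8
  map(make-adder(1), [list: 1, 2, 3]) is [list: 2, 3, 4]
end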
Closures will come up many times in the rest of this book, and they are one of the first things we’ll learn to implement when we focus on the study of languages themselves.
2.10 More and Mutually Recursive Data
Linked lists are just one instance of recursive data. There are many other patterns of recursive data to consider, and they have interesting consequences for constructing functions over them.
2.10.1 Other Recursive Patterns
We’ll say much more about binary trees later. Binary trees are one kind of recursive data that have more than one recursive piece:
data BinTree<a>:
  | leaf
  | node(value :: a, left :: BinTree<a>, right :: BinTree<a>)
end
Extending the recipe for this kind of recursive data is straightforward. We modify the template to include recursive calls for each recursive piece:
fun bin-tree-fun<a>(t :: BinTree<a>) -> ???:
  cases (BinTree<a>) t:
    | leaf => ...
    | node(value, left, right) =>
      ... value ...
      ... bin-tree-fun(left) ...
      ... bin-tree-fun(right) ...
  end
end
We can use this template to fill in, for example, tree-sum, which adds up the numbers in a BinTree<Number>:
fun tree-sum(t :: BinTree<Number>) -> Number:
  cases (BinTree<Number>) t:
    | leaf => 0
    | node(value, left, right) => value + tree-sum(left) + tree-sum(right)
  end
where:
  tree-sum(leaf) is 0
  tree-sum(node(2, leaf, leaf)) is 2
  tree-sum(node(3, node(4, leaf, leaf), leaf)) is 7
  tree-sum(node(3, node(4, leaf, leaf), node(6, leaf, leaf))) is 13
end
Or, as another example, consider tree-inorder, which returns a List of the elements in order (note that + is suitable to use as a way to append lists):
fun tree-inorder<a>(t :: BinTree<a>) -> List<a>: cases (BinTree<a>) t: | leaf => empty | node(value, left, right) => tree-inorder(left) + [list: value] + tree-inorder(right) end end
What’s the time complexity of tree-inorder, assuming that + takes time linear in the length of its left-hand list? Is there a better solution using an accumulator?
This is a straightforward extension of handling recursive datatypes, but is useful to note on its own.
2.10.2 Mutually Recursive Data
The last section defined binary trees, but what about general trees with lists of children? In this case, we might define it as:
data TList<b>: | t-empty | t-link(first :: Tree<b>, rest :: TList<b>) end data Tree<b>: | node(value :: b, children :: TList<b>) end
Note that the type variable b in TList differs from the one in List: it is parametrizing the kind of Tree the first field contains, not the type of the first field itself. What would it look like to define Tree with plain Lists for children?
Here, there is no direct recursion in the body of Tree: its variants make no reference back to Tree itself. However, they do make reference to TList, which refers back to Tree in the first field of t-link. (We would notice the cycle if we started from TList, as well).
For mutually-recursive datatypes, we cannot consider one without the other: Any function that works over Trees may need to consider all the nuances of TLists (or List<Tree>s) as well.
This is true in any examples of the datatype (can you write an example, other than t-empty, that uses one without the other?), and in the code that we write. Let’s try to develop tree-sum with this definition, starting with a template:
fun tree-sum(t :: Tree<Number>) -> Number: doc: "Add up the elements of the tree" cases (Tree<Number>) t: | node(val, children) => ... val ... ... children ... end where: tree-sum(node(4, t-empty)) is 4 tree-sum(node(5, t-link(node(4, t-empty), t-link(node(6, t-empty), t-empty)))) is 15 end
We want to add val (which we know to be a number), to the sum of the list of children. That’s a big enough job that it should be deferred to a helper, so:
fun list-sum(l :: TList<Number>) -> Number: doc: "Add up the elements of the list" end fun tree-sum(t :: Tree<Number>) -> Number: doc: "Add up the elements of the tree" cases (Tree<Number>) t: | node(val, children) => val + list-sum(children) end end
Now, we can insert the template for list-sum, which contains a recursive call for the rest field, and no guidance (yet) on the first field:
fun list-sum(l :: TList<Number>) -> Number: doc: "Add up the elements of the list" cases (TList<Number>) l: | t-empty => 0 | t-link(first, rest) => ... first ... ... list-sum(rest) ... end end fun tree-sum(t :: Tree<Number>) -> Number: doc: "Add up the elements of the tree" cases (Tree<Number>) t: | node(val, children) => val + list-sum(children) end end
If we follow the same rule we followed for tree-sum, we ought to be able to just call back into tree-sum for first:
fun list-sum(l :: TList<Number>) -> Number: doc: "Add up the elements of the list" cases (TList<Number>) l: | t-empty => 0 | t-link(first, rest) => tree-sum(first) + list-sum(rest) end end fun tree-sum(t :: Tree<Number>) -> Number: doc: "Add up the elements of the tree" cases (Tree<Number>) t: | node(val, children) => val + list-sum(children) end end
This completes the definition, and the tests defined above will now pass. As usual, it’s useful to step through an evaluation of an example to see the pattern in action:
tree-sum(node(5, t-link(node(4, t-empty), t-link(node(6, t-empty), t-empty))))
=> cases (Tree<Number>) node(5, t-link(node(4, t-empty), t-link(node(6, t-empty), t-empty))): | node(val, children) => val + list-sum(children) end => 5 + list-sum(t-link(node(4, t-empty), t-link(node(6, t-empty), t-empty))) => 5 + cases (TList<Number>) t-link(node(4, t-empty), t-link(node(6, t-empty), t-empty)): | t-empty => 0 | t-link(first, rest) => tree-sum(first) + list-sum(rest) end
=> 5 + (tree-sum(node(4, t-empty)) + list-sum(t-link(node(6, t-empty), t-empty))) => 5 + (cases (Tree<Number>) node(4, t-empty): | node(val, children) => val + list-sum(children) end + list-sum(t-link(node(6, t-empty), t-empty))) => 5 + ((4 + list-sum(t-empty)) + list-sum(t-link(node(6, t-empty), t-empty)))
Note that these calls are quite deeply nested; now another call to list-sum needs to be resolved before we can go on.
=> 5 + ((4 + cases (TList<Number>) t-empty: | t-empty => 0 | t-link(first, rest) => tree-sum(first) + list-sum(rest) end) + list-sum(t-link(node(6, t-empty), t-empty))) => 5 + ((4 + 0) + list-sum(t-link(node(6, t-empty), t-empty)))
Everything up to this point corresponds to processing the first sub-tree of the input; the 4 is the result of the tree-sum call made from list-sum.
=> 5 + (4 + list-sum(t-link(node(6, t-empty), t-empty)))
Fill in the rest of the reduction down to 5 + (4 + 6), corresponding to the three value fields in the original tree.
We can extract some new rules from this: if we notice a cycle in the datatypes used in a data definition, we need to make sure that we handle the whole cycle when we think through examples, and through the template. When designing the template, we should consider a template for each datatype involved in the cycle. In this example, that would mean a template like:
fun tlist-fun<a>(l :: TList<a>) -> ???: cases (TList<a>) l: | t-empty => ... | t-link(first, rest) => # call to tree-fun, which is handled below ... tree-fun(first) ... ... tlist-fun(rest) ... end end fun tree-fun<a>(t :: Tree<a>) -> ???: cases (Tree<a>) t: | node(val, children) => ... val ... # call to tlist-fun (defined above) to handle the children ... tlist-fun(children) ... end end
This generalizes to mutual recursion between any number of data definitions. If you notice that your data definitions have a cycle of reference, be aware that you’ll need to work through examples and an implementation that handles all of them.
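As another instance of this pattern, here is a sketch (of our own, reusing the Tree and TList definitions above) of tree-size, which counts the number of nodes in a Tree; note how the template for each datatype in the cycle appears:

fun tlist-size<a>(l :: TList<a>) -> Number:
  cases (TList<a>) l:
    | t-empty => 0
    # each tree in the list is handled by tree-size
    | t-link(first, rest) => tree-size(first) + tlist-size(rest)
  end
end

fun tree-size<a>(t :: Tree<a>) -> Number:
  cases (Tree<a>) t:
    # one for this node, plus the sizes of all its children
    | node(val, children) => 1 + tlist-size(children)
  end
where:
  tree-size(node("a", t-empty)) is 1
  tree-size(node("a", t-link(node("b", t-empty), t-link(node("c", t-empty), t-empty)))) is 3
end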
2.10.3 A Note on Parametric Data Definitions
The definition in the last section made up a new datatype, TList, for representing lists of trees. That was useful for an example, but most of the time, it would be preferable to just use the built-in List datatype rather than defining something totally new. So a Tree definition like this would make better use of Pyret’s builtins:
data Tree<a>: | node(value :: a, children :: List<Tree<a>>) end
The template wouldn’t change much; it would simply refer to List<Tree<a>> instead of TList<a>, and list-sum would use List<Tree<Number>> instead of TList<Number>. But we saw other functions that used Lists, and conceivably the lists that they work over could have sub-elements that are Trees. Does that mean that map, and length, need to be concerned with Trees? Hopefully the answer is no, because otherwise we’d need to rethink the whole discussion about map in Abstracting Common Parts of List Functions to account for Trees!
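For concreteness, a sketch of tree-sum over this definition might look like the following (children-sum is a helper name we introduce purely for illustration):

fun tree-sum(t :: Tree<Number>) -> Number:
  cases (Tree<Number>) t:
    | node(value, children) => value + children-sum(children)
  end
where:
  tree-sum(node(5, [list: node(4, empty), node(6, empty)])) is 15
end

fun children-sum(l :: List<Tree<Number>>) -> Number:
  cases (List<Tree<Number>>) l:
    | empty => 0
    | link(first, rest) => tree-sum(first) + children-sum(rest)
  end
end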
This brings up an important distinction between the data that a specific problem works with, and a single Pyret data declaration. Problems that work over Tree<Number>s are concerned specifically with List<Tree<Number>>s, not any other kind of list. When writing functions over List<Tree<Number>>s, it’s natural that we have to take Trees into account, and consider Tree-handling functions in the template. The data definition of a Tree problem specifies the type of data that the list works with, and it becomes part of the examples and the problem template.
However, other instantiations of List, like List<Number>, have no relationship with Trees, and those problems have no reason to include templates for Trees. The problem we’re trying to solve dictates the types of data to use, which may include some specific instantiations, and that guides us towards a template.
Finally, describing some problems’ data doesn’t require a fully instantiated datatype. For example, taking the length of a list, or calculating the number of nodes in a tree, work for any instantiation of a List or Tree. Since these problems can be solved without referring to particular instantiations, they don’t need to consider the contents at all.
2.11 Naming Intermediate Values
Sometimes, we perform a computation and want to use the result more than once. For example, in this implementation of max-in-list, we get the max value from the rest of the list in two places:
fun max-in-list(l :: List<Number>) -> Number: doc: "Return the largest number in the list" cases (List<Number>) l: | empty => 0 | link(first, rest) => if first > max-in-list(rest): first else: max-in-list(rest) end end where: max-in-list([list: 1, 5, 4]) is 5 max-in-list([list:]) is 0 end
The expression max-in-list(rest) appears twice in this example. To avoid duplicating it, we can give it a name, like this:
fun max-in-list(l :: List<Number>) -> Number: doc: "Return the largest number in the list" cases (List<Number>) l: | empty => 0 | link(first, rest) => max-rest = max-in-list(rest) if first > max-rest: first else: max-rest end end end
We call the expression max-rest = max-in-list(rest) an identifier binding or let binding expression. We need to add a new rule for reducing this kind of expression. It’s a little different than the rules we’ve seen so far, because it takes multiple expressions into account. When handling an identifier binding, we look at the binding expression itself and the expression(s) immediately after it in the same block of code. So in the example above, we’d be considering:
max-rest = max-in-list(rest) if first > max-rest: first else: max-rest end
An identifier binding evaluates by evaluating the expression to the right of the = sign, and then substituting the resulting value into the rest of the expressions after it. Before stepping through all of max-in-list, here’s a short example that has a similar structure:
x = 5 + 5 if x > 9: x else: x + 11 end => x = 10 if x > 9: x else: x + 11 end
This is where x in the rest of the expressions (the if expression) gets substituted with 10.
=> if 10 > 9: 10 else: 10 + 11 end => if true: 10 else: 10 + 11 end => 10
So, a full substitution for a call to max-in-list would look like:
max-in-list([list: 1, 5]) => cases (List<Number>) [list: 1, 5]: | empty => 0 | link(first, rest) => max-rest = max-in-list(rest) if first > max-rest: first else: max-rest end end => max-rest = max-in-list([list: 5]) if 1 > max-rest: 1 else: max-rest end
This step is interesting, because we have to process the max-in-list call to find the answer to store in max-rest.
=> max-rest = cases (List<Number>) [list: 5]: | empty => 0 | link(first, rest) => max-rest = max-in-list(rest) if first > max-rest: first else: max-rest end end if 1 > max-rest: 1 else: max-rest end
The block: ... end notation is used to explicitly indicate that there is a sequence of expressions that will be evaluated just like any other. Note that we have two max-rests, each of which will be substituted into its own block.
=> max-rest = block: max-rest = max-in-list([list:]) if 5 > max-rest: 5 else: max-rest end end if 1 > max-rest: 1 else: max-rest end => max-rest = block: max-rest = cases (List<Number>) [list:]: | empty => 0 | link(first, rest) => max-rest = max-in-list(rest) if first > max-rest: first else: max-rest end end if 5 > max-rest: 5 else: max-rest end end if 1 > max-rest: 1 else: max-rest end
Now there is a value, 0, to substitute for the max-rest that is further to the right, so we can do the substitution for max-rest where it appears.
=> max-rest = block: max-rest = 0 if 5 > max-rest: 5 else: max-rest end end if 1 > max-rest: 1 else: max-rest end => max-rest = block: if 5 > 0: 5 else: 0 end end if 1 > max-rest: 1 else: max-rest end => max-rest = block: if true: 5 else: 0 end end if 1 > max-rest: 1 else: max-rest end
When a block: contains only a single value expression, it evaluates to that value.
=> max-rest = block: 5 end if 1 > max-rest: 1 else: max-rest end => max-rest = 5 if 1 > max-rest: 1 else: max-rest end
Now we can substitute for the first max-rest, and finish the if evaluation to get 5.
=> if 1 > 5: 1 else: 5 end => if false: 1 else: 5 end => 5
If there are multiple identifier bindings in a row, they are processed in order, and get substituted into the right-hand-sides of any other bindings that happen later in the same block:
x = 4 + 5 y = x + 10 z = y + x z - 3 => x = 9 y = x + 10 z = y + x z - 3 => y = 9 + 10 z = y + 9 z - 3 => y = 19 z = y + 9 z - 3 => z = 19 + 9 z - 3 => z = 28 z - 3 => 28 - 3 => 25
Standalone Bindings: A let-binding expression doesn’t make much sense on its own; in this binding, what expression can y be substituted into?
fun add-5(x :: Number) -> Number: y = x + 5 end
Pyret cannot do anything meaningful with the body of add-5, so it reports an error when this program is run: the block ends in a let-binding, which leaves no expression to substitute the bound value into. This version of add-5 would make sense:
fun add-5(x :: Number) -> Number: y = x + 5 y end
2.12 More to Learn
This chapter gives you a foundation for programming in Pyret, and is enough to get through all the programs through Halloween Analysis. There are several more features you’ll need to learn for later chapters, and they’ll be introduced as they come up, since the new features relate directly to the programming concepts taught in those chapters.
3 Interactive Games as Reactive Programs
In this tutorial we’re going to write a little interactive game. The game won’t be sophisticated, but it’ll have all the elements you need to build much richer games of your own.
Imagine we have an airplane coming in to land. It’s unfortunately trying to do so amidst a hot-air balloon festival, so it naturally wants to avoid colliding with any (moving) balloons. In addition, there is both land and water, and the airplane needs to alight on land. We might also equip it with limited amounts of fuel to complete its task. Here are some animations of the game:
http://world.cs.brown.edu/1/projects/flight-lander/v9-success.swf
The airplane comes in to land successfully.
http://world.cs.brown.edu/1/projects/flight-lander/v9-collide.swf
Uh oh—the airplane collides with a balloon!
http://world.cs.brown.edu/1/projects/flight-lander/v9-sink.swf
Uh oh—the airplane lands in the water!
By the end, you will have written all the relevant portions of this program. Your program will: animate the airplane to move autonomously; detect keystrokes and adjust the airplane accordingly; have multiple moving balloons; detect collisions between the airplane and balloons; check for landing on water and land; and account for the use of fuel. Phew: that’s a lot going on! Therefore, we won’t write it all at once; instead, we’ll build it up bit-by-bit. But we’ll get there by the end.
3.1 About Reactive Animations
We are writing a program with two important interactive elements: it is an animation, meaning it gives the impression of motion, and it is reactive, meaning it responds to user input. Both of these can be challenging to program, but Pyret provides a simple mechanism that accommodates both and integrates well with other programming principles such as testing. We will learn about this as we go along.
The key to creating an animation is the Movie Principle. Even in the most sophisticated movie you can watch, there is no motion (indeed, the very term “movie” is a bit of a misnomer): there is only a sequence of still images displayed in rapid succession, and our brains supply the illusion of motion.
3.2 Preliminaries
import image as I import world as W
3.3 Version: Airplane Moving Across the Screen
We will start with the simplest version: one in which the airplane moves horizontally across the screen. Watch this video:
http://world.cs.brown.edu/1/projects/flight-lander/v1.swf
First, here’s an image of an airplane: Have fun finding your preferred airplane image! But don’t spend too long on it, because we’ve still got a lot of work to do.
http://world.cs.brown.edu/1/clipart/airplane-small.png
AIRPLANE-URL = "http://world.cs.brown.edu/1/clipart/airplane-small.png" AIRPLANE = I.image-url(AIRPLANE-URL)
Now look at the video again. Watch what happens at different points in time. What stays the same, and what changes? What’s common is the water and land, which stay the same. What changes is the (horizontal) position of the airplane.
The World State consists of everything that changes. Things that stay the same do not need to get recorded in the World State.
We can now define our first World State:
The World State is a number, representing the x-position of the airplane.
Observe something important above.
When we record a World State, we don’t capture only the type of the values, but also their intended meaning.
Ask to be notified of the passage of time.
As time passes, correspondingly update the World State.
Given an updated World State, produce the corresponding visual display.
3.3.1 Updating the World State
As we’ve noted, the airplane doesn’t actually “move”. Rather, we can ask Pyret to notify us every time a clock ticks ([REF]). If on each tick we place the airplane in an appropriately different position, and the ticks happen often enough, we will get the impression of motion.
AIRPLANE-X-MOVE = 10
check: move-airplane-x-on-tick(50) is 50 + AIRPLANE-X-MOVE move-airplane-x-on-tick(0) is 0 + AIRPLANE-X-MOVE move-airplane-x-on-tick(100) is 100 + AIRPLANE-X-MOVE end
fun move-airplane-x-on-tick(w): w + AIRPLANE-X-MOVE end
If you have prior experience programming animations and reactive programs, you will immediately notice an important difference: it’s easy to test parts of your program in Pyret!
3.3.2 Displaying the World State
WIDTH = 800 HEIGHT = 500 BASE-HEIGHT = 50 WATER-WIDTH = 500
BLANK-SCENE = I.empty-scene(WIDTH, HEIGHT) WATER = I.rectangle(WATER-WIDTH, BASE-HEIGHT, "solid", "blue") LAND = I.rectangle(WIDTH - WATER-WIDTH, BASE-HEIGHT, "solid", "brown") BASE = I.beside(WATER, LAND) BACKGROUND = I.place-image(BASE, WIDTH / 2, HEIGHT - (BASE-HEIGHT / 2), BLANK-SCENE)
The reason we divide by two when placing BASE is that Pyret puts the middle of the image at the given location. Remove the division and see what happens to the resulting image.
I.place-image(AIRPLANE, # some x position, 50, BACKGROUND)
fun place-airplane-x(w): I.place-image(AIRPLANE, w, 50, BACKGROUND) end
3.3.3 Observing Time (and Combining the Pieces)
W.big-bang(0, [list: W.on-tick(move-airplane-x-on-tick), W.to-draw(place-airplane-x)])
That’s it! We’ve created our first animation. Now that we’ve gotten all the preliminaries out of the way, we can go about enhancing it.
If you want the airplane to appear to move faster, what can you change?
3.4 Version: Wrapping Around
When you run the preceding program, you’ll notice that after a while, the airplane just disappears. This is because it has gone past the right edge of the screen; it is still being “drawn”, but in a location that you cannot see. That’s not very useful! Also, after a long while you might get an error because the computer is being asked to draw the airplane at a location beyond what the graphics system can manage. Instead, when the airplane is about to go past the right edge of the screen, we’d like it to reappear on the left by a corresponding amount: “wrapping around”, as it were.
Here’s the video for this version:
http://world.cs.brown.edu/1/projects/flight-lander/v2.swf
Let’s think about what we need to change. Clearly, we need to modify the function that updates the airplane’s location, since this must now reflect our decision to wrap around. But the task of how to draw the airplane doesn’t need to change at all! Similarly, the definition of the World State does not need to change, either.
fun move-airplane-wrapping-x-on-tick(x): num-modulo(x + AIRPLANE-X-MOVE, WIDTH) end
fun move-airplane-wrapping-x-on-tick(x): num-modulo(move-airplane-x-on-tick(x), WIDTH) end
Well, that’s a proposed re-definition. Be sure to test this function thoroughly: it’s trickier than you might think! Have you thought about all the cases? For instance, what happens if the airplane is half-way off the right edge of the screen?
It is possible to leave move-airplane-x-on-tick unchanged and perform the modular arithmetic in place-airplane-x instead. We choose not to do that for the following reason. In this version, we really do think of the airplane as circling around and starting again from the left edge (imagine the world is a cylinder...). Thus, the airplane’s x-position really does keep going back down. If instead we allowed the World State to increase monotonically, then it would really be representing the total distance traveled, contradicting our definition of the World State.
3.5 Version: Descending
Of course, we need our airplane to move in more than just one dimension: to get to the final game, it must both ascend and descend as well. For now, we’ll focus on the simplest version of this, which is an airplane that continuously descends. Here’s a video:
http://world.cs.brown.edu/1/projects/flight-lander/v3.swf
Let’s again consider individual frames of this video. What’s staying the same? Once again, the water and the land. What’s changing? The position of the airplane. But, whereas before the airplane moved only in the x-dimension, now it moves in both x and y. That immediately tells us that our definition of the World State is inadequate, and must be modified.
data Posn: | posn(x, y) end
The World State is a posn, representing the x-position and y-position of the airplane on the screen.
3.5.1 Moving the Airplane
AIRPLANE-Y-MOVE = 3
check: move-airplane-xy-on-tick(posn(10, 10)) is posn(20, 13) end
check: p = posn(10, 10) move-airplane-xy-on-tick(p) is posn(move-airplane-wrapping-x-on-tick(p.x), move-airplane-y-on-tick(p.y)) end
Which method of writing tests is better? Both! They each offer different advantages:
The former method has the benefit of being very concrete: there’s no question what you expect, and it demonstrates that you really can compute the desired answer from first principles.
The latter method has the advantage that, if you change the constants in your program (such as the rate of descent), seemingly correct tests do not suddenly fail. That is, this form of testing is more about the relationships between things rather than their precise values.
There is one more choice available, which often combines the best of both worlds: write the answer as concretely as possible (the former style), but using constants to compute the answer (the advantage of the latter style). For instance: check: p = posn(10, 10) move-airplane-xy-on-tick(p) is posn(num-modulo(p.x + AIRPLANE-X-MOVE, WIDTH), p.y + AIRPLANE-Y-MOVE) end
Before you proceed, have you written enough test cases? Are you sure? Have you, for instance, tested what should happen when the airplane is near the edge of the screen in either or both dimensions? We thought not—
go back and write more tests before you proceed!
fun move-airplane-xy-on-tick(w): posn(move-airplane-wrapping-x-on-tick(w.x), move-airplane-y-on-tick(w.y)) end
fun move-airplane-y-on-tick(y): y + AIRPLANE-Y-MOVE end
3.5.2 Drawing the Scene
We have to also examine and update place-airplane-x. Our earlier definition placed the airplane at an arbitrary y-coordinate; now we have to take the y-coordinate from the World State: fun place-airplane-xy(w): I.place-image(AIRPLANE, w.x, w.y, BACKGROUND) end Notice that we can’t really reuse the previous definition because it hard-coded the y-position, which we must now make a parameter.
3.5.3 Finishing Touches
INIT-POS = posn(0, 0) W.big-bang(INIT-POS, [list: W.on-tick(move-airplane-xy-on-tick), W.to-draw(place-airplane-xy)])
It’s a little unsatisfactory to have the airplane truncated by the screen. You can use I.image-width and I.image-height to obtain the dimensions of an image, such as the airplane. Use these to ensure the airplane fits entirely within the screen for the initial scene, and similarly in move-airplane-xy-on-tick.
3.6 Version: Responding to Keystrokes
Now that we have the airplane descending, there’s no reason it can’t ascend as well. Here’s a video:
http://world.cs.brown.edu/1/projects/flight-lander/v4.swf
We’ll use the keyboard to control its motion: specifically, the up-key will make it move up, while the down-key will make it descend even faster. This is easy to support using what we already know: we just need to provide one more handler using W.on-key. This handler takes two arguments: the first is the current value of the world, while the second is a representation of which key was pressed. For the purposes of this program, the only key values we care about are "up" and "down".
KEY-DISTANCE = 10
fun alter-airplane-y-on-key(w, key): ask: | key == "up" then: posn(w.x, w.y - KEY-DISTANCE) | key == "down" then: posn(w.x, w.y + KEY-DISTANCE) | otherwise: w end end
Why does this function definition contain | otherwise: w
as its last condition?
Notice that if we receive any key other than the two we expect, we leave the World State as it was; from the user’s perspective, this has the effect of just ignoring the keystroke. Remove this last clause, press some other key, and watch what happens!
No matter what you choose, be sure to test this! Can the airplane drift off the top of the screen? How about off the screen at the bottom? Can it overlap with the land or water?
W.big-bang(INIT-POS, [list: W.on-tick(move-airplane-xy-on-tick), W.on-key(alter-airplane-y-on-key), W.to-draw(place-airplane-xy)])
3.7 Version: Landing
Remember that the objective of our game is to land the airplane, not to keep it airborne indefinitely. That means we need to detect when the airplane reaches the land or water level and, when it does, terminate the animation:
http://world.cs.brown.edu/1/projects/flight-lander/v5.swf
fun is-on-land-or-water(w): w.y >= (HEIGHT - BASE-HEIGHT) end
W.big-bang(INIT-POS, [list: W.on-tick(move-airplane-xy-on-tick), W.on-key(alter-airplane-y-on-key), W.to-draw(place-airplane-xy), W.stop-when(is-on-land-or-water)])
When you test this, you’ll see it isn’t quite right because it doesn’t take account of the size of the airplane’s image. As a result, the airplane only halts when it’s half-way into the land or water, not when it first touches down. Adjust the formula so that it halts upon first contact.
Extend this so that the airplane rolls for a while upon touching land, decelerating according to the laws of physics.
Suppose the airplane is actually landing at a secret subterranean airbase. The actual landing strip is actually below ground level, and opens up only when the airplane comes in to land. That means, after landing, only the parts of the airplane that stick above ground level would be visible. Implement this. As a hint, consider modifying place-airplane-xy.
3.8 Version: A Fixed Balloon
Now let’s add a balloon to the scene. Here’s a video of the action:
http://world.cs.brown.edu/1/projects/flight-lander/v6.swf
Notice that while the airplane moves, everything else (including the balloon) stays exactly where it was.
When does the game halt? There are now two circumstances: one is contact with land or water, and the other is contact with the balloon. The former remains unchanged from what it was before, so we can focus on the latter.
BALLOON-LOC = posn(600, 300)
BALLOON-LOC = posn(random(WIDTH), random(HEIGHT))
Improve the random placement of the balloon so that it is in credible spaces (e.g., not submerged).
fun are-overlapping(airplane-posn, balloon-posn): distance(airplane-posn, balloon-posn) < COLLISION-THRESHOLD end
fun distance(p1, p2): fun square(n): n * n end num-sqrt(square(p1.x - p2.x) + square(p1.y - p2.y)) end
fun game-ends(w): ask: | is-on-land-or-water(w) then: true | are-overlapping(w, BALLOON-LOC) then: true | otherwise: false end end
W.big-bang(INIT-POS, [list: W.on-tick(move-airplane-xy-on-tick), W.on-key(alter-airplane-y-on-key), W.to-draw(place-airplane-xy), W.stop-when(game-ends)])
Do you see how to write game-ends more concisely?
fun game-ends(w): is-on-land-or-water(w) or are-overlapping(w, BALLOON-LOC) end
3.9 Version: Keep Your Eye on the Tank
Now we’ll introduce the idea of fuel. In our simplified world, fuel
isn’t necessary to descend (gravity takes care of that for free), but the airplane does consume a unit of fuel each time it climbs; once the fuel runs out, the up-key should have no effect.
In the past, we’ve looked at still images of the game video to determine what is changing and what isn’t. For this version, we could easily place a little gauge on the screen to show the quantity of fuel left. However, we deliberately won’t, in order to illustrate a principle.
You can’t always determine what is fixed and what is changing just by looking at the image. You have to also read the problem statement carefully, and think about it in depth.
It’s clear from our description that there are two things changing: the position of the airplane and the quantity of fuel left. Therefore, the World State must capture the current values of both of these. The fuel is best represented as a single number. However, we do need to create a new structure to represent the combination of these two.
The World State is a structure representing the airplane’s current position and the quantity of fuel left.
data World: | world(p, f) end
We could have also defined the World to be a structure consisting of three components: the airplane’s x-position, the airplane’s y-position, and the quantity of fuel. Why do we choose to use the representation above?
We can again look at each of the parts of the program to determine what can stay the same and what changes. Concretely, we must focus on the functions that consume and produce Worlds.
fun move-airplane-xy-on-tick(w :: World): world( posn( move-airplane-wrapping-x-on-tick(w.p.x), move-airplane-y-on-tick(w.p.y)), w.f) end
fun alter-airplane-y-on-key(w, key): ask: | key == "up" then: if w.f > 0: world(posn(w.p.x, w.p.y - KEY-DISTANCE), w.f - 1) else: w # there's no fuel, so ignore the keystroke end | key == "down" then: world(posn(w.p.x, w.p.y + KEY-DISTANCE), w.f) | otherwise: w end end
Updating the function that renders a scene. Recall that the world has two fields; one of them corresponds to what we used to draw before, and the other isn’t being drawn in the output.
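A minimal sketch of that updated drawing function, using the p and f field names from the data definition above (only the position is drawn for now):

fun place-airplane-xy(w :: World):
  # the fuel w.f is part of the World State but is not rendered here
  I.place-image(AIRPLANE, w.p.x, w.p.y, BACKGROUND)
end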
Extend your program to draw a fuel gauge.
3.10 Version: The Balloon Moves, Too
Until now we’ve left our balloon immobile. Let’s now make the game more interesting by letting the balloon move, as this video shows:
http://world.cs.brown.edu/1/projects/flight-lander/v8.swf
Obviously, the balloon’s location needs to also become part of the World State.
The World State is a structure representing the plane’s current position, the balloon’s current position, and the quantity of fuel left.
data World: | world(p :: Posn, b :: Posn, f :: Number) end
The background image (to remove the static balloon).
The drawing handler (to draw the balloon at its position).
The timer handler (to move the balloon as well as the airplane).
The key handler (to construct world data that leaves the balloon unchanged).
The termination condition (to account for the balloon’s dynamic location).
Modify each of the above functions, along with their test cases.
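To give a flavor of these changes, here is a sketch of what the timer handler might become; move-balloon is a hypothetical helper (how the balloon actually moves is left to you), and the handler name is our own:

fun move-world-on-tick(w :: World) -> World:
  world(
    posn(
      move-airplane-wrapping-x-on-tick(w.p.x),
      move-airplane-y-on-tick(w.p.y)),
    # move-balloon :: (Posn -> Posn), to be defined as part of the exercise
    move-balloon(w.b),
    w.f)
end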
3.11 Version: One, Two, ..., Ninety-Nine Luftballons!
Finally, there’s no need to limit ourselves to only one balloon. How
many is right? Two? Three? Ten? ... Why fix any one number? It could be
a balloon festival!
Albuquerque Balloon Fiesta
Similarly, many games have levels that become progressively harder; we could do the same, letting the number of balloons be part of what changes across levels. However, there is conceptually no big difference between having two balloons and five; the code to control each balloon is essentially the same.
We need to represent a collection of balloons. We can use a list to represent them. Thus:
The World State is a structure representing the plane’s current position, a list of balloon positions, and the quantity of fuel left.
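In Pyret, that World State could be declared as follows (balloons is a field name of our own choosing); the tasks below then follow naturally:

data World:
  | world(p :: Posn, balloons :: List<Posn>, f :: Number)
end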
Apply the same function to each balloon in the list.
Determine what to do if two balloons collide.
Introduce a concept of wind, which affects balloons but not the airplane. After random periods of time, the wind blows with random speed and direction, causing the balloons to move laterally.
4 Testing, Examples, and Program Checking
When we think through a problem, it is often useful to write down examples of what we are trying to do. For example (see what I did there?), if we’re asked to compute the [FILL]
When we’re done writing our purported solution, we can have the computer check whether we got it right.
In the process of writing down our expectation, we often find it hard to express with the precision that a computer expects. Sometimes this is because we’re still formulating the details and haven’t yet pinned them down, but at other times it’s because we don’t yet understand the problem. In such situations, the force of precision actually does us good, because it helps us understand the weakness of our understanding.
4.1 From Examples to Tests
A failure of tests can be due to:
- the program being wrong
- the example itself being wrong
When we find a bug, we:
- find an example that captures the bug
- add it to the program’s test suite
so that if we make the same mistake again [REF: we do], we will catch it right away.
4.2 When Tests Fail
Suppose we’ve written the function sqrt, which computes the square root of a given number. We’ve written some tests for this function. We run the program, and find that a test fails. There are two obvious reasons why this can happen.
What are the two obvious reasons?
sqrt(4) is 1.75
sqrt(4) is 2
Note that there is no way for the computer to tell what went wrong. When it reports a test failure, all it’s saying is that there is an inconsistency between the program and the tests. The computer is not passing judgment on which one is “correct”, because it can’t do that. That is a matter for human judgment. For this reason, we’ve been doing research on peer review of tests, so students can help one another review their tests before they begin writing programs.
sqrt(4) is 2
Do you see why?
Depending on how we’ve programmed sqrt, it might return the root -2 instead of 2. Now -2 is a perfectly good answer, too. That is, neither the function nor the particular set of test values we specified is inherently wrong; it’s just that the function happens to be a relation, i.e., it maps one input to multiple outputs (that is, \(\sqrt{4} = \pm 2\)). The question now is how to write the test properly.
4.3 Oracles for Testing
fun is-sqrt(n): n-root = sqrt(n) n == (n-root * n-root) end
check: is-sqrt(4) is true end
fun check-sqrt(n): lam(n-root): n == (n-root * n-root) end end
check: sqrt(4) satisfies check-sqrt(4) end
each string in the output is an atomic symbol, and
the concatenation of the strings in the output yields the input.
check: elemental("Shriram") is [list: "S", "H", "Ri", "Ra", "M"] end
check: elemental("...") is [list: ...] end
4.4 Testing Erroneous Programs
Use the raises testing form to check that erroneous code signals the error you expect.
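For instance, here is a sketch using the raises testing form, which passes when evaluating the expression signals an error whose message contains the given string (safe-div is an illustrative function of our own, not one from the text):

fun safe-div(n :: Number, d :: Number) -> Number:
  if d == 0:
    raise("cannot divide by zero")
  else:
    n / d
  end
end

check:
  safe-div(10, 2) is 5
  # passes because the raised message contains this string
  safe-div(1, 0) raises "divide by zero"
end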
5 Functions as Data
It’s interesting to consider how expressive the little programming we’ve learned so far can be. To illustrate this, we’ll work through a few exercises of interesting concepts we can express using just functions as values. We’ll write two quite different things, then show how they converge nicely.
5.1 A Little Calculus
If you’ve studied the differential calculus, you’ve come across curious sytactic statements such as this: \[{{d}\over{dx}} x^2 = 2x\] Let’s unpack what this means: the \(d/dx\), the \(x^2\), and the \(2x\).
First, let’s take on the two expressions; we’ll discuss one, and the discussion will cover the other as well. The correct response to “what does \(x^2\) mean?” is, of course, an error: it doesn’t mean anything, because \(x\) is an unbound identifier.
fun square(x :: Number) -> Number: x * x end fun double(x :: Number) -> Number: 2 * x end
d-dx :: ((Number -> Number) -> (Number -> Number))
Let us now implement d-dx. We’ll implement numerical
differentiation, though in principle we could also implement
symbolic differentiation—
In general, numeric differentiation of a function at a point yields the value of the derivative at that point. We have a handy formula for it: the derivative of \(f\) at \(x\) is \[{f(x + \epsilon) - f(x)} \over {\epsilon}\] as \(\epsilon\) goes to zero in the limit. For now we’ll give the infinitesimal a small but fixed value, and later [Combining Forces: Streams of Derivatives] see how we can improve on this.
epsilon = 0.001
fun d-dx(f :: (Number -> Number)) -> (Number -> Number): (f(x + epsilon) - f(x)) / epsilon end
What’s the problem with the above definition?
fun d-dx(f :: (Number -> Number)) -> (Number -> Number): lam(x :: Number) -> Number: (f(x + epsilon) - f(x)) / epsilon end end
d-dx-square = d-dx(square) check: ins = [list: 0, 1, 10, 100] for map(n from ins): num-floor(d-dx-square(n)) end is for map(n from ins): num-floor(double(n)) end end
d-dx(lam(x): x * x end) = lam(x): 2 * x end
5.2 Streams From Functions
People typically think of a function as serving one purpose: to parameterize an expression. While that is both true and the most common use of a function, it does not justify having a function of no arguments, because that clearly parameterizes over nothing at all. Yet functions of no argument also have a use, because functions actually serve two purposes: to parameterize, and to suspend evaluation of the body until the function is applied. In fact, these two uses are orthogonal, in that one can employ one feature without the other. In Sugaring Over Anonymity we see one direction of this: parameterized functions that are used immediately, so that we employ only abstraction and not delay. Below, we will see the other: delay without abstraction.
Let’s consider the humble list. A list can be only finitely long. However, there are many lists (or sequences) in nature that have no natural upper bound: from mathematical objects (the sequence of natural numbers) to natural ones (the sequence of hits to a Web site). Rather than try to squeeze these unbounded lists into bounded ones, let’s look at how we might represent and program over these unbounded lists.
fun nats-from(n): link(n, nats-from(n + 1)) end
Does this program have a problem?
While this represents our intent, it doesn’t work: running it (say, by evaluating nats-from(0)) never finishes, because the recursive call to nats-from(n + 1) is evaluated immediately, which triggers another call, and so on without end.
This is where our insight into functions comes in. A function, as we have just noted, delays evaluation of its body until it is applied. Therefore, a function would, in principle, defer the invocation of nats-from(n + 1) until it’s needed.
data Stream<T>: | lz-link(h :: T, t :: ( -> Stream<T>)) end
ones = lz-link(1, lam(): ones end)
ones = link(1, ones)
rec ones = lz-link(1, lam(): ones end)
Earlier we said that we can’t write ones = link(1, ones)
What if we tried to write rec ones = link(1, ones)
instead? Does this work and, if so, what value is ones bound to? If it doesn’t work, does it fail to work for the same reason as the definition without the rec?
fun nats-from(n): lz-link(n, lam(): nats-from(n + 1) end) end
nats = nats-from(0)
Earlier, we said that every list is finite and hence eventually terminates. How does this remark apply to streams, such as the definition of ones or nats above?
The same reasoning doesn’t apply to lists, because the rest of the list has already been constructed; in contrast, placing a function there creates the potential for an unbounded amount of computation still to come.
That said, even with streams, in any given computation, we will create only a finite prefix of the stream. However, we don’t have to prematurely decide how many; each client and use is welcome to extract less or more, as needed.
fun lz-first<T>(s :: Stream<T>) -> T: s.h end
fun lz-rest<T>(s :: Stream<T>) -> Stream<T>: s.t() end
fun take<T>(n :: Number, s :: Stream<T>) -> List<T>: if n == 0: empty else: link(lz-first(s), take(n - 1, lz-rest(s))) end end
check: take(10, ones) is map(lam(_): 1 end, range(0, 10)) take(10, nats) is range(0, 10) take(10, nats-from(1)) is map((_ + 1), range(0, 10)) end
fun lz-map2<A, B>(f :: (A, A -> B), s1 :: Stream<A>, s2 :: Stream<A>) -> Stream<B>: lz-link(f(lz-first(s1), lz-first(s2)), lam(): lz-map2(f, lz-rest(s1), lz-rest(s2)) end) end
rec fibs = lz-link(0, lam(): lz-link(1, lam(): lz-map2((_ + _), fibs, lz-rest(fibs)) end) end)
check: take(10, fibs) is [list: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34] end
Define the equivalent of map, filter, and fold for streams.
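Because lz-map is used below (Combining Forces: Streams of Derivatives), here is one possible definition, patterned directly after lz-map2; the stream versions of filter and fold are left as stated:

fun lz-map<A, B>(f :: (A -> B), s :: Stream<A>) -> Stream<B>:
  # apply f to the first element now, and delay mapping over the rest
  lz-link(f(lz-first(s)), lam(): lz-map(f, lz-rest(s)) end)
end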
Streams and, more generally, infinite data structures that unfold on demand are extremely valuable in programming. Consider, for instance, the possible moves in a game. In some games, this can be infinite; even if it is finite, for interesting games the combinatorics mean that the tree is too large to feasibly store in memory. Therefore, the programmer of the computer’s intelligence must unfold the game tree on demand. Programming it by using the encoding we have described above means the program describes the entire tree, lazily, and the tree unfolds automatically on demand, relieving the programmer of the burden of implementing such a strategy.
In some languages, such as Haskell, lazy evaluation is built in by default. In such a language, there is no need to use thunks. However, lazy evaluation places other burdens on the language [REF].
5.3 Combining Forces: Streams of Derivatives
When we defined d-dx, we set epsilon to an arbitrary, fixed value. We could instead think of epsilon as itself a stream that produces successively finer values; then, for instance, when the difference in the value of the derivative becomes small enough, we can decide we have a sufficient approximation to the derivative.
The first step is, therefore, to make epsilon some kind of parameter rather than a global constant. That leaves open what kind of parameter it should be (number or stream?) as well as when it should be supplied.
fun d-dx(f :: (Number -> Number)) -> (Number -> (Number -> Number)): lam(x :: Number) -> (Number -> Number): lam(epsilon :: Number) -> Number: (f(x + epsilon) - f(x)) / epsilon end end end
d-dx-square = d-dx(square)
tenths = block: fun by-ten(d): new-denom = d / 10 lz-link(new-denom, lam(): by-ten(new-denom) end) end by-ten(1) end
check: take(3, tenths) is [list: 1/10, 1/100, 1/1000] end
d-dx-square-at-10 = d-dx-square(10)
lz-map(d-dx-square-at-10, tenths)
Extend the above program to take a tolerance, and draw as many values from the epsilon stream as necessary until the difference between successive approximations of the derivative fall within this tolerance.
6 Predicting Growth
We will now commence the study of determining how long a computation takes. We’ll begin with a little (true) story.
6.1 A Little (True) Story
My student Debbie recently wrote tools to analyze data for a startup. The company collects information about product scans made on mobile phones, and Debbie’s analytic tools classified these by product, by region, by time, and so on. As a good programmer, Debbie first wrote synthetic test cases, then developed her programs and tested them. She then obtained some actual test data from the company, broke them down into small chunks, computed the expected answers by hand, and tested her programs again against these real (but small) data sets. At the end of this she was ready to declare the programs ready.
The company was rightly reluctant to share the entire dataset with outsiders, and in turn we didn’t want to be responsible for carefully guarding all their data.
Even if we did get a sample of their data, as more users used their product, the amount of data they had was sure to grow.
Debbie was given 100,000 data points. She broke them down into input sets of 10, 100, 1,000, 10,000, and 100,000 data points, ran her tools on each input size, and plotted the result.
From this graph we can make a good guess at how long the tool would take on a dataset of 50,000. It’s much harder, however, to be sure how long it would take on datasets of size 1.5 million or 3 million or 10 million. These processes are respectively called interpolation and extrapolation. We’ve already explained why we couldn’t get more data from the company. So what could we do?
As another problem, suppose we have multiple implementations available. When we plot their running time, say the graphs look like this, with red, green, and blue each representing different implementations. On small inputs, suppose the running times look like this:
This doesn’t seem to help us distinguish between the implementations. Now suppose we run the algorithms on larger inputs, and we get the following graphs:
Now we seem to have a clear winner (red), though it’s not clear there is much to give between the other two (blue and green). But if we calculate on even larger inputs, we start to see dramatic differences:
In fact, the functions that resulted in these lines were the same in all three figures. What these pictures tell us is that it is dangerous to extrapolate too much from the performance on small inputs. It would therefore be valuable to obtain closed-form descriptions of the performance of computations, so that we can compare them more reliably. That is what we will now do.
6.2 The Analytical Idea
With many physical processes, the best we can do is obtain as many data points as possible, extrapolate, and apply statistics to reason about the most likely outcome. Sometimes we can do that in computer science, too, but fortunately we computer scientists have an enormous advantage over most other sciences: instead of measuring a black-box process, we have full access to its internals, namely the source code. This enables us to apply analytical methods. “Analytical” means applying algebraic and other mathematical methods to make predictive statements about a process without running it. The answer we compute this way is complementary to what we obtain from the above experimental analysis, and in practice we will usually want to use a combination of the two to arrive at a strong understanding of the program’s behavior.
The analytical idea is startlingly simple. We look at the source of the program and list the operations it performs. For each operation, we look up what it costs. We are going to focus on one kind of cost, namely running time. There are many other kinds of costs one can compute. We might naturally be interested in space (memory) consumed, which tells us how big a machine we need to buy. We might also care about power, which tells us the cost of our energy bills, or bandwidth, which tells us what kind of Internet connection we will need. In general, then, we’re interested in resource consumption. In short, don’t make the mistake of equating “performance” with “speed”: the costs that matter depend on the context in which the application runs. We add up these costs for all the operations. This gives us a total cost for the program.
Naturally, for most programs the answer will not be a constant number. Rather, it will depend on factors such as the size of the input. Therefore, our answer is likely to be an expression in terms of parameters (such as the input’s size). In other words, our answer will be a function.
There are many functions that can describe the running-time of a function. Often we want an upper bound on the running time: i.e., the actual number of operations will always be no more than what the function predicts. This tells us the maximum resource we will need to allocate. Another function may present a lower bound, which tells us the least resource we need. Sometimes we want an average-case analysis. And so on. In this text we will focus on upper-bounds, but keep in mind that all these other analyses are also extremely valuable.
It is incorrect to speak of “the” upper-bound function, because there isn’t just one. Given one upper-bound function, can you construct another one?
6.3 A Cost Model for Pyret Running Time
We begin by presenting a cost model for the running time of Pyret programs. We are interested in the cost of running a program, which is tantamount to studying the expressions of a program. Simply making a definition does not cost anything; the cost is incurred only when we use a definition.
We will use a very simple (but sufficiently accurate) cost model: every operation costs one unit of time in addition to the time needed to evaluate its sub-expressions. Thus it takes one unit of time to look up a variable or to allocate a constant. Applying primitive functions also costs one unit of time. Everything else is a compound expression with sub-expressions. The cost of a compound expression is one plus that of each of its sub-expressions. For instance, the running time cost of the expression e1 + e2 (for some sub-expressions e1 and e2) is the running time for e1 + the running time for e2 + 1. Thus the expression 17 + 29 has a cost of 3 (one for each sub-expression and one for the addition); the expression 1 + (7 * (2 / 9)) costs 7.
First, we are using an abstract rather than concrete notion of time. This is unhelpful in terms of estimating the so-called “wall clock” running time of a program, but then again, that number depends on numerous factors—
not just what kind of processor and how much memory you have, but even what other tasks are running on your computer at the same time. In contrast, abstract time units are more portable. Second, not every operation takes the same number of machine cycles, whereas we have charged all of them the same number of abstract time units. As long as the actual number of cycles each one takes is bounded by a constant factor of the number taken by another, this will not pose any mathematical problems for reasons we will soon understand (Comparing Functions).
6.4 The Size of the Input
We are
going to gloss over how to measure the size of a number. Observe that
the value of a number is exponentially larger than its
size: given three spaces we can write 1,000 different natural
numbers, but given a fourth space we can write not 1,001 but 10,000
different numbers. Thus, when studying functions over numbers, the
space we charge should be only logarithmic in the value. This
distinction will not matter for the programs we work with in this
text, so we permit ourselves the fiction of equating value and size.
In programs where numbers are central, for instance cryptographic algorithms, we would have to be more careful about this distinction.
It can be subtle to define the size of the argument. Suppose a function consumes a list of numbers; it would be natural to define the size of its argument to be the length of the list, i.e., the number of links in the list. We could also define it to be twice as large, to account for both the links and the individual numbers (but as we’ll see (Comparing Functions), constants usually don’t matter). But suppose a function consumes a list of music albums, and each music album is itself a list of songs, each of which has information about singers and so on. Then how we measure the size depends on what part of the input the function being analyzed actually examines. If, say, it only returns the length of the list of albums, then it is indifferent to what each list element contains [REF para poly], and only the length of the list of albums matters. If, however, the function returns a list of all the singers on every album, then it traverses all the way down to individual songs, and we have to account for all these data. In short, we care about the size of the data potentially accessed by the function.
6.5 The Tabular Method for Singly-Structurally-Recursive Functions
Given sizes for the arguments, we simply examine the body of the function and add up the costs of the individual operations. Most interesting functions are, however, conditionally defined, and may even recur. Here we will assume there is only one structural recursive call. We will get to more general cases in a bit [Creating Recurrences].
When we have a function with only one recursive call, and it’s structural, there’s a handy technique we can use to handle conditionals. This idea is due to Prabhakar Ragde. We will set up a table. It won’t surprise you to hear that the table will have as many rows as the cases expression has clauses. But instead of two columns, it has seven! This sounds daunting, but you’ll soon see where they come from and why they’re there.
|Q|: the number of operations in the question
#Q: the number of times the question will execute
TotQ: the total cost of the question (multiply the previous two)
|A|: the number of operations in the answer
#A: the number of times the answer will execute
TotA: the total cost of the answer (multiply the previous two)
Total: add the two totals to obtain an answer for the clause
In the process of computing these costs, we may come across recursive calls in an answer expression. So long as there is only one recursive call in the entire answer, ignore it.
Once you’ve read the material on Creating Recurrences, come back to this and justify why it is okay to just skip the recursive call. Explain in the context of the overall tabular method.
Excluding the treatment of recursion, justify (a) that these columns are individually accurate (e.g., the use of additions and multiplications is appropriate), and (b) sufficient (i.e., combined, they account for all operations that will be performed by that cases clause).
fun len(l): cases (List) l: | empty => 0 | link(f, r) => 1 + len(r) end end
Because the entire body of len is given by a conditional, we can proceed directly to building the table.
Let’s consider the first row. The question costs three units (one each to evaluate empty? and l, and one to apply the function). This is evaluated once per element in the list and once more when the list is empty, i.e., \(k+1\) times. The total cost of the question is thus \(3(k+1)\). The answer takes one unit of time to compute, and is evaluated only once (when the list is empty). Thus the answer contributes just one unit, giving this clause a total of \(3(k+1) + 1 = 3k+4\) units.
Now for the second row. The question again costs three units, and is evaluated \(k\) times. The answer involves 3 units to evaluate (rest l), two more to evaluate and apply add1, one more to evaluate len...and no more, because we are ignoring the time spent in the recursive call itself. In short, it takes six units of time (in addition to the recursion we’ve chosen to ignore).
|Q|      #Q        TotQ          |A|     #A      TotA     Total
\(3\)    \(k+1\)   \(3(k+1)\)    \(1\)   \(1\)   \(1\)    \(3k+4\)
\(3\)    \(k\)     \(3k\)        \(6\)   \(k\)   \(6k\)   \(9k\)
How accurate is this estimate? If you try applying len to different sizes of lists, do you obtain a consistent estimate for \(k\)?
6.6 Creating Recurrences
We will now see a systematic way of analytically computing the time of a program. Suppose we have only one function f. We will define a function, \(T\), to compute an upper-bound of the time of f.In general, we will have one such cost function for each function in the program. In such cases, it would be useful to give a different name to each function to easily tell them apart. Since we are looking at only one function for now, we’ll reduce notational overhead by having only one \(T\). \(T\) takes as many parameters as f does. The parameters to \(T\) represent the sizes of the corresponding arguments to f. Eventually we will want to arrive at a closed form solution to \(T\), i.e., one that does not refer to \(T\) itself. But the easiest way to get there is to write a solution that is permitted to refer to \(T\), called a recurrence relation, and then see how to eliminate the self-reference [REF].
We repeat this procedure for each function in the program in turn. If there are many functions, first solve for the one with no dependencies on other functions, then use its solution to solve for a function that depends only on it, and progress thus up the dependency chain. That way, when we get to a function that refers to other functions, we will already have a closed-form solution for the referred function’s running time and can simply plug in parameters to obtain a solution.
The strategy outlined above doesn’t work when there are functions that depend on each other. How would you generalize it to handle this case?
The process of setting up a recurrence is easy. We simply define the right-hand-side of \(T\) to add up the operations performed in f’s body. This is straightforward except for conditionals and recursion. We’ll elaborate on the treatment of conditionals in a moment. If we get to a recursive call to f on the argument a, in the recurrence we turn this into a (self-)reference to \(T\) on the size of a.
For conditionals, we use only the |Q| and |A| columns of
the corresponding table. Rather than multiplying by the size of the
input, we add up the operations that happen on one invocation of
f other than the recursive call, and then add the cost of the
recursive call in terms of a reference to \(T\). Thus, if we were
doing this for len above, we would define \(T(k)\), the time len needs on a list of \(k\) elements, as follows: \(T(0) = 4\), and \(T(k) = 12 + T(k - 1)\) when \(k \geq 1\). The 4 is the cost of the clause for the empty list; the 12 adds up the operations of one non-empty invocation apart from the recursive call, whose cost is the reference \(T(k - 1)\).
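We will see how to solve such recurrences systematically later (Solving Recurrences), but unrolling this one a few steps already hints at the closed form used below: \[T(k) = 12 + T(k-1) = 12 + 12 + T(k-2) = \cdots = 12k + T(0) = 12k + 4\]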
Why can we assume that for a list \(p\) elements long, \(p \geq 0\)? And why did we take the trouble to explicitly state this above?
With some thought, you can see that the idea of constructing a recurrence works even when there is more than one recursive call, and when the argument to that call is one element structurally smaller. What we haven’t seen, however, is a way to solve such relations in general. That’s where we’re going next (Solving Recurrences).
6.7 A Notation for Functions
We have seen above that we can describe the running time of len
through a function. We don’t have an especially good notation for
writing such (anonymous) functions. Wait, we
do— we’ve seen anonymous functions (lam) in our programs before. In the mathematics that follows we will write them in brackets: \([k \rightarrow 12k + 4]\) is the anonymous function that consumes \(k\) and produces \(12k + 4\).
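To connect the two notations, here is the same running-time function written as an ordinary Pyret value (a minimal sketch; the name len-time is ours, purely for illustration):

len-time = lam(k): (12 * k) + 4 end

check:
  len-time(0) is 4
  len-time(10) is 124
end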
6.8 Comparing Functions
Let’s return to the running time of len. We’ve written down a function of great precision: 12! 4! Is this justified?
At a fine-grained level already, no, it’s not. We’ve lumped many operations, with different actual running times, into a cost of one. So perhaps we should not worry too much about the differences between, say, \([k \rightarrow 12k + 4]\) and \([k \rightarrow 4k + 10]\). If we were given two implementations with these running times, it’s likely that we would pick other characteristics to choose between them.
What this boils down to is being able to compare two
functions— to say when one function is an upper bound for another, so that we can use the simpler bound in place of a messier, more precise expression.
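A common way of making this precise, and the one the discussion below assumes, is: \[g \leq f \iff \exists c > 0 . \forall n, \; g(n) \leq c \cdot f(n)\] That is, we may scale the bounding function \(f\) by a single constant \(c\), chosen once and for all, and the scaled version must dominate \(g\) at every point. (Some treatments require this only for all sufficiently large \(n\); nothing below depends on that refinement.)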
Obviously, the “bigger” function is likely to be a less useful bound than a “tighter” one. That said, it is conventional to write a “minimal” bound for functions, which means avoiding unnecessary constants, sum terms, and so on. The justification for this is given below (Combining Big-Oh Without Woe).
Note carefully the order of identifiers. We must be able to pick the
constant \(c\) up front for this relationship to hold. Had we swapped
the order, it would mean that for every point along the number line,
there must exist a constant—
This definition has more flexibility than we might initially think. For instance, consider our running example compared with \([k \rightarrow k^2]\). Clearly, the latter function eventually dominates the former: i.e., \[[k \rightarrow 12k+4] \leq [k \rightarrow k^2]\] We just need to pick a sufficiently large constant and we will find this to be true.
What is the smallest constant that will suffice?
You will find more complex definitions in the literature and they all have merits, because they enable us to make finer-grained distinctions than this definition allows. For the purpose of this book, however, the above definition suffices.
Observe that for a given function \(f\), there are numerous functions that are less than it. We use the notation \(O(\cdot)\) to describe this family of functions.In computer science this is usually pronounced “big-Oh”, though some prefer to call it the Bachmann-Landau notation after its originators. Thus if \(g \leq f\), we can write \(g \in O(f)\), which we can read as “\(f\) is an upper-bound for \(g\)”. Thus, for instance, \[[k \rightarrow 3k] \in O([k \rightarrow 4k+12])\] \[[k \rightarrow 4k+12] \in O([k \rightarrow k^2])\] and so on.
Pay especially close attention to our
notation. We write \(\in\)
rather than \(=\) or some other symbol, because \(O(f)\) describes a
family of functions of which \(g\) is a member. We also write \(f\)
rather than \(f(x)\) because we are comparing
functions—
This is not the only notion of function comparison that we can have. For instance, given the definition of \(\leq\) above, we can define a natural relation \(<\). This then lets us ask, given a function \(f\), what are all the functions \(g\) such that \(g \leq f\) but not \(g < f\), i.e., those that are “equal” to \(f\).Look out! We are using quotes because this is not the same as ordinary function equality, which is defined as the two functions giving the same answer on all inputs. Here, two “equal” functions may not give the same answer on any inputs. This is the family of functions that are separated by at most a constant; when the functions indicate the order of growth of programs, “equal” functions signify programs that grow at the same speed (up to constants). We use the notation \(\Theta(\cdot)\) to speak of this family of functions, so if \(g\) is equivalent to \(f\) by this notion, we can write \(g \in \Theta(f)\) (and it would then also be true that \(f \in \Theta(g)\)).
Convince yourself that this notion of function equality is an equivalence relation, and hence worthy of the name “equal”. It needs to be (a) reflexive (i.e., every function is related to itself); (b) antisymmetric (if \(f \leq g\) and \(g \leq f\) then \(f\) and \(g\) are equal); and (c) transitive (\(f \leq g\) and \(g \leq h\) implies \(f \leq h\)).
6.9 Combining Big-Oh Without Woe
Suppose we have a function f (whose running time is) in \(O(F)\). Let’s say we run it \(p\) times, for some given constant. The running time of the resulting code is then \(p \times O(F)\). However, observe that this is really no different from \(O(F)\): we can simply use a bigger constant for \(c\) in the definition of \(O(\cdot)\)—
in particular, we can just use \(pc\). Conversely, then, \(O(pF)\) is equivalent to \(O(F)\). This is the heart of the intuition that “multiplicative constants don’t matter”. Suppose we have two functions, f in \(O(F)\) and g in \(O(G)\). If we run f followed by g, we would expect the running time of the combination to be the sum of their individual running times, i.e., \(O(F) + O(G)\). You should convince yourself that this is simply \(O(F + G)\).
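As a quick sanity check of the sum rule: running times are non-negative, so if \(f(n) \leq c_1 \cdot F(n)\) and \(g(n) \leq c_2 \cdot G(n)\) for all \(n\), then \[f(n) + g(n) \leq \max(c_1, c_2) \cdot (F(n) + G(n))\] and a single constant again suffices, putting the combination in \(O(F + G)\).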
Suppose we have two functions, f in \(O(F)\) and g in \(O(G)\). If f invokes g in each of its steps, we would expect the running time of the combination to be the product of their individual running times, i.e., \(O(F) \times O(G)\). You should convince yourself that this is simply \(O(F \times G)\).
For instance, here is the earlier tabular analysis of len with every entry rewritten in big-Oh terms:
|Q| | #Q | TotQ | |A| | #A | TotA | Total
\(O(1)\) | \(O(k)\) | \(O(k)\) | \(O(1)\) | \(O(1)\) | \(O(1)\) | \(O(k)\)
\(O(1)\) | \(O(k)\) | \(O(k)\) | \(O(1)\) | \(O(k)\) | \(O(k)\) | \(O(k)\)
6.10 Solving Recurrences
There is a great deal of literature on solving recurrence equations. In this section we won’t go into general techniques, nor will we even discuss very many different recurrences. Rather, we’ll focus on just a handful that should be in the repertoire of every computer scientist. You’ll see these over and over, so you should instinctively recognize their recurrence pattern and know what complexity they describe (or know how to quickly derive it).
Earlier we saw a recurrence that had two cases: one for the empty input and one for all others. In general, we should expect to find one case for each non-recursive call and one for each recursive one, i.e., roughly one per cases clause. In what follows, we will ignore the base cases so long as the size of the input is constant (such as zero or one), because in such cases the amount of work done will also be a constant, which we can generally ignore (Comparing Functions).
\[T(k) = T(k-1) + c = T(k-2) + c + c = T(k-3) + c + c + c = \cdots = T(0) + c \cdot k = c_0 + c \cdot k\]
Thus \(T \in O([k \rightarrow k])\). Intuitively, we do a constant amount of work (\(c\)) each time we throw away one element (\(k-1\)), so we do a linear amount of work overall.
\[T(k) = T(k-1) + k = T(k-2) + (k-1) + k = T(k-3) + (k-2) + (k-1) + k = \cdots = T(0) + (k-(k-1)) + (k-(k-2)) + \cdots + (k-2) + (k-1) + k = c_0 + 1 + 2 + \cdots + (k-2) + (k-1) + k = c_0 + {{k \cdot (k+1)}\over{2}}\]
Thus \(T \in O([k \rightarrow k^2])\). This follows from the solution to the sum of the first \(k\) numbers.
\[T(k) = T(k/2) + c = T(k/4) + c + c = T(k/8) + c + c + c = \cdots = T(k/2^{\log_2 k}) + c \cdot \log_2 k = c_1 + c \cdot \log_2 k\]
Thus \(T \in O([k \rightarrow \log k])\). Intuitively, we do only a constant amount of work (\(c\)) at each level, then throw away half the input. In a logarithmic number of steps we will have exhausted the input, having done only constant work each time, so the overall complexity is logarithmic.
\[T(k) = T(k/2) + k = T(k/4) + k/2 + k = \cdots = T(1) + k/2^{\log_2 k} + \cdots + k/4 + k/2 + k = c_1 + k(1/2^{\log_2 k} + \cdots + 1/4 + 1/2 + 1) = c_1 + 2k\]
Thus \(T \in O([k \rightarrow k])\). Intuitively, the first time the process looks at all the elements, the second time it looks at half of them, the third time a quarter, and so on. This successive halving amounts to scanning all the elements in the input roughly one more time, so the result is a linear process.
\[T(k) = 2T(k/2) + k = 2(2T(k/4) + k/2) + k = 4T(k/4) + k + k = 4(2T(k/8) + k/4) + k + k = 8T(k/8) + k + k + k = \cdots = 2^{\log_2 k} T(1) + k \cdot \log_2 k = k \cdot c_1 + k \cdot \log_2 k\]
Thus \(T \in O([k \rightarrow k \cdot \log k])\). Intuitively, each recursive call processes all the elements at its level (the \(k\)) and then decomposes the problem into two half-sized sub-problems. This decomposition gives us a recursion tree of logarithmic height, and at each level of that tree we do linear work.
\[T(k) = 2T(k-1) + c = 2T(k-1) + (2-1)c = 2(2T(k-2) + c) + (2-1)c = 4T(k-2) + 3c = 4T(k-2) + (4-1)c = 4(2T(k-3) + c) + (4-1)c = 8T(k-3) + 7c = 8T(k-3) + (8-1)c = \cdots = 2^k T(0) + (2^k-1)c\]
Thus \(T \in O([k \rightarrow 2^k])\). Disposing of each element requires doing a constant amount of work for it and then doubling the work done on the rest. This successive doubling leads to the exponential.
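As a quick sanity check of the first closed form: if we assume \(T(k-1) = c_0 + c \cdot (k-1)\), then \[T(k) = T(k-1) + c = c_0 + c \cdot (k-1) + c = c_0 + c \cdot k\] which is exactly the claimed form; this is the inductive step of the proof asked for below.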
Using induction, prove each of the above derivations.
7 Sets Appeal
Your Web browser records which Web pages you’ve visited, and some Web sites use this information to color visited links differently than ones you haven’t seen. When a vote is taken, the vote recorder cares that everyone’s vote is counted (and nobody has voted twice) but usually ignores the order in which they voted. When you use a search engine, you implicitly get a guarantee that each entry will be distinct but there is no guarantee of order: indeed, two subsequent searches may yield the “same” answer (i.e., the same set of hits) but in a different order.
How do we represent these kinds of information? Until now, we’ve had
a clear answer: lists. But explicit in the definition of a list—
check: [list: 1, 2, 3] is [list: 3, 2, 1] end
A set is different from a list: it is unordered, and it ignores duplicates. Because a set has different purposes, it also presents a different interface than does a list. Like lists, sets can contain any kind of element, but all of its contents must be of the same kind. For simplicity, in our concrete example we will focus on just sets of numbers.
mt-set :: Set
is-in :: (T, Set<T> -> Bool)
insert :: (T, Set<T> -> Set<T>)
union :: (Set<T>, Set<T> -> Set<T>)
size :: (Set<T> -> Number)
to-list :: (Set<T> -> List<T>)
insert-many :: (List<T>, Set<T> -> Set<T>)
What does it mean to “ignore” duplicates? Define this more precisely in terms of how the set operations behave in the face of duplicate values.
Sets can contain many kinds of values, but not necessarily any kind: we need to be able to check for two values being equal (which is a requirement for a set, but only a nice feature for a list), which can’t be done with all values [REF]; and sometimes we might even want the elements to obey an ordering [Converting Values to Ordered Values]. Numbers satisfy both characteristics.
7.1 Representing Sets by Lists
In what follows we will see multiple different representations of sets, so we will want names to tell them apart. We’ll use LSet to stand for “sets represented as lists”.
As a starting point, let’s consider the implementation of sets using lists as the underlying representation. After all, a set appears to merely be a list wherein we ignore the order of elements.
7.1.1 Representation Choices
type LSet = List
mt-set = empty
fun<T> size(s :: LSet<T>) -> Number: s.length() end
- There is a subtle difference between lists and sets. The list
[list: 1, 1]
is not the same as [list: 1]
because the first list has length two whereas the second has length one. Treated as a set, however, the two are the same: they both have size one. Thus, our implementation of size above is incorrect if we don’t take into account duplicates (either during insertion or while computing the size). We might also falsely make assumptions about the order in which elements are retrieved from the set, due to the ordering guarantee provided by the underlying list representation. This might hide bugs that we don’t discover until we change the representation.
We might have chosen a set representation because we didn’t need to care about order, and expected lots of duplicate items. A list representation might store all the duplicates, resulting in significantly more memory use (and slower programs) than we expected.
insert = link
7.1.2 Time Complexity
If we don’t store duplicates, then size is simply length, which takes time linear in \(k\). Similarly, check only needs to traverse the list once to determine whether or not an element is present, which also takes time linear in \(k\). But insert needs to check whether an element is already present, which takes time linear in \(k\), followed by at most a constant-time operation (link).
If we do store duplicates, then insert is constant time: it simply links on the new element without regard to whether it already is in the set representation. check traverses the list once, but the number of elements it needs to visit could be significantly greater than \(k\), depending on how many duplicates have been added. Finally, size needs to check whether or not each element is duplicated before counting it.
What is the time complexity of size if the list has duplicates?
fun<T> size(s :: LSet<T>) -> Number:
  cases (List) s:
    | empty => 0
    | link(f, r) =>
      if r.member(f):
        size(r)
      else:
        1 + size(r)
      end
  end
end
Let’s now compute the complexity of the body of the function, assuming the number of distinct elements in s is \(k\) but the actual number of elements in s is \(d\), where \(d \geq k\). To compute the time to run size on \(d\) elements, \(T(d)\), we should determine the number of operations in each question and answer. The first question has a constant number of operations, and the first answer also a constant. The second question also has a constant number of operations. Its answer is a conditional, whose first question (r.member(f)) needs to traverse the entire list, and hence performs \(O([d \rightarrow d])\) operations. If it succeeds, we recur on a list of size \(d-1\), which costs \(T(d-1)\); else we do the same but perform a constant number of additional operations. Thus \(T(0)\) is a constant, while the recurrence (in big-Oh terms) is \[T(d) = d + T(d-1)\] Thus \(T \in O([d \rightarrow d^2])\). Note that this is quadratic in the number of elements in the list, which may be much bigger than the size of the set.
7.1.3 Choosing Between Representations
Here is a summary of the costs of the two representations, measured both in terms of the size of the set (the number of distinct elements) and the size of the underlying list:

|              | With Duplicates |        | Without Duplicates |        |
|              | insert          | is-in  | insert             | is-in  |
| Size of Set  | constant        | linear | linear             | linear |
| Size of List | constant        | linear | linear             | linear |
Which representation we choose is a matter of how much duplication we expect. If there won’t be many duplicates, then the version that stores duplicates pays a small extra price in return for some faster operations.
Which representation we choose is also a matter of how often we expect each operation to be performed. The representation without duplication is “in the middle”: everything is roughly equally expensive (in the worst case). With duplicates is “at the extremes”: very cheap insertion, potentially very expensive membership. But if we will mostly only insert without checking membership, and especially if we know membership checking will only occur in situations where we’re willing to wait, then permitting duplicates may in fact be the smart choice. (When might we ever be in such a situation? Suppose your set represents a backup data structure; then we add lots of data but very rarely—
indeed, only in case of some catastrophe— ever need to look for things in it.) Another way to cast these insights is that our form of analysis is too weak. In situations where the complexity depends so heavily on a particular sequence of operations, big-Oh is too loose and we should instead study the complexity of specific sequences of operations. We will address precisely this question later (Halloween Analysis).
Moreover, there is no reason a program should use only one representation. It could well begin with one representation, then switch to another as it better understands its workload. The only thing it would need to do to switch is to convert all existing data between the representations.
How might this play out above? Observe that data conversion is very
cheap in one direction: since every list without duplicates is
automatically also a list with (potential) duplicates, converting in
that direction is trivial (the representation stays unchanged, only
its interpretation changes). The other direction is harder: we have to
filter duplicates (which takes time quadratic in the number of
elements in the list). Thus, a program can make an initial guess about
its workload and pick a representation accordingly, but maintain
statistics as it runs and, when it finds its assumption is wrong,
switch representations— paying a one-time conversion cost in exchange for cheaper operations thereafter.
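Here, for instance, is a minimal sketch of the harder conversion direction: removing duplicates from a list-represented set. It reuses the .member method seen earlier and takes time quadratic in the length of the list, as claimed above (the function name is ours, not part of the text):

fun<T> remove-duplicates(l :: List<T>) -> List<T>:
  # keep one copy of each element (the last occurrence survives)
  cases (List) l:
    | empty => empty
    | link(f, r) =>
      if r.member(f):
        remove-duplicates(r)
      else:
        link(f, remove-duplicates(r))
      end
  end
end

check:
  remove-duplicates([list: 1, 1, 2, 1, 3]) is [list: 2, 1, 3]
end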
7.1.4 Other Operations
Implement the remaining operations catalogued in the set interface above under each list representation.
Implement the operation
remove :: (Set<T>, T -> Set<T>)
under each list representation. What difference do you see?
Suppose you’re asked to extend sets with these operations, as the set analog of first and rest:
one :: (Set<T> -> T)
others :: (Set<T> -> Set<T>)
You should refuse to do so! Do you see why?
With lists the “first” element is well-defined, whereas sets are defined to have no ordering. Indeed, just to make sure users of your sets don’t accidentally assume anything about your implementation (e.g., if you implement one using first, they may notice that one always returns the element most recently added to the list), you really ought to return a random element of the set on each invocation.
Unfortunately, returning a random element means the above interface is
unusable. Suppose s is bound to a set containing 1,
2, and 3. Say the first time one(s) is invoked
it returns 2, and the second time 1. (This already
means one is not a function—
Why is it unreasonable for one(s) to produce the same result as one(others(s))?
Suppose you wanted to extend sets with a subset operation that partitioned the set according to some condition. What would its type be? See [REF join lists] for a similar operation.
The types we have written above are not as crisp as they could be. Define a has-no-duplicates predicate, refine the relevant types with it, and check that the functions really do satisfy this criterion.
7.2 Making Sets Grow on Trees
Let’s start by noting that it seems better, if at all possible, to avoid storing duplicates. Duplicates are only problematic during insertion due to the need for a membership test. But if we can make membership testing cheap, then we would be better off using it to check for duplicates and storing only one instance of each value (which also saves us space). Thus, let’s try to improve the time complexity of membership testing (and, hopefully, of other operations too).
It seems clear that with a (duplicate-free) list representation of a
set, we cannot really beat linear time for membership checking. This
is because at each step we can eliminate only one element from
contention, so in the worst case we must do a linear amount of work to
examine the whole set. Instead, we need to eliminate many more
elements with each comparison—
In our handy set of recurrences (Solving Recurrences), one stands out: \(T(k) = T(k/2) + c\). It says that if, with a constant amount of work we can eliminate half the input, we can perform membership checking in logarithmic time. This will be our goal.
Before we proceed, it’s worth putting logarithmic growth in
perspective. Asymptotically, logarithmic is obviously not as nice as
constant. However, logarithmic growth is very pleasant because it
grows so slowly. For instance, if an input doubles from size \(k\) to
\(2k\), its logarithm—
7.2.1 Converting Values to Ordered Values
We have actually just made an extremely subtle assumption. When we check one element for membership and eliminate it, we have eliminated only one element. To eliminate more than one element, we need one element to “speak for” several. That is, eliminating that one value needs to have safely eliminated several others as well without their having to be consulted. In particular, then, we can no longer compare for mere equality, which compares one set element against another element; we need a comparison that compares an element against a set of elements.
To do this, we have to convert an arbitrary datum into a datatype that
permits such comparison. This is known as hashing.
A hash function consumes an arbitrary value and produces a comparable
representation of it (its hash)—
Let us now consider how one can compute hashes. If the input datatype is a number, it can serve as its own hash. Comparison simply uses numeric comparison (e.g., <). Then, transitivity of < ensures that if an element \(A\) is less than another element \(B\), then \(A\) is also less than all the other elements bigger than \(B\). The same principle applies if the datatype is a string, using string inequality comparison. But what if we are handed more complex datatypes?
Suppose, for instance, the datum is a string, which we can think of as a list of character codes. Here are two ways we might hash it:
- Consider a list of primes as long as the string. Raise each prime to the power of the corresponding character code, and multiply the results. For instance, if the string is represented by the character codes [6, 4, 5] (the first character has code 6, the second one 4, and the third 5), we get the hash
num-expt(2, 6) * num-expt(3, 4) * num-expt(5, 5)
or 16200000.
- Simply add together all the character codes. For the above example, this would correspond to the hash
6 + 4 + 5
or 15.
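Here is a minimal sketch of the first scheme, assuming the string has already been converted into a list of character codes and that a sufficiently long list of primes is supplied alongside it; neither helper name is from the text:

fun prime-hash(codes :: List<Number>, primes :: List<Number>) -> Number:
  # multiply together each prime raised to the corresponding character code
  cases (List) codes:
    | empty => 1
    | link(c, rest-codes) =>
      num-expt(primes.first, c) * prime-hash(rest-codes, primes.rest)
  end
end

check:
  prime-hash([list: 6, 4, 5], [list: 2, 3, 5, 7]) is 16200000
end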
Now let us consider more general datatypes. The principle of hashing will be similar. If we have a datatype with several variants, we can use a numeric tag to represent the variants: e.g., the primes will give us invertible tags. For each field of a record, we need an ordering of the fields (e.g., lexicographic, or “alphabetical” order), and must hash their contents recursively; having done so, we get in effect a string of numbers, which we have shown how to handle.
Now that we have understood how one can deterministically convert any arbitrary datum into a number, in what follows, we will assume that the trees representing sets are trees of numbers. However, it is worth considering what we really need out of a hash. In Set Membership by Hashing Redux, we will not need partial ordering. Invertibility is more tricky. In what follows below, we have assumed that finding a hash is tantamount to finding the set element itself, which is not true if multiple values can have the same hash. In that case, the easiest thing to do is to store alongside the hash all the values that hashed to it, and we must search through all of these values to find our desired element. Unfortunately, this does mean that in an especially perverse situation, the desired logarithmic complexity will actually be linear complexity after all!
In real systems, hashes of values are typically computed by the programming language implementation. This has the virtue that they can often be made unique. How does the system achieve this? Easy: it essentially uses the memory address of a value as its hash. (Well, not so fast! Sometimes the memory system can and does move values around [REF]. In these cases computing a hash value is more complicated.)
7.2.2 Using Binary Trees
Because logs come from trees.
data BT: | leaf | node(v :: Number, l :: BT, r :: BT) end
fun is-in-bt(e :: Number, s :: BT) -> Boolean:
  cases (BT) s:
    | leaf => false
    | node(v, l, r) =>
      if e == v:
        true
      else:
        is-in-bt(e, l) or is-in-bt(e, r)
      end
  end
end
How can we improve on this? The comparison needs to help us eliminate not only the root but also one whole sub-tree. We can only do this if the comparison “speaks for” an entire sub-tree. It can do so if all elements in one sub-tree are less than or equal to the root value, and all elements in the other sub-tree are greater than or equal to it. Of course, we have to be consistent about which side contains which subset; it is conventional to put the smaller elements to the left and the bigger ones to the right. This refines our binary tree definition to give us a binary search tree (BST).
Here is a candidate predicate for recognizing when a binary tree is in fact a binary search tree:
fun is-a-bst-buggy(b :: BT) -> Boolean:
  cases (BT) b:
    | leaf => true
    | node(v, l, r) =>
      (is-leaf(l) or (l.v <= v)) and
      (is-leaf(r) or (v <= r.v)) and
      is-a-bst-buggy(l) and
      is-a-bst-buggy(r)
  end
end
Is this definition correct?
check:
  is-a-bst-buggy(node(5, node(3, leaf, node(6, leaf, leaf)), leaf)) is true # WRONG!
end
Fix the BST checker.
type BST = BT%(is-a-bst)
type TSet = BST
mt-set = leaf
fun is-in(e :: Number, s :: BST) -> Bool:
  cases (BST) s:
    | leaf => ...
    | node(v, l :: BST, r :: BST) => ...
      ... is-in(l) ...
      ... is-in(r) ...
  end
end
fun is-in(e :: Number, s :: BST) -> Boolean:
  cases (BST) s:
    | leaf => false
    | node(v, l, r) =>
      if e == v:
        true
      else if e < v:
        is-in(e, l)
      else if e > v:
        is-in(e, r)
      end
  end
end

fun insert(e :: Number, s :: BST) -> BST:
  cases (BST) s:
    | leaf => node(e, leaf, leaf)
    | node(v, l, r) =>
      if e == v:
        s
      else if e < v:
        node(v, insert(e, l), r)
      else if e > v:
        node(v, l, insert(e, r))
      end
  end
end
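A small usage sketch of these two definitions (the particular numbers are arbitrary):

check:
  s1 = insert(3, insert(5, insert(1, mt-set)))
  is-in(3, s1) is true
  is-in(2, s1) is false
  is-in(1, insert(1, s1)) is true
end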
You should now be able to define the remaining operations. Of these, size clearly requires linear time (since it has to count all the elements), but because is-in and insert both throw away one of two children each time they recur, they take logarithmic time.
Suppose we frequently needed to compute the size of a set. We ought to be able to reduce the time complexity of size by having each tree ☛ cache its size, so that size could complete in constant time (note that the size of the tree clearly fits the criterion of a cache, since it can always be reconstructed). Update the data definition and all affected functions to keep track of this information correctly.
But wait a minute. Are we actually done? Our recurrence takes the form \(T(k) = T(k/2) + c\), but what in our data definition guaranteed that the size of the child traversed by is-in will be half the size?
Construct an example—
consisting of a sequence of inserts to the empty tree— such that the resulting tree is not balanced. Show that searching for certain elements in this tree will take linear, not logarithmic, time in its size.
check: insert(4, insert(3, insert(2, insert(1, mt-set)))) is node(1, leaf, node(2, leaf, node(3, leaf, node(4, leaf, leaf)))) end
Therefore, using a binary tree, and even a BST, does not guarantee the complexity we want: it does only if our inputs have arrived in just the right order. However, we cannot assume any input ordering; instead, we would like an implementation that works in all cases. Thus, we must find a way to ensure that the tree is always balanced, so each recursive call in is-in really does throw away half the elements.
7.2.3 A Fine Balance: Tree Surgery
Let’s define a balanced binary search tree (BBST). It must obviously be a search tree, so let’s focus on the “balanced” part. We have to be careful about precisely what this means: we can’t simply expect both sides to be of equal size because this demands that the tree (and hence the set) have an even number of elements and, even more stringently, to have a size that is a power of two.
Define a predicate for a BBST that consumes a BT and returns a Boolean indicating whether or not it is a balanced search tree.
Therefore, we relax the notion of balance to one that is both accommodating and sufficient. We use the term balance factor for a node to refer to the height of its left child minus the height of its right child (where the height is the depth, in edges, of the deepest node). We allow every node of a BBST to have a balance factor of \(-1\), \(0\), or \(1\) (but nothing else): that is, either both have the same height, or the left or the right can be one taller. Note that this is a recursive property, but it applies at all levels, so the imbalance cannot accumulate making the whole tree arbitrarily imbalanced.
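To make the balance factor concrete, here is a hypothetical pair of helpers over the BT datatype from above. They measure height in nodes rather than edges, which shifts every height by one but leaves the differences, and hence the balance factors, unchanged:

fun height(t :: BT) -> Number:
  # height counted in nodes: a leaf has height 0
  cases (BT) t:
    | leaf => 0
    | node(v, l, r) => 1 + num-max(height(l), height(r))
  end
end

fun balance-factor(t :: BT) -> Number:
  # positive means the left side is taller, negative means the right side is
  cases (BT) t:
    | leaf => 0
    | node(v, l, r) => height(l) - height(r)
  end
end

check:
  balance-factor(node(5, node(3, leaf, leaf), leaf)) is 1
end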
Given this definition of a BBST, show that the number of nodes is exponential in the height. Thus, always recurring on one branch will terminate after a logarithmic (in the number of nodes) number of steps.
Here is an obvious but useful observation: every BBST is also a BST (this was true by the very definition of a BBST). Why does this matter? It means that a function that operates on a BST can just as well be applied to a BBST without any loss of correctness.
So far, so easy. All that leaves is a means of creating a BBST, because it’s responsible for ensuring balance. It’s easy to see that the constant empty-set is a BBST value. So that leaves only insert.
Here is our situation with insert. Assuming we start with a BBST, we can determine in logarithmic time whether the element is already in the tree and, if so, ignore it.To implement a bag we count how many of each element are in it, which does not affect the tree’s height. When inserting an element, given balanced trees, the insert for a BST takes only a logarithmic amount of time to perform the insertion. Thus, if performing the insertion does not affect the tree’s balance, we’re done. Therefore, we only need to consider cases where performing the insertion throws off the balance.
Observe that because \(<\) and \(>\) are symmetric (likewise with \(<=\) and \(>=\)), we can consider insertions into one half of the tree and a symmetric argument handles insertions into the other half. Thus, suppose we have a tree that is currently balanced into which we are inserting the element \(e\). Let’s say \(e\) is going into the left sub-tree and, by virtue of being inserted, will cause the entire tree to become imbalanced.Some trees, like family trees [REF], represent real-world data. It makes no sense to “balance” a family tree: it must accurately model whatever reality it represents. These set-representing trees, in contrast, are chosen by us, not dictated by some external reality, so we are free to rearrange them.
There are two ways to proceed. One is to consider all the places where we might insert \(e\) in a way that causes an imbalance and determine what to do in each case.
Enumerate all the cases where insertion might be problematic, and dictate what to do in each case.
The number of cases is actually quite overwhelming (if you didn’t think so, you missed a few...). Therefore, we instead attack the problem after it has occurred: allow the existing BST insert to insert the element, assume that we have an imbalanced tree, and show how to restore its balance.The insight that a tree can be made “self-balancing” is quite remarkable, and there are now many solutions to this problem. This particular one, one of the oldest, is due to G.M. Adelson-Velskii and E.M. Landis. In honor of their initials it is called an AVL Tree, though the tree itself is quite evident; their genius is in defining re-balancing.
Thus, in what follows, we begin with a tree that is balanced; insert causes it to become imbalanced; we have assumed that the insertion happened in the left sub-tree. In particular, suppose a (sub-)tree has a balance factor of \(2\) (positive because we’re assuming the left is imbalanced by insertion). The procedure for restoring balance depends critically on the following property:
Show that if a tree is currently balanced, i.e., the balance factor at every node is \(-1\), \(0\), or \(1\), then insert can at worst make the balance factor \(\pm 2\).
The algorithm that follows is applied as insert returns from its recursion, i.e., on the path from the inserted value back to the root. Since this path is of logarithmic length in the set’s size (due to the balancing property), and (as we shall see) performs only a constant amount of work at each step, it ensures that insertion also takes only logarithmic time, thus completing our challenge.
    p
   / \
  q   C
 / \
A   B
Let’s say that \(C\) is of height \(k\). Before insertion, the tree rooted at \(q\) must have had height \(k+1\) (or else one insertion cannot create imbalance). In turn, this means \(A\) must have had height \(k\) or \(k-1\), and likewise for \(B\).
Why can they both not have height \(k+1\) after insertion?
7.2.3.1 Left-Left Case
      p
     / \
    q   C
   / \
  r   B
 / \
A1   A2
\(A_1 < r\).
\(r < A_2 < q\).
\(q < B < p\).
\(p < C\).
The height of \(A_1\) or of \(A_2\) is \(k\) (the cause of imbalance).
The height of the other \(A_i\) is \(k-1\) (see exercise above [REF]).
The height of \(C\) is \(k\) (initial assumption; \(k\) is arbitrary).
The height of \(B\) must be \(k-1\) or \(k\) (argued above).
Given these height and ordering facts, we can restore balance by rotating so that q becomes the root, with r and p as its children:
     q
   /   \
  r     p
 / \   / \
A1  A2 B   C
7.2.3.2 Left-Right Case
      p
     / \
    q   C
   / \
  A   r
     / \
    B1   B2
\(A < q\).
\(q < B_1 < r\).
\(r < B_2 < p\).
\(p < C\).
Suppose the height of \(C\) is \(k\).
The height of \(A\) must be \(k-1\) or \(k\).
The height of \(B_1\) or \(B_2\) must be \(k\), but not both (see exercise above [REF]). The other must be \(k-1\).
One way to re-balance is to first rotate the sub-tree rooted at q, converting the situation into the left-left shape we already know how to handle:
      p
     / \
    r   C
   / \
  q   B2
 / \
A   B1
and then apply the left-left rotation above, making r the new root:
     r
   /   \
  q     p
 / \   / \
A   B1 B2  C
7.2.3.3 Any Other Cases?
Were we a little too glib before? In the left-right case we said that only one of \(B_1\) or \(B_2\) could be of height \(k\) (after insertion); the other had to be of height \(k-1\). Actually, all we can say for sure is that the other has to be at most height \(k-2\).
Can the height of the other tree actually be \(k-2\) instead of \(k-1\)?
If so, does the solution above hold? Is there not still an imbalance of two in the resulting tree?
Is there actually a bug in the above algorithm?
8 [EMPTY]
9 Halloween Analysis
In Predicting Growth, we introduced the idea of big-Oh complexity to measure the worst-case time of a computation. As we saw in Choosing Between Representations, however, this is sometimes too coarse a bound when the complexity is heavily dependent on the exact sequence of operations run. Now, we will consider a different style of complexity analysis that better accommodates operation sequences.
9.1 A First Example
Consider, for instance, a set that starts out empty, followed by a sequence of \(k\) insertions and then \(k\) membership tests, and suppose we are using the representation without duplicates. Insertion time is proportional to the size of the set (and list); this is initially \(0\), then \(1\), and so on, until it reaches size \(k\). Therefore, the total cost of the sequence of insertions is \(k \cdot (k+1) / 2\). The membership tests cost \(k\) each in the worst case, because we’ve inserted up to \(k\) distinct elements into the set. The total time is then \[k^2 / 2 + k / 2 + k^2\] for a total of \(2k\) operations, yielding an average of \[{3 \over 4} k + {1 \over 4}\] steps per operation in the worst case.
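Spelling out the averaging step: the total work is \({3 \over 2}k^2 + {1 \over 2}k\), so dividing by the \(2k\) operations gives \[{{{3 \over 2}k^2 + {1 \over 2}k} \over {2k}} = {3 \over 4}k + {1 \over 4}\] steps per operation.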
9.2 The New Form of Analysis
What have we computed? We are still computing a worst case cost, because we have taken the cost of each operation in the sequence in the worst case. We are then computing the average cost per operation. Therefore, this is an average of worst cases. Importantly, this is different from what is known as average-case analysis, which uses probability theory to compute the estimated cost of the computation. We have not used any probability here. Note that because this is an average per operation, it does not say anything about how bad any one operation can be (which, as we will see (Amortization Versus Individual Operations), can be quite a bit worse); it only says what their average is.
In the above case, this new analysis did not yield any big surprises. We have found that on average we spend about \(k\) steps per operation; a big-Oh analysis would have told us that we’re performing \(2k\) operations with a cost of \(O([k \rightarrow k])\) each in the number of distinct elements; per operation, then, we are performing roughly linear work in the worst-case number of set elements.
As we will soon see, however, this won’t always be the case: this new analysis can cough up pleasant surprises.
Before we proceed, we should give this analysis its name. Formally, it is called amortized analysis. Amortization is the process of spreading a payment out over an extended but fixed term. In the same way, we spread out the cost of a computation over a fixed sequence, then determine how much each payment will be.I have given it a whimsical name because Halloween is a(n American) holiday devoted to ghosts, ghouls, and other symbols of death. Amortization comes from the Latin root mort-, which means death, because an amortized analysis is one conducted “at the death”, i.e., at the end of a fixed sequence of operations.
9.3 An Example: Queues from Lists
We have already seen lists [REF] and sets (Sets Appeal). Now let’s consider another fundamental computer science data structure: the queue. A queue is a linear, ordered data structure, just like a list; however, the set of operations they offer is different. In a list, the traditional operations follow a last-in, first-out discipline: .first returns the element most recently linked. In contrast, a queue follows a first-in, first-out discipline. That is, a list can be visualized as a stack, while a queue can be visualized as a conveyor belt.
9.3.1 List Representations
We can define queues using lists in the natural way: every enqueue is implemented with link, while every dequeue requires traversing the whole list until its end. Conversely, we could make enqueuing traverse to the end, and dequeuing correspond to .rest. Either way, one of these operations will take constant time while the other will be linear in the length of the list representing the queue.
In fact, however, the above paragraph contains a key insight that will let us do better.
Observe that if we store the queue in a list with most-recently-enqueued element first, enqueuing is cheap (constant time). In contrast, if we store the queue in the reverse order, then dequeuing is constant time. It would be wonderful if we could have both, but once we pick an order we must give up one or the other. Unless, that is, we pick...both.
One half of this is easy. We simply enqueue elements into a list with the most recent addition first. Now for the (first) crucial insight: when we need to dequeue, we reverse the list. Now, dequeuing also takes constant time.
9.3.2 A First Analysis
Of course, to fully analyze the complexity of this data structure, we must also account for the reversal. In the worst case, we might argue that any operation might reverse (because it might be the first dequeue); therefore, the worst-case time of any operation is the time it takes to reverse, which is linear in the length of the list (which corresponds to the elements of the queue).
However, this answer should be unsatisfying. If we perform \(k\) enqueues followed by \(k\) dequeues, then each of the enqueues takes one step; each of the last \(k-1\) dequeues takes one step; and only the first dequeue requires a reversal, which takes steps proportional to the number of elements in the list, which at that point is \(k\). Thus, the total cost of operations for this sequence is \(k \cdot 1 + k + (k-1) \cdot 1 = 3k-1\) for a total of \(2k\) operations, giving an amortized complexity of effectively constant time per operation!
9.3.3 More Liberal Sequences of Operations
In the process of this, however, I’ve quietly glossed over something you’ve probably noticed: in our candidate sequence all dequeues followed all enqueues. What happens on the next enqueue? Because the list is now reversed, it will have to take a linear amount of time! So we have only partially solved the problem.
data Queue<T>:
  | queue(tail :: List<T>, head :: List<T>)
end

mt-q :: Queue = queue(empty, empty)
fun<T> enqueue(q :: Queue<T>, e :: T) -> Queue<T>: queue(link(e, q.tail), q.head) end
data Response<T>: | elt-and-q(e :: T, r :: Queue<T>) end
fun<T> dequeue(q :: Queue<T>) -> Response<T>:
  cases (List) q.head:
    | empty =>
      new-head = q.tail.reverse()
      elt-and-q(new-head.first, queue(empty, new-head.rest))
    | link(f, r) =>
      elt-and-q(f, queue(q.tail, r))
  end
end
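A small usage sketch of these definitions (the particular values are arbitrary):

check:
  q1 = enqueue(enqueue(enqueue(mt-q, 1), 2), 3)
  r1 = dequeue(q1)
  r1.e is 1
  r2 = dequeue(r1.r)
  r2.e is 2
end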
9.3.4 A Second Analysis
We can now reason about sequences of operations as we did before, by adding up costs and averaging. However, another way to think of it is this. Let’s give each element in the queue three “credits”. Each credit can be used for one constant-time operation.
One credit gets used up in enqueuing. So long as the element stays in the tail list, it still has two credits to spare. When it needs to be moved to the head list, it spends one more credit in the link step of reversal. Finally, dequeuing it from the head list spends its third and final credit.
Because the element does not run out of credits, we know it must have had enough. These credits reflect the cost of operations on that element. From this (very informal) analysis, we can conclude that in the worst case, any permutation of enqueues and dequeues will still cost only a constant amount of amortized time.
9.3.5 Amortization Versus Individual Operations
Note, however, that the constant represents an average across the
sequence of operations. It does not put a bound on the cost of any one
operation. Indeed, as we have seen above, when dequeue finds the head
list empty it reverses the tail, which takes time linear in the size
of the tail—
9.4 Reading More
At this point I have only briefly touched on the subject of amortized analysis. A very nice tutorial by Rebecca Fiebrink provides much more information. The authoritative book on algorithms, Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein, covers amortized analysis in extensive detail.
10 Sharing and Equality
10.1 Re-Examining Equality
data BinTree:
  | leaf
  | node(v, l :: BinTree, r :: BinTree)
end

a-tree = node(5, node(4, leaf, leaf), node(4, leaf, leaf))

b-tree = block:
  four-node = node(4, leaf, leaf)
  node(5, four-node, four-node)
end
check:
  a-tree is b-tree
  a-tree.l is a-tree.l
  a-tree.l is a-tree.r
  b-tree.l is b-tree.r
end
However, there is another sense in which these trees are not equivalent. Concretely, a-tree constructs a distinct node for each child, while b-tree uses the same node for both children. Surely this difference should show up somehow, but we have not yet seen a way to write a program that will tell these apart.
check: identical(a-tree, b-tree) is false identical(a-tree.l, a-tree.l) is true identical(a-tree.l, a-tree.r) is false identical(b-tree.l, b-tree.r) is true end
check: a-tree is-not%(identical) b-tree a-tree.l is%(identical) a-tree.l a-tree.l is-not%(identical) a-tree.r b-tree.l is%(identical) b-tree.r end
check: a-tree is b-tree a-tree is-not%(identical) b-tree a-tree.l is a-tree.r a-tree.l is-not%(identical) a-tree.r end
10.2 The Cost of Evaluating References
From a complexity viewpoint, it’s important for us to understand how these references work. As we have hinted, four-node is computed only once, and each use of it refers to the same value: if, instead, it was evaluated each time we referred to four-node, there would be no real difference between a-tree and b-tree, and the above tests would not distinguish between them.
L = range(0, 100)
L1 = link(1, L)
L2 = link(-1, L)
check: L1.rest is%(identical) L L2.rest is%(identical) L L1.rest is%(identical) L2.rest end
fun check-for-no-copy(another-l):
  identical(another-l, L)
end

check:
  check-for-no-copy(L) is true
end
check: L satisfies check-for-no-copy end
10.3 On the Internet, Nobody Knows You’re a DAG
Despite the name we’ve given it, b-tree is not actually a tree. In a tree, by definition, there are no shared nodes, whereas in b-tree the node named by four-node is shared by two parts of the tree. Despite this, traversing b-tree will still terminate, because there are no cyclic references in it: if you start from any node and visit its “children”, you cannot end up back at that node. There is a special name for a value with such a shape: directed acyclic graph (DAG).
Many important data structures are actually a DAG underneath. For instance, consider Web sites. It is common to think of a site as a tree of pages: the top-level refers to several sections, each of which refers to sub-sections, and so on. However, sometimes an entry needs to be cataloged under multiple sections. For instance, an academic department might organize pages by people, teaching, and research. In the first of these pages it lists the people who work there; in the second, the list of courses; and in the third, the list of research groups. In turn, the courses might have references to the people teaching them, and the research groups are populated by these same people. Since we want only one page per person (for both maintenance and search indexing purposes), all these personnel links refer back to the same page for people.
data Content: | page(s :: String) | section(title :: String, sub :: List<Content>) end
people-pages :: Content = section("People", [list: page("Church"), page("Dijkstra"), page("Haberman") ])
fun get-person(n): index(people-pages.sub, n) end
theory-pages :: Content = section("Theory", [list: get-person(0), get-person(1)])
systems-pages :: Content = section("Systems", [list: get-person(1), get-person(2)])
site :: Content = section("Computing Sciences", [list: theory-pages, systems-pages])
check: theory = index(site.sub, 0) systems = index(site.sub, 1) theory-dijkstra = index(theory.sub, 1) systems-dijkstra = index(systems.sub, 0) theory-dijkstra is systems-dijkstra theory-dijkstra is%(identical) systems-dijkstra end
10.4 From Acyclicity to Cycles
web-colors = link("white", link("grey", web-colors))
map2(color-table-row, table-row-content, web-colors)
Unfortunately, there are many things wrong with this attempted definition.
Do you see what they are?
This will not even parse. The identifier web-colors is not bound on the right of the =.
- Earlier, we saw a solution to such a problem: use rec [Streams From Functions]. What happens if we write
rec web-colors = link("white", link("grey", web-colors))
instead?
Why does rec work in the definition of ones but not above?
Assuming we have fixed the above problem, one of two things will happen. It depends on what the initial value of web-colors is. Because it is a dummy value, we do not get an arbitrarily long list of colors but rather a list of two colors followed by the dummy value. Indeed, this program will not even type-check.
Suppose, however, that web-colors were written instead as a function definition to delay its creation:fun web-colors(): link("white", link("grey", web-colors())) end
On its own this just defines a function. If, however, we use it—web-colors()— it goes into an infinite loop constructing links. Even if all that were to work, map2 would either (a) not terminate because its second argument is indefinitely long, or (b) report an error because the two arguments aren’t the same length.
When you get to cycles, even defining the datum becomes difficult because its definition depends on itself so it (seemingly) needs to already be defined in the process of being defined. We will return to cyclic data later: Recursion and Cycles from Mutation.
11 Graphs
In From Acyclicity to Cycles we introduced a special kind of sharing: when the data become cyclic, i.e., there exist values such that traversing other reachable values from them eventually gets you back to the value at which you began. Data that have this characteristic are called graphs.Technically, a cycle is not necessary to be a graph; a tree or a DAG is also regarded as a (degenerate) graph. In this section, however, we are interested in graphs that have the potential for cycles.
Lots of very important data are graphs. For instance, the people and connections in social media form a graph: the people are nodes or vertices and the connections (such as friendships) are links or edges. They form a graph because for many people, if you follow their friends and then the friends of their friends, you will eventually get back to the person you started with. (Most simply, this happens when two people are each other’s friends.) The Web, similarly, is a graph: the nodes are pages and the edges are links between pages. The Internet is a graph: the nodes are machines and the edges are links between machines. A transportation network is a graph: e.g., cities are nodes and the edges are transportation links between them. And so on. Therefore, it is essential to understand graphs to represent and process a great deal of interesting real-world data.
Graphs are important and interesting for not only practical but also principled reasons. The property that a traversal can end up where it began means that traditional methods of processing will no longer work: if it blindly processes every node it visits, it could end up in an infinite loop. Therefore, we need better structural recipes for our programs. In addition, graphs have a very rich structure, which lends itself to several interesting computations over them. We will study both these aspects of graphs below.
11.1 Understanding Graphs
data BinT: | leaf | node(v, l :: ( -> BinT), r :: ( -> BinT)) end
rec tr = node("rec", lam(): tr end, lam(): tr end)
t0 = node(0, lam(): leaf end, lam(): leaf end)
t1 = node(1, lam(): t0 end, lam(): t0 end)
t2 = node(2, lam(): t1 end, lam(): t1 end)
fun sizeinf(t :: BinT) -> Number:
  cases (BinT) t:
    | leaf => 0
    | node(v, l, r) =>
      ls = sizeinf(l())
      rs = sizeinf(r())
      1 + ls + rs
  end
end
What happens when we call sizeinf(tr)?
It goes into an infinite loop: hence the inf in its name.
check: size(tr) is 1 size(t0) is 1 size(t1) is 2 size(t2) is 3 end
It’s clear that we need to somehow remember what nodes we have visited previously: that is, we need a computation with “memory”. In principle this is easy: we just create an extra data structure that checks whether a node has already been counted. As long as we update this data structure correctly, we should be all set. Here’s an implementation.
fun sizect(t :: BinT) -> Number:
  fun szacc(shadow t :: BinT, seen :: List<BinT>) -> Number:
    if has-id(seen, t):
      0
    else:
      cases (BinT) t:
        | leaf => 0
        | node(v, l, r) =>
          ns = link(t, seen)
          ls = szacc(l(), ns)
          rs = szacc(r(), ns)
          1 + ls + rs
      end
    end
  end
  szacc(t, empty)
end
fun has-id<A>(seen :: List<A>, t :: A):
  cases (List) seen:
    | empty => false
    | link(f, r) =>
      if f <=> t:
        true
      else:
        has-id(r, t)
      end
  end
end
How does this do? Well, sizect(tr) is indeed 1, but sizect(t1) is 3 and sizect(t2) is 7!
Explain why these answers came out as they did.
ls = szacc(l(), ns) rs = szacc(r(), ns)
The remedy for this, therefore, is to remember every node we
visit. Then, when we have no more nodes to process, instead of
returning only the size, we should return all the nodes visited
until now. This ensures that nodes that have multiple paths to them
are visited on only one path, not more than once. The logic for this
is to return two values from each traversal—
fun size(t :: BinT) -> Number:
  fun szacc(shadow t :: BinT, seen :: List<BinT>) -> {n :: Number, s :: List<BinT>}:
    if has-id(seen, t):
      {n: 0, s: seen}
    else:
      cases (BinT) t:
        | leaf => {n: 0, s: seen}
        | node(v, l, r) =>
          ns = link(t, seen)
          ls = szacc(l(), ns)
          rs = szacc(r(), ls.s)
          {n: 1 + ls.n + rs.n, s: rs.s}
      end
    end
  end
  szacc(t, empty).n
end
Sure enough, this function satisfies the above tests.
11.2 Representations
The representation we’ve seen above for graphs is certainly a start towards creating cyclic data, but it’s not very elegant. It’s both error-prone and inelegant to have to write lam everywhere, and remember to apply functions to () to obtain the actual values. Therefore, here we explore other representations of graphs that are more conventional and also much simpler to manipulate.
Which representation we choose depends on several factors:
The structure of the graph, and in particular, its density. We will discuss this further later (Measuring Complexity for Graphs).
The representation in which the data are provided by external sources. Sometimes it may be easier to simply adapt to their representation; in particular, in some cases there may not even be a choice.
The features provided by the programming language, which make some representations much harder to use than others.
Whatever representation we pick, we need at least the following operations:
A way to construct graphs.
A way to identify (i.e., tell apart) nodes or vertices in a graph.
Given a way to identify nodes, a way to get that node’s neighbors in the graph.
Our running example will be a graph whose nodes are cities in the United States and edges are known direct flight connections between them, reminiscent of the route maps found in the back of airlines’ in-flight magazines.
11.2.1 Links by Name
data KeyedNode:
  | keyed-node(key :: String, content, adj :: List<String>)
end

type KNGraph = List<KeyedNode>
type Node = KeyedNode
type Graph = KNGraph
kn-cities :: Graph = block:
  knEWR = keyed-node("nwk", "Newark", [list: "chi", "den", "saf", "hou"])
  knORD = keyed-node("chi", "Chicago", [list: "nwk", "saf"])
  knWOS = keyed-node("wor", "Worcester", [list: ])
  knHOU = keyed-node("hou", "Houston", [list: "nwk", "saf"])
  knDEN = keyed-node("den", "Denver", [list: "nwk", "saf"])
  knSFO = keyed-node("saf", "San Francisco", [list: "nwk", "den", "hou"])
  [list: knEWR, knORD, knWOS, knHOU, knDEN, knSFO]
end
fun find-kn(key :: Key, graph :: Graph) -> Node:
  matches = for filter(n from graph):
    n.key == key
  end
  matches.first # there had better be exactly one!
end
Convert the comment in the function into an invariant about the datum. Express this invariant as a refinement and add it to the declaration of graphs.
fun kn-neighbors(city :: Key, graph :: Graph) -> List<Key>: city-node = find-kn(city, graph) city-node.adj end
check: ns = kn-neighbors("hou", kn-cities) ns is [list: "nwk", "saf"] map(_.content, map(find-kn(_, kn-cities), ns)) is [list: "Newark", "San Francisco"] end
11.2.2 Links by Indices
In some languages, it is common to use numbers as names. This is
especially useful when numbers can be used to get access to an element
in a constant amount of time (in return for having a bound on the
number of elements that can be accessed). Here, we use a list—
data IndexedNode:
  | idxed-node(content, adj :: List<Number>)
end

type IXGraph = List<IndexedNode>
type Node = IndexedNode
type Graph = IXGraph
ix-cities :: Graph = block: inEWR = idxed-node("Newark", [list: 1, 4, 5, 3]) inORD = idxed-node("Chicago", [list: 0, 5]) inWOS = idxed-node("Worcester", [list: ]) inHOU = idxed-node("Houston", [list: 0, 5]) inDEN = idxed-node("Denver", [list: 0, 5]) inSFO = idxed-node("San Francisco", [list: 0, 4, 3]) [list: inEWR, inORD, inWOS, inHOU, inDEN, inSFO] end
fun find-ix(idx :: Key, graph :: Graph) -> Node: index(graph, idx) end
fun ix-neighbors(city :: Key, graph :: Graph) -> List<Key>: city-node = find-ix(city, graph) city-node.adj end
check: ns = ix-neighbors(3, ix-cities) ns is [list: 0, 5] map(_.content, map(find-ix(_, ix-cities), ns)) is [list: "Newark", "San Francisco"] end
Something deeper is going on here. The keyed nodes have intrinsic keys: the key is part of the datum itself. Thus, given just a node, we can determine its key. In contrast, the indexed nodes represent extrinsic keys: the keys are determined outside the datum, and in particular by the position in some other data structure. Given a node and not the entire graph, we cannot know what its key is. Even given the entire graph, we can only determine its key by using identical, which is a rather unsatisfactory approach to recovering fundamental information. This highlights a weakness of using extrinsically keyed representations of information. (In return, extrinsically keyed representations are easier to reassemble into new collections of data, because there is no danger of keys clashing: there are no intrinsic keys to clash.)
11.2.3 A List of Edges
data Edge:
  | edge(src :: String, dst :: String)
end

type LEGraph = List<Edge>
type Graph = LEGraph
le-cities :: Graph = [list: edge("Newark", "Chicago"), edge("Newark", "Denver"), edge("Newark", "San Francisco"), edge("Newark", "Houston"), edge("Chicago", "Newark"), edge("Chicago", "San Francisco"), edge("Houston", "Newark"), edge("Houston", "San Francisco"), edge("Denver", "Newark"), edge("Denver", "San Francisco"), edge("San Francisco", "Newark"), edge("San Francisco", "Denver"), edge("San Francisco", "Houston") ]
fun le-neighbors(city :: Key, graph :: Graph) -> List<Key>: neighboring-edges = for filter(e from graph): city == e.src end names = for map(e from neighboring-edges): e.dst end names end
check: le-neighbors("Houston", le-cities) is [list: "Newark", "San Francisco"] end
11.2.4 Abstracting Representations
We would like a general representation that lets us abstract over the
specific implementations. We will assume that broadly we have
available a notion of Node that has content, a notion of
Keys (whether or not intrinsic), and a way to obtain the
neighbors—
11.3 Measuring Complexity for Graphs
Before we begin to define algorithms over graphs, we should consider how to measure the size of a graph. A graph has two components: its nodes and its edges. Some algorithms are going to focus on nodes (e.g., visiting each of them), while others will focus on edges, and some will care about both. So which do we use as the basis for counting operations: nodes or edges?
Suppose the graph has \(k\) nodes. At one extreme, no two nodes are connected; then there are no edges at all.
At the other extreme, every two nodes are connected; then there are essentially as many edges as there are pairs of nodes: approximately \(k^2\).
The number of nodes can thus be significantly less or even significantly more than the number of edges. Were this difference a matter of constants, we could have ignored it; but it’s not. For sparse graphs, the number of nodes dominates the number of edges by a factor of \(k\) (or even infinity, if there truly are zero edges, but such graphs are usually not very interesting or difficult to process); for extremely dense graphs, too, the ratio is one of \(k\), but in the other direction.
Therefore, when we want to speak of the complexity of algorithms over graphs, we have to consider the sizes of both the number of nodes and edges. In a connected graphA graph is connected if, from every node, we can traverse edges to get to every other node., however, there must be at least as many edges as nodes minus one, which means the number of edges grows at least as fast as the number of nodes. Since we are usually processing connected graphs, or connected parts of graphs one at a time, we can bound the number of nodes by the number of edges.
11.4 Reachability
Many uses of graphs need to address reachability: whether we can, using edges in the graph, get from one node to another. For instance, a social network might suggest as contacts all those who are reachable from existing contacts. On the Internet, traffic engineers care about whether packets can get from one machine to another. On the Web, we care about whether all public pages on a site are reachable from the home page. We will study how to compute reachability using our travel graph as a running example.
11.4.1 Simple Recursion
If the source and destination nodes are the same, then clearly reachability is trivially satisfied.
If they are not, we have to iterate through the neighbors of the source node and ask whether the destination is reachable from each of those neighbors.
fun reach-1(src :: Key, dst :: Key, g :: Graph) -> Boolean:
  if src == dst:
    true
  else:
    fun loop(ns):
      cases (List) ns:
        | empty => false
        | link(f, r) =>
          if reach-1(f, dst, g): true else: loop(r) end
      end
    end
    loop(neighbors(src, g))
  end
end

check:
  reach = reach-1
  reach("nwk", "nwk", kn-cities) is true
  reach("nwk", "chi", kn-cities) is true
  reach("nwk", "wor", kn-cities) is false
  reach("nwk", "hou", kn-cities) is true
  reach("nwk", "den", kn-cities) is true
  reach("nwk", "saf", kn-cities) is true
end
Which of the above examples leads to a cycle? Why?
11.4.2 Cleaning up the Loop
Before we continue, let’s try to improve the expression of the loop. While the nested function above is a perfectly reasonable definition, we can use Pyret’s for to improve its readability.
fun ormap(fun-body, l):
  cases (List) l:
    | empty => false
    | link(f, r) =>
      if fun-body(f): true else: ormap(fun-body, r) end
  end
end

for ormap(n from neighbors(src, g)):
  reach-1(n, dst, g)
end
11.4.3 Traversal with Memory
Because we have cyclic data, we have to remember which nodes we’ve already visited and avoid traversing them again. Every time we begin traversing a new node, we add it to the set of nodes we’ve already started to visit. If we return to that node, then, because we can assume the graph has not changed in the meanwhile, we know that additional traversals from that node won’t make any difference to the outcome.This property is known as ☛ idempotence.
fun reach-2(src :: Key, dst :: Key, g :: Graph, visited :: List<Key>) -> Boolean:
  if visited.member(src):
    false
  else if src == dst:
    true
  else:
    new-visited = link(src, visited)
    for ormap(n from neighbors(src, g)):
      reach-2(n, dst, g, new-visited)
    end
  end
end
Does it matter if the first two conditions were swapped, i.e., if the beginning of reach-2 began with

if src == dst:
  true
else if visited.member(src):
  false

? Explain concretely with examples.
We repeatedly talk about remembering the nodes that we have begun to visit, not the ones we’ve finished visiting. Does this distinction matter? How?
11.4.4 A Better Interface
fun reach-3(s :: Key, d :: Key, g :: Graph) -> Boolean:
  fun reacher(src :: Key, dst :: Key, visited :: List<Key>) -> Boolean:
    if visited.member(src):
      false
    else if src == dst:
      true
    else:
      new-visited = link(src, visited)
      for ormap(n from neighbors(src, g)):
        reacher(n, dst, new-visited)
      end
    end
  end
  reacher(s, d, empty)
end
Does this really give us a correct implementation? In particular, does this address the problem that the size function above addressed? Create a test case that demonstrates the problem, and then fix it.
11.5 Depth- and Breadth-First Traversals
It is conventional for computer science texts to call these depth- and breadth-first search. However, searching is just a specific purpose; traversal is a general task that can be used for many purposes.
The reachability algorithm we have seen above has a special property. At every node it visits, there is usually a set of adjacent nodes at which it can continue the traversal. It has at least two choices: it can either visit each immediate neighbor first, then visit all of the neighbors’ neighbors; or it can choose a neighbor, recur, and visit the next immediate neighbor only after that visit is done. The former is known as breadth-first traversal, while the latter is depth-first traversal.
The algorithm we have designed uses a depth-first strategy: inside <graph-reach-1-loop>, we recur on the first element of the list of neighbors before we visit the second neighbor, and so on. The alternative would be to have a data structure into which we insert all the neighbors, then pull out an element at a time such that we first visit all the neighbors before their neighbors, and so on. This naturally corresponds to a queue (An Example: Queues from Lists).
Using a queue, implement breadth-first traversal.
If we correctly check to ensure we don’t re-visit nodes, then both
breadth- and depth-first traversal will properly visit the entire
reachable graph without repetition (and hence not get into an infinite
loop). Each one traverses from a node only once, from which it
considers every single edge. Thus, if a graph has \(N\) nodes and
\(E\) edges, then a lower-bound on the complexity of traversal is
\(O(N + E)\). We must also consider the cost of checking whether we
have already visited a node before (which is a set membership problem,
which we address elsewhere: Sets Appeal). Finally, we have to
consider the cost of maintaining the data structure that keeps track
of our traversal. In the case of depth-first traversal,
recursion—
This would suggest that depth-first traversal is always better than breadth-first traversal. However, breadth-first traversal has one very important and valuable property. Starting from a node \(N\), when it visits a node \(P\), count the number of edges taken to get to \(P\). Breadth-first traversal guarantees that there cannot have been a shorter path to \(P\): that is, it finds a shortest path to \(P\).
Why “a” rather than “the” shortest path?
Prove that breadth-first traversal finds a shortest path.
11.6 Graphs With Weighted Edges
Consider a transportation graph: we are usually interested not only in whether we can get from one place to another, but also in what it “costs” (where we may have many different cost measures: money, distance, time, units of carbon dioxide, etc.). On the Internet, we might care about the ☛ latency or ☛ bandwidth over a link. Even in a social network, we might like to describe the degree of closeness of a friend. In short, in many graphs we are interested not only in the direction of an edge but also in some abstract numeric measure, which we call its weight.
In the rest of this study, we will assume that our graph edges have weights. This does not invalidate what we’ve studied so far: if a node is reachable in an unweighted graph, it remains reachable in a weighted one. But the operations we are going to study below only make sense in a weighted graph.We can, however, always treat an unweighted graph as a weighted one by giving every edge the same, constant, positive weight (say one).
When treating an unweighted graph as a weighted one, why do we care that every edge be given a positive weight?
Revise the graph data definitions to account for edge weights.
Weights are not the only kind of data we might record about edges. For instance, if the nodes in a graph represent people, the edges might be labeled with their relationship (“mother”, “friend”, etc.). What other kinds of data can you imagine recording for edges?
11.7 Shortest (or Lightest) Paths
Imagine planning a trip: it’s natural that you might want to get to your destination in the least time, or for the least money, or some other criterion that involves minimizing the sum of edge weights. This is known as computing the shortest path.
We should immediately clarify an unfortunate terminological
confusion. What we really want to compute is the lightest
path—
Construct a graph and select a pair of nodes in it such that the shortest path from one to the other is not the lightest one, and vice versa.
We have already seen (Depth- and Breadth-First Traversals) that breadth-first search constructs shortest paths in unweighted graphs. These correspond to lightest paths when there are no weights (or, equivalently, all weights are identical and positive). Now we have to generalize this to the case where the edges have weights.
w :: Key -> Number
w :: Key -> Option<Number>
Now let’s think about this inductively. What do we know initially?
Well, certainly that the source node is at a distance of zero from
itself (that must be the lightest path, because we can’t get any
lighter). This gives us a (trivial) set of nodes for which we already
know the lightest weight. Our goal is to grow this set of
nodes—
Inductively, at each step we have the set of all nodes for which we know the lightest path (initially this is just the source node, but it does mean this set is never empty, which will matter in what we say next). Now consider all the edges adjacent to this set of nodes that lead to nodes for which we don’t already know the lightest path. Choose a node, \(q\), that minimizes the total weight of the path to it. We claim that this will in fact be the lightest path to that node.
If this claim is true, then we are done. That’s because we would now add \(q\) to the set of nodes whose lightest weights we now know, and repeat the process of finding lightest outgoing edges from there. This process has thus added one more node. At some point we will find that there are no edges that lead outside the known set, at which point we can terminate.
It stands to reason that terminating at this point is safe: it corresponds to having computed the reachable set. The only thing left is to demonstrate that this greedy algorithm yields a lightest path to each node.
We will prove this by contradiction. Suppose we have the path \(s \rightarrow d\) from source \(s\) to node \(d\), as found by the algorithm above, but assume also that we have a different path that is actually lighter. At every node, when we added a node along the \(s \rightarrow d\) path, the algorithm would have added a lighter path if it existed. The fact that it did not falsifies our claim that a lighter path exists (there could be a different path of the same weight; this would be permitted by the algorithm, but it also doesn’t contradict our claim). Therefore the algorithm does indeed find the lightest path.
What remains is to determine a data structure that enables this algorithm. At every node, we want to know the least weight from the set of nodes for which we know the least weight to all their neighbors. We could achieve this by sorting, but this is overkill: we don’t actually need a total ordering on all these weights, only the lightest one. A heap [REF] gives us this.
What if we allowed edges of weight zero? What would change in the above algorithm?
What if we allowed edges of negative weight? What would change in the above algorithm?
For your reference, this algorithm is known as Dijkstra’s Algorithm.
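Here is a rough sketch of the algorithm just described, under assumptions of our own: a list of weighted edges (w-edge and dist are names we introduce here, not from the text), and a plain scan over candidate edges in place of the heap, so it is correct in spirit but less efficient than it could be.

data WEdge:
  | w-edge(src :: String, dst :: String, weight :: Number)
end

data Dist:
  | dist(n :: String, d :: Number)
end

fun lightest-paths(src :: String, g :: List<WEdge>) -> List<Dist>:
  fun weight-to(e :: WEdge, known :: List<Dist>) -> Number:
    # the weight of reaching e.dst by extending the known lightest path to e.src
    cases (Option) find(lam(k): k.n == e.src end, known):
      | some(k) => k.d + e.weight
      | none => raise("weight-to: the source of a candidate edge must be known")
    end
  end
  fun grow(known :: List<Dist>) -> List<Dist>:
    known-names = map(_.n, known)
    # candidate edges lead from a known node to a node we don't yet know about
    candidates = for filter(e from g):
      known-names.member(e.src) and not(known-names.member(e.dst))
    end
    cases (List) candidates:
      | empty => known
      | link(f, r) =>
        best = for fold(b from f, e from r):
          if weight-to(e, known) < weight-to(b, known): e else: b end
        end
        grow(link(dist(best.dst, weight-to(best, known)), known))
    end
  end
  grow([list: dist(src, 0)])
end

check:
  g = [list:
    w-edge("a", "b", 1),
    w-edge("a", "c", 4),
    w-edge("b", "c", 2)]
  find(lam(k): k.n == "c" end, lightest-paths("a", g)).value.d is 3
end

Replacing the scan over candidates with a heap of candidate edges is what brings the cost of each step down, as noted above.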
11.8 Moravian Spanning Trees
At the turn of the millennium, the US National Academy of Engineering surveyed its members to determine the “Greatest Engineering Achievements of the 20th Century”. The list contained the usual suspects: electronics, computers, the Internet, and so on. But a perhaps surprising idea topped the list: (rural) electrification.Read more about it on their site.
11.8.1 The Problem
To understand the history of national electrical grids, it helps to go back to Moravia in the 1920s. Like many parts of the world, it was beginning to realize the benefits of electricity and intended to spread it around the region. A Moravian academic named Otakar Borůvka heard about the problem, and in a remarkable effort, described the problem abstractly, so that it could be understood without reference to Moravia or electrical networks. He modeled it as a problem about graphs.
The electrical network must reach all the towns intended to be covered by it. In graph terms, the solution must be spanning, meaning it must visit every node in the graph.
Redundancy is a valuable property in any network: that way, if one set of links goes down, there might be another way to get a payload to its destination. When starting out, however, redundancy may be too expensive, especially if it comes at the cost of not giving someone a payload at all. Thus, the initial solution was best set up without loops or even redundant paths. In graph terms, the solution had to be a tree.
Finally, the goal was to solve this problem for the least cost possible. In graph terms, the graph would be weighted, and the solution had to be a minimum.
11.8.2 A Greedy Solution
Begin with a solution consisting of a single node, chosen arbitrarily. For the graph consisting of this one node, this solution is clearly a minimum, spanning, and a tree.
Of all the edges incident on nodes in the solution that connect to a node not already in the solution, pick the edge with the least weight.Note that we consider only the incident edges, not their weight added to the weight of the node to which they are incident.
Add this edge to the solution. The claim is that the new solution will be a tree (by construction), spanning (also by construction), and a minimum. The minimality follows by an argument similar to that used for Dijkstra’s Algorithm.
Jarník had the misfortune of publishing this work in Czech in 1930, and it went largely ignored. It was rediscovered by others, most notably by R.C. Prim in 1957, and is now generally known as Prim’s Algorithm, though calling it Jarník’s Algorithm would attribute credit in the right place.
Implementing this algorithm is pretty easy. At each point, we need to know the lightest edge incident on the current solution tree. Finding the lightest edge takes time linear in the number of these edges, but the very lightest one may create a cycle. We therefore need to efficiently check for whether adding an edge would create a cycle, a problem we will return to multiple times (Checking Component Connectedness). Assuming we can do that effectively, we then want to add the lightest edge and iterate. Even given an efficient solution for checking cyclicity, this would seem to require an operation linear in the number of edges for each node. With better representations we can improve on this complexity, but let’s look at other ideas first.
11.8.3 Another Greedy Solution
Recall that Jarník presented his algorithm in 1930, when computers didn’t exist, and Prim his in 1957, when they were very much in their infancy. Programming computers to track heaps was a non-trivial problem, and many algorithms were implemented by hand, where keeping track of a complex data structure without making errors was harder still. There was a need for a solution that required less manual bookkeeping (literally speaking).
In 1956, Joseph Kruskal presented such a solution. His idea was elegantly simple. The Jarník algorithm suffers from the problem that each time the tree grows, we have to revise the content of the heap, which is already a messy structure to track. Kruskal noted the following.
To obtain a minimum solution, surely we want to include one of the edges of least weight in the graph. Because if not, we can take an otherwise minimal solution, add this edge, and remove one other edge; the graph would still be just as connected, but the overall weight would be no more and, if the removed edge were heavier, would be less.Note the careful wording: there may be many edges of the same least weight, so adding one of them may remove another, and therefore not produce a lighter tree; but the key point is that it certainly will not produce a heavier one. By the same argument we can add the next lightest edge, and the next lightest, and so on. The only time we cannot add the next lightest edge is when it would create a cycle (that problem again!).
Therefore, Kruskal’s algorithm is utterly straightforward. We first
sort all the edges, ordered by ascending weight. We then take each
edge in ascending weight order and add it to the solution provided it
will not create a cycle. When we have thus processed all the edges, we
will have a solution that is a tree (by construction), spanning
(because every connected vertex must be the endpoint of some edge),
and of minimum weight (by the argument above). The complexity is that
of sorting (which is \([e \rightarrow e \log e]\), where \(e\) is the
size of the edge set). We then iterate over each edge,
which takes time linear in the size of that set—
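Here is an equally rough sketch of our own of this process, reusing the hypothetical w-edge datatype from the earlier sketch and assuming a helper same-component that answers whether two nodes are already connected by the edges chosen so far (the cycle check deferred to Checking Component Connectedness).

fun kruskal(g :: List<WEdge>, same-component) -> List<WEdge>:
  sorted = g.sort-by(
    lam(e1, e2): e1.weight < e2.weight end,
    lam(e1, e2): e1.weight == e2.weight end)
  for fold(soln from empty, e from sorted):
    if same-component(e.src, e.dst, soln):
      # adding e would create a cycle, so skip it
      soln
    else:
      link(e, soln)
    end
  end
end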
11.8.4 A Third Solution
Both the Jarník and Kruskal solutions have one flaw: they require a
centralized data structure (the priority heap, or the sorted list) to
incrementally build the solution. As parallel computers became
available, and graph problems grew large, computer scientists looked
for solutions that could be implemented more efficiently in
parallel—
In 1965, M. Sollin constructed an algorithm that met these needs beautifully. In this algorithm, instead of constructing a single solution, we grow multiple solution components (potentially in parallel if we so wish). Each node starts out as a solution component (as it was at the first step of Jarník’s Algorithm). Each node considers the edges incident to it, and picks the lightest one that connects to a different component (that problem again!). If such an edge can be found, the edge becomes part of the solution, and the two components combine to become a single component. The entire process repeats.
Because every node begins as part of the solution, this algorithm naturally spans. Because it checks for cycles and avoids them, it naturally forms a tree.Note that avoiding cycles yields a DAG and is not automatically guaranteed to yield a tree. We have been a bit lax about this difference throughout this section. Finally, minimality follows through similar reasoning as we used in the case of Jarník’s Algorithm, which we have essentially run in parallel, once from each node, until the parallel solution components join up to produce a global solution.
Of course, maintaining the data for this algorithm by hand is a nightmare. Therefore, it would be no surprise if this algorithm had been invented in the digital age. The real surprise is that it was not: it was originally created by Otakar Borůvka himself, who
pinpointed the real problem lying underneath the electrification problem so it could be viewed in a context-independent way,
created a descriptive language of graph theory to define it precisely, and
even solved the problem in addition to defining it.
As you might have guessed by now, this problem is indeed called the MST in other textbooks, but “M” stands not for Moravia but for “Minimum”. But given Borůvka’s forgotten place in history, I prefer the more whimsical name.
11.8.5 Checking Component Connectedness
As we’ve seen, we need to be able to efficiently tell whether two
nodes are in the same component. One way to do this is to conduct a
depth-first traversal (or breadth-first traversal) starting from the
first node and checking whether we ever visit the second one. (Using
one of these traversal strategies ensures that we terminate in the
presence of loops.) Unfortunately, this takes a linear amount of time
(in the size of the graph) for every pair of nodes—
It is helpful to reduce this problem from graph connectivity to a more general one: of disjoint-set structure (colloquially known as union-find for reasons that will soon be clear). If we think of each connected component as a set, then we’re asking whether two nodes are in the same set. But casting it as a set membership problem makes it applicable in several other applications as well.
The setup is as follows. For arbitrary values, we want the ability to think of them as elements in a set. We are interested in two operations. One is obviously union, which merges two sets into one. The other would seem to be something like is-in-same-set that takes two elements and determines whether they’re in the same set. Over time, however, it has proven useful to instead define the operator find that, given an element, “names” the set (more on this in a moment) that the element belongs to. To check whether two elements are in the same set, we then have to get the “set name” for each element, and check whether these names are the same. This certainly sounds more roundabout, but this means we have a primitive that may be useful in other contexts, and from which we can easily implement is-in-same-set.
data Element<T>:
  | elt(val :: T, parent :: Option<Element>)
end
fun is-same-element(e1, e2): e1.val <=> e2.val end
Why do we check only the value parts?
fun is-in-same-set(e1 :: Element, e2 :: Element, s :: Sets) -> Boolean:
  s1 = fynd(e1, s)
  s2 = fynd(e2, s)
  identical(s1, s2)
end

type Sets = List<Element>

fun fynd(e :: Element, s :: Sets) -> Element:
  cases (List) s:
    | empty => raise("fynd: shouldn't have gotten here")
    | link(f, r) =>
      if is-same-element(f, e):
        cases (Option) f.parent:
          | none => f
          | some(p) => fynd(p, s)
        end
      else:
        fynd(e, r)
      end
  end
end
Why is this recursive in the nested cases?
fun union(e1 :: Element, e2 :: Element, s :: Sets) -> Sets:
  s1 = fynd(e1, s)
  s2 = fynd(e2, s)
  if identical(s1, s2):
    s
  else:
    update-set-with(s, s1, s2)
  end
end

fun update-set-with(s :: Sets, child :: Element, parent :: Element) -> Sets:
  cases (List) s:
    | empty => raise("update: shouldn't have gotten here")
    | link(f, r) =>
      if is-same-element(f, child):
        link(elt(f.val, some(parent)), r)
      else:
        link(f, update-set-with(r, child, parent))
      end
  end
end

check:
  s0 = map(elt(_, none), [list: 0, 1, 2, 3, 4, 5, 6, 7])
  s1 = union(index(s0, 0), index(s0, 2), s0)
  s2 = union(index(s1, 0), index(s1, 3), s1)
  s3 = union(index(s2, 3), index(s2, 5), s2)
  print(s3)
  is-same-element(fynd(index(s0, 0), s3), fynd(index(s0, 5), s3)) is true
  is-same-element(fynd(index(s0, 2), s3), fynd(index(s0, 5), s3)) is true
  is-same-element(fynd(index(s0, 3), s3), fynd(index(s0, 5), s3)) is true
  is-same-element(fynd(index(s0, 5), s3), fynd(index(s0, 5), s3)) is true
  is-same-element(fynd(index(s0, 7), s3), fynd(index(s0, 7), s3)) is true
end
First, because we are performing functional updates, the value of the parent reference keeps “changing”, but these changes are not visible to older copies of the “same” value. An element from different stages of unioning has different parent references, even though it is arguably the same element throughout. This is a place where functional programming hurts.
Relatedly, the performance of this implementation is quite bad. fynd recursively traverses parents to find the set’s name, but the elements traversed are not updated to record this new name. We certainly could update them by reconstructing the set afresh each time, but that complicates the implementation and, as we will soon see, we can do much better.
12 State, Change, and More Equality
12.1 A Canonical Mutable Structure
As we have motivated (Checking Component Connectedness), sometimes it’s nice to be able to change the value of a datum rather than merely construct a new one with an updated value. The main advantage to changing it is that every value that refers to it can now see this change. The main disadvantage to changing it is that every value that refers to it can now see this change. Using this power responsibly is therefore an important programming challenge.
box consumes a value and creates a mutable box containing that value.
unbox consumes a box and returns the value contained in the box.
set-box consumes a box and a new value, and changes the box to contain that value. All subsequent unboxes of that box will now return the new value.
class Box<T> {
  private T the_value;
  Box(T v) {
    this.the_value = v;
  }
  T get() {
    return this.the_value;
  }
  void set(T v) {
    this.the_value = v;
  }
}
data Box:
  | box(ref v)
where:
  n1 = box(1)
  n2 = box(2)
  n1!{v : 3}
  n2!{v : 4}
  n1!v is 3
  n2!v is 4
end
Why do we say “type-consistent” above, rather than “the same type”?
The values could be related by subtyping (Subtyping).
What does the comment about longevity mean? How does this apply to reusing values extracted from fields?
12.2 What it Means to be Identical
b0 = box("a value")
b1 = box("a value")
b2 = b1

check:
  b0!v == b1!v is true
  b0 is-not%(identical) b1
  b1 is%(identical) b2
  b1!v is b2!v
end
hold-b1-value = b1!v

b1!{v: "a different value"}

check:
  b0!v == b1!v is false
  b0 is-not%(identical) b1
  b1 is%(identical) b2
  b1!v is b2!v
end
b1!{v: hold-b1-value}
b2!{v: "yet another value"}

check:
  b0!v == b1!v is false
  b0 is-not%(identical) b1
  b1 is%(identical) b2
  b1!v is b2!v
end

b0!{v: "yet another value"}

check:
  b0!v is-not b1!v
end
Now, why did we bother holding on to and restoring the value? It’s
because at the end of each of these sequences, the values have all been
restored, but by making the change, we have been able to observe which
other names detect the change and which ones do not—
In practice, identical does not behave this way: it would be
too disruptive—
12.3 Recursion and Cycles from Mutation
web-colors = link("white", link("grey", web-colors))
The first reason is the fact that we’re defining a function. A function’s body is not evaluated right away—only when we apply it—so the language can wait for the body to finish being defined. (We’ll see what this might mean in a moment.) The second reason isn’t actually a reason: function definitions actually are special. But we are about to expose what’s so special about them—it’s the use of a box!—so that any definition can avail of it.
data CList: clink(v, r) end
Do you see why not?
Let’s decompose the intended infinite list into two pieces: lists that begin with white and ones that begin with grey. What follows white? A grey list. What follows grey? A white list. It is clear we can’t write down these two definitions because one of them must precede the other, but each one depends on the other. (This is the same problem as trying to write a single definition above.)
12.3.1 Partial Definitions
data CList: clink(v, ref r) end
white-clink = clink("white", "dummy")
grey-clink = clink("grey", "dummy")

white-clink!{r: grey-clink}
grey-clink!{r: white-clink}

web-colors = white-clink

fun take(n :: Number, il :: CList) -> List:
  if n == 0:
    empty
  else:
    link(il.v, take(n - 1, il!r))
  end
end

check:
  take(4, web-colors) is [list: "white", "grey", "white", "grey"]
end
12.3.2 Recursive Functions
fun sum(n):
  if n > 0:
    n + sum(n - 1)
  else:
    0
  end
end

sum = lam(n):
  if n > 0:
    n + sum(n - 1)
  else:
    0
  end
end

rec sum = lam(n):
  if n > 0:
    n + sum(n - 1)
  else:
    0
  end
end
12.3.3 Premature Evaluation
Observe that the above description reveals that there is a time between the creation of the name and the assignment of a value to it. Can this intermediate state be observed? It sure can!
- Make sure the value is sufficiently obscure so that it can never be used in a meaningful context. This means values like 0 are especially bad, and indeed most common datatypes should be shunned. Indeed, there is no value already in use that can be used here that might not be confusing in some context.
- The language might create a new type of value just for use here. For instance, imagine this definition of CList:
data CList:
  | undef
  | clink(v, ref r)
end
undef appears to be a “base case”, thus making CList very similar to List. In truth, however, the undef is present only until the first mutation happens, after which it will never again be present: the intent is that r only contain a reference to other clinks.The undef value can now be used by the language to check for premature uses of a cyclic list. However, while this is technically feasible, it imposes a run-time penalty. Therefore, this check is usually only performed by languages focused on teaching; professional programmers are assumed to be able to manage the consequences of such premature use by themselves.
- Allow the recursion constructor to be used only in the case of binding functions, and then make sure that the right-hand side of the binding is syntactically a function. This solution precludes some reasonable programs, but is certainly safe.
12.3.4 Cyclic Lists Versus Streams
The color list example above is, as we have noted, very reminiscent of stream examples. What is the relationship between the two ways of defining infinite data?
Cyclic lists have on their side simplicity. The pattern of definition used above can actually be encapsulated into a language construct using desugaring (Desugaring: Growing the Language Without Enlarging It), so programmers do not need to wrestle with mutable fields (as above) or thunks (as streams demand). This simplicity, however, comes at a price: cyclic lists can only represent strictly repeating data, i.e., you cannot define nats or fibs as cyclic lists. In contrast, the function abstraction in a stream makes it generative: each invocation can create a truly novel datum (such as the next natural or Fibonacci number). Therefore, it is straightforward to implement cyclic lists as streams, but not vice versa.
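To make the last claim concrete, here is a small sketch of our own of the web-colors example as a stream. The stream datatype is written out so the example is self-contained; the names lz-link and lz-take are just our choices, and rec is needed for the reasons discussed in Recursive Functions above.

data Stream<T>:
  | lz-link(h :: T, t :: ( -> Stream<T>))
end

rec web-color-stream = lz-link("white", lam(): lz-link("grey", lam(): web-color-stream end) end)

fun lz-take(n :: Number, s) -> List:
  if n == 0:
    empty
  else:
    link(s.h, lz-take(n - 1, s.t()))
  end
end

check:
  lz-take(4, web-color-stream) is [list: "white", "grey", "white", "grey"]
end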
12.4 From Identifiers to Variables
As we have seen, mutable values can be aliased, which means references can inadvertently have their values changed. Because these values can be passed around, it can be difficult to track all the aliases that might exist (because it would be infeasible for a value to retain “backward references”).
var x = 0
x := 1
x = 1;
x = 3;
Now, we also use the term “variable” in mathematics to refer to function parameters. For instance, in f(y) = y+3 we say that y is a “variable”. That is called a variable because it varies across invocations; however, within each invocation, it has the same value in its scope. Our identifiers until now have corresponded to this mathematical notion of a variable.If the identifier was bound to a box, then it remained bound to the same box value. It’s the content of the box that changed, not which box the identifier was bound to. In contrast, programming variables can vary even within each invocation, like the Java x above.
Henceforth, we will use variable when we mean an identifier whose value can change within its scope, and identifier when this cannot happen. If in doubt, we might play it safe and use “variable”; if the difference doesn’t really matter, we might use either one. It is less important to get caught up in these specific terms than to understand that they represent a distinction that matters (Mutation: Structures and Variables).
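To make the distinction concrete, here is a small example of our own in which a variable changes several times within a single invocation:

fun sum-list(l :: List<Number>) -> Number:
  var total = 0
  for each(n from l):
    total := total + n
  end
  total
where:
  sum-list([list: 1, 2, 3]) is 6
  sum-list(empty) is 0
end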
12.5 Interaction of Mutation with Closures: Counters
check:
  l1 = mk-counter()
  l1() is 1
  l1() is 2
  l2 = mk-counter()
  l2() is 1
  l1() is 3
  l2() is 2
end
We now see how we can implement this using both mutable structures (specifically, boxes) and variables.
12.5.1 Implementation Using Boxes
fun mk-counter():
  ctr = box(0)
  lam():
    ctr!{v : (ctr!v + 1)}
    ctr!v
  end
end

fun mk-broken-counter():
  lam():
    ctr = box(0)
    ctr!{v : (ctr!v + 1)}
    ctr!v
  end
where:
  l1 = mk-broken-counter()
  l1() is 1
  l1() is 1
  l2 = mk-broken-counter()
  l2() is 1
  l1() is 1
  l2() is 1
end
The examples above hint at an implementation necessity. Clearly, whatever the environment closes over in the procedure returned by mk-counter must refer to the same box each time. Yet something also needs to make sure that the value in that box is different each time! Look at it more carefully: it must be lexically the same, but dynamically different. This distinction will be at the heart of a strategy for implementing state (Mutation: Structures and Variables).
12.5.2 Implementation Using Variables
fun mk-counter():
  var ctr = 0
  lam():
    ctr := ctr + 1
    ctr
  end
where:
  l1 = mk-counter()
  l1() is 1
  l1() is 2
  l2 = mk-counter()
  l2() is 1
  l1() is 3
  l2() is 2
end

fun mk-broken-counter():
  lam():
    var ctr = 0
    ctr := ctr + 1
    ctr
  end
where:
  l1 = mk-broken-counter()
  l1() is 1
  l1() is 1
  l2 = mk-broken-counter()
  l2() is 1
  l1() is 1
  l2() is 1
end
12.6 A Family of Equality Predicates
The binary operator ==, which is also the equality comparison used by is when testing.
identical, also written as <=>.
check:
  E1(b0, b1) is true
  E1(b1, b2) is true
end

> b0
box("a value")
> b1
box("a different value")
> b2
box("a different value")

check:
  E1(b0, b1) is false
  E1(b1, b2) is true
end
Confirm that equal-now does indeed have the properties ascribed to E1 above.
check:
  (b0 == b1) is false
  (b1 == b2) is true
  identical(b0, b1) is false
  identical(b1, b2) is true
end

b0 = box("a value")
b1 = box("a value")
b2 = b1
l0 = [list: b0]
l1 = [list: b1]
l2 = [list: b2]

check:
  identical(l0, l1) is false
  identical(l1, l2) is false
end

check:
  equal-now(l0, l1) is true
  equal-now(l1, l2) is true
end

check:
  (l0 == l1) is false
  (l1 == l2) is true
end
What might == represent that is interestingly different from both identical and equal-now? When it returns true, it means that the two values will “print the same” now and forever. How is this possible? It is because == recursively checks the two arguments structurally until it gets to a mutable field; at that point, it checks that they are identical. If they are identical, then any change made to one will be reflected in the other (because they are in fact the same mutable field). That means their content, too, will always “print the same”. Therefore, we can now reveal the name given to ==: it is equal-always.
12.6.1 A Hierarchy of Equality
Observe that if two values v1 and v2 are equal-now, they are not necessarily equal-always; if they are equal-always, they are not necessarily identical. We have seen examples of both these cases above.
In contrast, if two values are identical, then they are
certainly going to be equal-always. That is because their
mutable fields reduce to identical, while the immutable
parts—
In most languages, it is common to have two equality operators, corresponding to identical (known as reference equality) and equal-now (known as structural equality). Pyret is rare in having a third operator, equal-always. For most programs, this is in fact the most useful equality operator: it is not overly bothered with details of aliasing, which can be difficult to predict; at the same time it makes decisions that stand the test of time, thereby forming a useful basis for various optimizations (which may not even be conscious of their temporal assumptions). This is why is in testing uses equal-always by default, and forces users to explicitly pick a different primitive if they want it.
12.6.2 Space and Time Complexity
identical always takes constant time. Indeed, some programs use identical precisely because they want constant-time equality, carefully structuring their program so that values that should be considered equal are aliases to the same value. Of course, maintaining this programming discipline is tricky.
equal-always and equal-now both must traverse at least the immutable part of data. Therefore, they take time proportional to the smaller datum (because if the two data are of different size, they must not be equal anyway, so there is no need to visit the extra data). The difference is that equal-always reduces to identical at references, thereby performing less computation than equal-now would.
- Use a quick check followed by a slower check only if necessary. For instance, suppose we want to speed up equal-always, and have reason to believe we will often compare identical elements and/or that the values being compared are very large. Then we might define:
fun my-eq(v1, v2) -> Boolean:
  identical(v1, v2) or equal-always(v1, v2)
end

which has the following behavior:

check:
  my-eq(b0, b1) is false
  my-eq(b1, b2) is true
  my-eq(l0, l1) is false
  my-eq(l1, l2) is true
end

This is exactly the same as the behavior of equal-always, but faster when it can discharge the equality using identical without having to traverse the data. (Observe that this is a safe optimization because identical implies equal-always.)

- Use a different equality strategy entirely, if possible: see Set Membership by Hashing Redux.
12.6.3 Comparing Functions
We haven’t actually provided the full truth about equality because we
haven’t discussed functions. Defining equality for functions—
Because of this, most languages have tended to use approximations for function equality, most commonly reference equality. This is, however, a very weak approximation: even if the exact same function text in the same environment is allocated as two different closures, these would not be reference-equal. At least when this is done as part of the definition of identical, it makes sense; if other operators do this, however, they are actively lying, which is something the equality operators do not usually do.
There is one other approach we can take: simply disallow function comparison. This is what Pyret does: all three equality operators above will result in an error if you try to compare two functions. (You can compare against just one function, however, and you will get the answer false.) This ensures that the language’s comparison operators are never trusted falsely.
Pyret did have the choice of allowing reference equality for functions inside identical and erroring only in the other two cases. Had it done so, however, it would have violated the chain of implication above (A Hierarchy of Equality). The present design is arguably more elegant. Programmers who do want to use reference equality on functions can simply embed the functions inside a mutable structure like boxes.
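As a small illustration of that workaround (our own, using the box datatype defined earlier): aliases of one box are identical, while two separate boxes around the very same function are not.

check:
  f = lam(x): x + 1 end
  b-f = box(f)
  alias-b-f = b-f
  other-b-f = box(f)
  identical(b-f, alias-b-f) is true
  identical(b-f, other-b-f) is false
end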
There is one problem with erroring when comparing two functions: a completely generic procedure that compares two arbitrary values has to be written defensively. Because this is annoying, Pyret offers a three-valued version of each of the above three operators (identical3, equal-always3 and equal-now3), all of which return EqualityResult values that correspond to truth, falsity, and ignorance (returned in the case when both arguments are functions). Programmers can use this in place of the Boolean-valued comparison operators if they are uncertain about the types of the parameters.
13 Algorithms That Exploit State
13.1 Disjoint Sets Redux
Here’s how we can use this to implement union-find afresh. We will try to keep things as similar to the previous version (Checking Component Connectedness) as possible, to enhance comparison.
data Element:
  | elt(val, ref parent :: Option<Element>)
end

fun is-in-same-set(e1 :: Element, e2 :: Element) -> Boolean:
  s1 = fynd(e1)
  s2 = fynd(e2)
  identical(s1, s2)
end

fun update-set-with(child :: Element, parent :: Element):
  child!{parent: some(parent)}
end

fun union(e1 :: Element, e2 :: Element):
  s1 = fynd(e1)
  s2 = fynd(e2)
  if identical(s1, s2):
    s1
  else:
    update-set-with(s1, s2)
  end
end

fun fynd(e :: Element) -> Element:
  cases (Option) e!parent:
    | none => e
    | some(p) => fynd(p)
  end
end
13.1.1 Optimizations
Look again at fynd. In the some case, the element bound to e is not the set name; that is obtained by recursively traversing parent references. As this value returns, however, we don’t do anything to reflect this new knowledge! Instead, the next time we try to find the parent of this element, we’re going to perform this same recursive traversal all over again.
fun fynd(e :: Element) -> Element:
  cases (Option) e!parent:
    | none => e
    | some(p) =>
      new-parent = fynd(p)
      e!{parent: some(new-parent)}
      new-parent
  end
end
There is one more interesting idea we can apply. This is to maintain a rank of each element, which is roughly the depth of the tree of elements for which that element is their set name. When we union two elements, we then make the one with larger rank the parent of the one with the smaller rank. This has the effect of avoiding growing very tall paths to set name elements, instead tending towards “bushy” trees. This too reduces the number of parents that must be traversed to find the representative.
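Here is a rough sketch of union by rank in the spirit of the code above, but with names of our own (relt, rfynd, and union-by-rank are not from the text); it adds a mutable rank field and always hangs the shallower tree under the deeper one.

data RElement:
  | relt(val, ref parent :: Option<RElement>, ref rank :: Number)
end

fun rfynd(e :: RElement) -> RElement:
  cases (Option) e!parent:
    | none => e
    | some(p) => rfynd(p)
  end
end

fun union-by-rank(e1 :: RElement, e2 :: RElement) -> RElement:
  s1 = rfynd(e1)
  s2 = rfynd(e2)
  if identical(s1, s2):
    s1
  else if s1!rank < s2!rank:
    s1!{parent: some(s2)}
    s2
  else if s2!rank < s1!rank:
    s2!{parent: some(s1)}
    s1
  else:
    # equal ranks: pick one set name arbitrarily and bump its rank
    s2!{parent: some(s1)}
    s1!{rank: (s1!rank + 1)}
    s1
  end
end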
13.1.2 Analysis
This optimized union-find data structure has a remarkable analysis. In
the worst case, of course, we must traverse the entire chain of
parents to find the name element, which takes time proportional to the
number of elements in the set. However, once we apply the above
optimizations, we never need to traverse that same chain again! In
particular, if we conduct an amortized analysis over a sequence
of set equality tests after a collection of union operations, we find
that the cost for subsequent checks is very small—
13.2 Set Membership by Hashing Redux
We have already seen solutions to set membership. First we saw how to
represent sets as lists (Representing Sets by Lists), then as
(balanced) binary trees (A Fine Balance: Tree Surgery).Don’t
confuse this with union-find, which is a different kind of problem on
sets (Disjoint Sets Redux). With this we were able to reduce
insertion and membership to logarithmic time in the number of
elements. Along the way, we also learned that the essence of using
these representations was to reduce any datatype to a comparable,
ordered element—
Let us now ask whether we can use these numbers in any other way. Suppose our set has only five elements, which map densely to the values between 0 and 4. We can then have a five element list of boolean values, where the boolean at each index of the list indicates whether the element corresponding to that position is in the set or not. Both membership and insertion, however, require traversing potentially the entire list, giving us solutions linear in the number of elements.
That’s not all. Unless we can be certain that there will be only five elements, we can’t be sure to bound the size of the representation. Also, we haven’t yet shown how to actually hash in a way that makes the representation dense; barring that, our space consumption gets much worse, in turn affecting time.
There is, actually, a relatively simple solution to the problem of reducing numbers densely to a range: given the hash, we apply modular arithmetic. That is, if we want to use a list of five elements to represent the set, we simply compute the hash’s modulo five. This gives us an easy solution to that problem.
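For instance (a tiny example of our own), with a modulus of five, 5 and 10 collide at position 0 while 7 lands at position 2:

check:
  num-modulo(5, 5) is 0
  num-modulo(10, 5) is 0
  num-modulo(7, 5) is 2
end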
# the set {5}, represented as five booleans: 5 modulo 5 is 0
[list: true, false, false, false, false]

# storing the element itself rather than just true
[list: 5, false, false, false, false]

# a list per position, so that colliding elements can share a position
[list: [list: 5], empty, empty, empty, empty]

# after also inserting 10, which collides with 5 (10 modulo 5 is also 0)
[list: [list: 5, 10], empty, empty, empty, empty]
Good; now we have another way of representing sets so we can check for membership. However, in the worst case one of those lists is going to contain all elements in the set, and we may have to traverse the entire list to find an element in it, which means membership testing will take time linear in the number of elements. Insertion, in turn, takes time proportional to the size of the modulus because we may have to traverse the entire outer list to get to the right sub-list.
Can we improve on this?
13.2.1 Improving Access Time
Given that we currently have no way of ensuring we won’t get hash collisions, for now we’re stuck with a list of elements at each position that could be the size of the set we are trying to represent. Therefore, we can’t get around that (yet). But, we’re currently paying time in the size of the outer list just to insert an element, and surely we can do better than that!
Accessing the nth element of an array takes constant, not linear, time in n. This is sometimes known as random-access, because it takes the same time to access any random element, as opposed to just a known element.
Arrays are updated by mutation. Thus, a change to an array is seen by all references to the array.
SIZE = 19
v = array-of(empty, SIZE)

fun find-bucket(n):
  num-modulo(n, SIZE)
end

fun get-bucket(n):
  array-get-now(v, find-bucket(n))
end

fun is-in(n):
  get-bucket(n).member(n)
end

fun set-bucket(n, anew):
  array-set-now(v, find-bucket(n), anew)
end

fun put(n):
  when not(is-in(n)):
    set-bucket(n, link(n, get-bucket(n)))
  end
end
What impact do duplicate elements have on the complexity of operations?
The data structure we have defined above is known as a hash table (which is a slightly confusing name, because it isn’t really a table of hashes, but this is the name used conventionally in computer science).
13.2.2 Better Hashing
Using arrays therefore appears to address one issue: insertion. Finding the relevant bucket takes constant time, linking the new element takes constant time, and so the entire operation takes constant time...except, we have to also check whether the element is already in the bucket, to avoid storing duplicates. We have gotten rid of the traversal through the outer list representing the set, but the member operation on the inner list remains unchanged. In principle that cost does not change, but in practice we can make it much better.
Note that collisions are virtually inevitable. If we have uniformly distributed data, then collisions show up sooner than we might expect.This follows from the reasoning behind what is known as the birthday problem, commonly presented as how many people need to be in a room before the likelihood that two of them share a birthday exceeds some percentage. For the likelihood to exceed half we need just 23 people! Therefore, it is wise to prepare for the possibility of collisions.
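To see how quickly, here is a small computation of our own of that probability; the only fact it uses is the standard product formula for the birthday problem, assuming 365 equally likely birthdays.

fun birthday-collision(n :: Number) -> Number:
  # probability that among n people at least two share a birthday
  no-collision = for fold(p from 1, k from range(0, n)):
    p * ((365 - k) / 365)
  end
  1 - no-collision
where:
  (birthday-collision(23) > 0.5) is true
  (birthday-collision(10) < 0.5) is true
end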
The key is to know something about the distribution of hash values. For instance, if we knew our hash values are all multiples of 10, then using a table size of 10 would be a terrible idea (because all elements would hash to the same bucket, turning our hash table into a list). In practice, it is common to use uncommon prime numbers as the table size, since a random value is unlikely to have it as a divisor. This does not yield a theoretical improvement (unless you can make certain assumptions about the input, or work through the math very carefully), but it works well in practice. In particular, since the typical hashing function uses memory addresses for objects on the heap, and on most systems these addresses are multiples of 4, using a prime like 31 is often a fairly good bet.
13.2.3 Bloom Filters
Another way to improve the space and time complexity is to relax the
properties we expect of the operations. Right now, set membership
gives perfect answers, in that it answers true exactly when the
element being checked was previously inserted into the set. But
suppose we’re in a setting where we can accept a more relaxed notion
of correctness, where membership tests can “lie” slightly in one
direction or the other (but not both, because that makes the
representation almost useless). Specifically, let’s say that “no
means no” (i.e., if the set representation says the element isn’t
present, it really isn’t) but “yes sometimes means no” (i.e., if the
set representation says an element is present, sometimes it
might not be). In short, if the set says the element isn’t in it, this
should be guaranteed; but if the set says the element is present,
it may not be. In the latter case, we either need some
other—
Where is such a data structure of use? Suppose we are building a Web site that uses password-based authentication. Because many passwords have been leaked in well-publicized breaches, it is safe to assume that hackers have them and will guess them. As a result, we want to not allow users to select any of these as passwords. We could use a hash-table to reject precisely the known leaked passwords. But for efficiency, we could use this imperfect hash instead. If it says “no”, then we allow the user to use that password. But if it says “yes”, then either they are using a password that has been leaked, or they have an entirely different password that, purely by accident, has the same hash value, but no matter; we can just disallow that password as well.A related use is for filtering out malicious Web sites. The URL shortening system, bitly, uses it for this purpose.
Another example is in updating databases or memory stores. Suppose we have a database of records, which we update frequently. It is often more efficient to maintain a journal of changes: i.e., a list that sequentially records all the changes that have occurred. At some interval (say overnight), the journal is “flushed”, meaning all these changes are applied to the database proper. But that means every read operation has become highly inefficient, because it has to check the entire journal first (for updates) before accessing the database. Again, here we can use this faulty notion of a hash table: if the hash of the record locator says “no”, then the record certainly hasn’t been modified and we go directly to the database; if it says “yes” then we have to check the journal.
We have already seen a simple example implementation of this idea earlier, when we used a single list (or array) of booleans, with modular arithmetic, to represent the set. When the set said 4 was not present, this was absolutely true; but when it said 5 and 10 are both present, only one of these was present. The advantage was a huge saving in space and time: we needed only one bit per bucket, and did not need to search through a list to answer for membership. The downside, of course, was a hugely inaccurate set data structure, and one with correlated failure tied to the modulus.
There is a simple way to improve this solution: instead of having just one array, have several (but a fixed number of them). When an element is added to the set, it is added to each array; when checking for membership, every array is consulted. The set only answers affirmatively to membership if all the arrays do so.
Naturally, using multiple arrays offers absolutely no advantage if the arrays are all the same size: since both insertion and lookup are deterministic, all will yield the same answer. However, there is a simple antidote to this: use different array sizes. In particular, by using array sizes that are relatively prime to one another, we minimize the odds of a clash (two different values collide in every array only if their hashes differ by a multiple of the product of all the array sizes).
This data structure, called a Bloom Filter, is a probabilistic data structure. Unlike our earlier set data structure, this one is not guaranteed to always give the right answer; but contrary to the ☛ space-time tradeoff, we save both space and time by changing the problem slightly to accept incorrect answers. If we know something about the distribution of hash values, and we have some acceptable bound of error, we can design hash table sizes so that with high probability, the Bloom Filter will lie within the acceptable error bounds.
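As a tiny sketch of our own of the two-array version of this idea (bloom-put and bloom-maybe-in are made-up names, and 17 and 19 are arbitrary relatively prime sizes):

bloom-a = array-of(false, 17)
bloom-b = array-of(false, 19)

fun bloom-put(n :: Number):
  array-set-now(bloom-a, num-modulo(n, 17), true)
  array-set-now(bloom-b, num-modulo(n, 19), true)
end

fun bloom-maybe-in(n :: Number) -> Boolean:
  array-get-now(bloom-a, num-modulo(n, 17)) and array-get-now(bloom-b, num-modulo(n, 19))
end

check:
  bloom-put(5)
  bloom-put(10)
  bloom-maybe-in(5) is true
  bloom-maybe-in(10) is true
  # 6 was never inserted, and its buckets were never set: "no means no"
  bloom-maybe-in(6) is false
end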
13.3 Avoiding Recomputation by Remembering Answers
We have on several instances already referred to a ☛ space-time tradeoff. The most obvious tradeoff is when a computation “remembers” prior results and, instead of recomputing them, looks them up and returns the answers. This is an instance of the tradeoff because it uses space (to remember prior answers) in place of time (recomputing the answer). Let’s see how we can write such computations.
13.3.1 An Interesting Numeric Sequence
Suppose we want to create properly-parenthesized expressions, and ignore all non-parenthetical symbols. How many ways are there of creating parenthesized expressions given a certain number of opening (equivalently, closing) parentheses?
If we have zero opening parentheses, the only expression we can create is the empty expression. If we have one opening parenthesis, the only one we can construct is “()” (there must be a closing parenthesis since we’re interested only in properly-parenthesized expressions). If we have two opening parentheses, we can construct “(())” and “()()”. Given three, we can construct “((()))”, “(())()”, “()(())”, “()()()”, and “(()())”, for a total of five. And so on. Observe that the solutions at each level use all the possible solutions at one level lower, combined in all the possible ways.
fun catalan(n):
  if n == 0:
    1
  else if n > 0:
    for fold(acc from 0, k from range(0, n)):
      acc + (catalan(k) * catalan(n - 1 - k))
    end
  end
end
check:
  catalan(0) is 1
  catalan(1) is 1
  catalan(2) is 2
  catalan(3) is 5
  catalan(4) is 14
  catalan(5) is 42
  catalan(6) is 132
  catalan(7) is 429
  catalan(8) is 1430
  catalan(9) is 4862
  catalan(10) is 16796
  catalan(11) is 58786
end
Check at what value you start to observe a significant slowdown on your machine. Plot the graph of running time against input size. What does this suggest?
The reason the Catalan computation takes so long is precisely because of what we alluded to earlier: at each level, we depend on computing the Catalan number of all the smaller levels; this computation in turn needs the numbers of all of its smaller levels; and so on down the road.
Map the subcomputations of catalan to see why the computation time explodes as it does. What is the asymptotic time complexity of this function?
13.3.1.1 Using State to Remember Past Answers
Therefore, this is clearly a case where trading space for time is likely to be of help. How do we do this? We need a notion of memory that records all previous answers and, on subsequent attempts to compute them, checks whether they are already known and, if so, just returns them instead of recomputing them.
What critical assumption is this based on?
Naturally, this assumes that for a given input, the answer will
always be the same. As we have seen, functions with state violate
this liberally, so typical stateful functions cannot utilize this
optimization. Ironically, we will use state to implement this
optimization, so we will have a stateful function that always returns
the same answer on a given input—
data MemoryCell:
  | mem(in, out)
end

var memory :: List<MemoryCell> = empty

fun catalan(n :: Number) -> Number:
  answer = find(lam(elt): elt.in == n end, memory)
  cases (Option) answer:
    | none =>
      result = if n == 0:
        1
      else if n > 0:
        for fold(acc from 0, k from range(0, n)):
          acc + (catalan(k) * catalan(n - 1 - k))
        end
      end
      memory := link({in: n, out: result}, memory)
      result
    | some(v) => v.out
  end
end
This process, of converting a function into a version that remembers its past answers, is called memoization.
13.3.1.2 From a Tree of Computation to a DAG
What we have subtly done is to convert a tree of computation into a
DAG over the same computation, with equivalent calls being
reused. Whereas previously each call was generating lots of recursive
calls, which induced still more recursive calls, now we are reusing
previous recursive calls—
This has an important complexity benefit. Whereas previously we were
performing a super-exponential number of calls, now we perform only
one call per input and share all previous calls—
13.3.1.3 The Complexity of Numbers
As we start to run larger computations, however, we may start to
notice that our computations are starting to take longer than linear
growth. This is because our numbers are growing arbitrarily
large—
13.3.1.4 Abstracting Memoization
Now we’ve achieved the desired complexity improvement, but there is still something unsatisfactory about the structure of our revised definition of catalan: the act of memoization is deeply intertwined with the definition of a Catalan number, even though these should be intellectually distinct. Let’s do that next.
In effect, we want to separate our program into two parts. One part defines a general notion of memoization, while the other defines catalan in terms of this general notion.
data MemoryCell:
  | mem(in, out)
end

fun<T, U> memoize-1(f :: (T -> U)) -> (T -> U):
  var memory :: List<MemoryCell> = empty
  lam(n):
    answer = find(lam(elt): elt.in == n end, memory)
    cases (Option) answer:
      | none =>
        result = f(n)
        memory := link({in: n, out: result}, memory)
        result
      | some(v) => v.out
    end
  end
end

rec catalan :: (Number -> Number) = memoize-1(
  lam(n):
    if n == 0:
      1
    else if n > 0:
      for fold(acc from 0, k from range(0, n)):
        acc + (catalan(k) * catalan(n - 1 - k))
      end
    end
  end)
We don’t write fun catalan(...): ...; because the procedure bound to catalan is produced by memoize-1.
Note carefully that the recursive calls to catalan have to be to the function bound to the result of memoization, thereby behaving like an object (Objects: Interpretation and Types). Failing to refer to this same shared procedure means the recursive calls will not be memoized, thereby losing the benefit of this process.
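To make the pitfall concrete, here is a hypothetical sketch (the name bad-catalan-helper is ours, not the text’s): the recursive calls go to the raw helper rather than to the memoized binding, so only the outermost call is ever recorded and the blowup remains.

fun bad-catalan-helper(n :: Number) -> Number:
  if n == 0: 1
  else:
    for fold(acc from 0, k from range(0, n)):
      # These calls bypass the memo table entirely, so none of the
      # intermediate results are ever stored or reused.
      acc + (bad-catalan-helper(k) * bad-catalan-helper(n - 1 - k))
    end
  end
end

bad-catalan = memoize-1(bad-catalan-helper)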
We need to use rec for reasons we saw earlier [Recursive Functions].
Each invocation of memoize-1 creates a new table of stored results. Therefore, different memoized functions each get their own table rather than sharing one (sharing would be a bad idea!).
Why is sharing memoization tables a bad idea? Be concrete.
13.3.2 Edit-Distance for Spelling Correction
Text editors, word processors, mobile phones, and various other devices now routinely implement spelling correction or offer suggestions on (mis-)spellings. How do they do this? Doing so requires two capabilities: computing the distance between words, and finding words that are nearby according to this metric. In this section we will study the first of these questions. (For the purposes of this discussion, we will not dwell on the exact definition of what a “word” is, and just deal with strings instead. A real system would need to focus on this definition in considerable detail.)
Think about how you might define the “distance between two words”. Does it define a metric space?
Will the definition we give below define a metric space over the set of words?
At a minimum, we would want the following properties:
That the distance from a word to itself be zero.
That the distance from a word to any word other than itself be strictly positive. (Otherwise, given a word that is already in the dictionary, the “correction” might be a different dictionary word.)
That the distance between two words be symmetric, i.e., it shouldn’t matter in which order we pass arguments.
Observe that we have not included the triangle inequality relative to the properties of a metric. Why not? If we don’t need the triangle inequality, does this let us define more interesting distance functions that are not metrics?
A natural notion of distance counts how many typing mistakes it takes to turn one word into the other. The mistakes we will consider are that:
we left out a character;
we typed a character twice; or,
we typed one character when we meant another.
There are several variations of this definition possible. For now, we will consider the simplest one, which assumes that each of these errors has equal cost. For certain input devices, we may want to assign different costs to these mistakes; we might also assign different costs depending on what wrong character was typed (two characters adjacent on a keyboard are much more likely to be a legitimate error than two that are far apart). We will return briefly to some of these considerations later (Nature as a Fat-Fingered Typist).
check:
  levenshtein(empty, empty) is 0
  levenshtein([list: "x"], [list: "x"]) is 0
  levenshtein([list: "x"], [list: "y"]) is 1
  # one of about 600
  levenshtein(
    [list: "b", "r", "i", "t", "n", "e", "y"],
    [list: "b", "r", "i", "t", "t", "a", "n", "y"]) is 3
  # http://en.wikipedia.org/wiki/Levenshtein_distance
  levenshtein(
    [list: "k", "i", "t", "t", "e", "n"],
    [list: "s", "i", "t", "t", "i", "n", "g"]) is 3
  levenshtein(
    [list: "k", "i", "t", "t", "e", "n"],
    [list: "k", "i", "t", "t", "e", "n"]) is 0
  # http://en.wikipedia.org/wiki/Levenshtein_distance
  levenshtein(
    [list: "S", "u", "n", "d", "a", "y"],
    [list: "S", "a", "t", "u", "r", "d", "a", "y"]) is 3
  # http://www.merriampark.com/ld.htm
  levenshtein(
    [list: "g", "u", "m", "b", "o"],
    [list: "g", "a", "m", "b", "o", "l"]) is 2
  # http://www.csse.monash.edu.au/~lloyd/tildeStrings/Alignment/92.IPL.html
  levenshtein(
    [list: "a", "c", "g", "t", "a", "c", "g", "t", "a", "c", "g", "t"],
    [list: "a", "c", "a", "t", "a", "c", "t", "t", "g", "t", "a", "c", "t"]) is 4
  levenshtein(
    [list: "s", "u", "p", "e", "r", "c", "a", "l", "i",
      "f", "r", "a", "g", "i", "l", "i", "s", "t" ],
    [list: "s", "u", "p", "e", "r", "c", "a", "l", "y",
      "f", "r", "a", "g", "i", "l", "e", "s", "t" ]) is 2
end
rec levenshtein :: (List<String>, List<String> -> Number) =
  lam(s, t):
    if is-empty(s) and is-empty(t): 0
    else if is-empty(s): t.length()
    else if is-empty(t): s.length()
    else:
      if s.first == t.first:
        levenshtein(s.rest, t.rest)
      else:
        min3(
          1 + levenshtein(s.rest, t),
          1 + levenshtein(s, t.rest),
          1 + levenshtein(s.rest, t.rest))
      end
    end
  end

fun min3(a :: Number, b :: Number, c :: Number):
  num-min(a, num-min(b, c))
end
This algorithm will indeed pass all the tests we have written above, but with a problem: the running time grows exponentially. That is because, each time we find a mismatch, we recur on three subproblems. In principle, therefore, the algorithm takes time proportional to three to the power of the length of the shorter word. In practice, any prefix that matches causes no branching, so it is mismatches that incur branching (thus, confirming that the distance of a word with itself is zero only takes time linear in the size of the word).
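In recurrence form: each mismatch spawns three subproblems that are smaller by at most one character, so in the worst case \(T(k) \approx 3\,T(k-1) + O(1)\), giving \(T(k) = O(3^k)\) for words of length \(k\).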
Observe, however, that many of these subproblems are the same. For instance, given “kitten” and “sitting”, the mismatch on the initial character will cause the algorithm to compute the distance of “itten” from “itting” but also “itten” from “sitting” and “kitten” from “itting”. Those latter two distance computations will also involve matching “itten” against “itting”. Thus, again, we want the computation tree to turn into a DAG of expressions that are actually evaluated.
data MemoryCell2<T, U, V>:
  | mem(in-1 :: T, in-2 :: U, out :: V)
end

fun<T, U, V> memoize-2(f :: (T, U -> V)) -> (T, U -> V):
  var memory :: List<MemoryCell2<T, U, V>> = empty
  lam(p, q):
    answer = find(
      lam(elt): (elt.in-1 == p) and (elt.in-2 == q) end,
      memory)
    cases (Option) answer:
      | none =>
        result = f(p, q)
        memory := link({in-1: p, in-2: q, out: result}, memory)
        result
      | some(v) => v.out
    end
  end
end
rec levenshtein :: (List<String>, List<String> -> Number) =
  memoize-2(
    lam(s, t):
      if is-empty(s) and is-empty(t): 0
      else if is-empty(s): t.length()
      else if is-empty(t): s.length()
      else:
        if s.first == t.first:
          levenshtein(s.rest, t.rest)
        else:
          min3(
            1 + levenshtein(s.rest, t),
            1 + levenshtein(s, t.rest),
            1 + levenshtein(s.rest, t.rest))
        end
      end
    end)
The complexity of this algorithm is still non-trivial. First, let’s introduce the term suffix: the suffix of a string is the rest of the string starting from any point in the string. (Thus “kitten”, “itten”, “ten”, “n”, and “” are all suffixes of “kitten”.) Now, observe that in the worst case, starting with every suffix in the first word, we may need to perform a comparison against every suffix in the second word. Fortunately, for each of these suffixes we perform a constant computation relative to the recursion. Therefore, the overall time complexity of computing the distance between strings of length \(m\) and \(n\) is \(O(m \cdot n)\). (We will return to space consumption later [Contrasting Memoization and Dynamic Programming].)
Modify the above algorithm to produce an actual (optimal) sequence of edit operations. This is sometimes known as the traceback.
13.3.3 Nature as a Fat-Fingered Typist
We have talked about how to address mistakes made by humans. However, humans are not the only bad typists: nature is one, too!
When studying living matter we obtain sequences of amino acids and other such chemicals that comprise molecules, such as DNA, that hold important and potentially determinative information about the organism. These sequences consist of similar fragments that we wish to identify because they represent relationships in the organism’s behavior or evolution. (This section may need to be skipped in some states and countries.)
Unfortunately, these sequences are never identical: like all low-level programmers, nature slips up and sometimes makes mistakes in copying (mutations). The Smith-Waterman algorithm, which is essentially the edit-distance computation we have just seen, is widely used to align such sequences despite these mutations.
The only difference between traditional presentations of Levenshtein and Smith-Waterman is something we alluded to earlier: why should every edit be given a distance of one? In the Smith-Waterman presentation, we instead assume we are given a function that produces the gap score, i.e., a score for every character alignment, covering both matches and edits, with the scores driven by biological considerations. Of course, as we have already noted, this need is not peculiar to biology; we could just as well use a “gap score” to reflect the likelihood of a substitution based on keyboard characteristics.
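As a sketch of what such a parameterization might look like (this code, and names such as weighted-levenshtein, are illustrative assumptions on our part, not the Smith-Waterman algorithm itself), we can thread cost functions through the same recursion; choosing every cost to be 1 recovers levenshtein:

fun weighted-levenshtein(
    del-cost :: (String -> Number),
    ins-cost :: (String -> Number),
    sub-cost :: (String, String -> Number),
    s :: List<String>,
    t :: List<String>) -> Number:
  # Total cost of inserting or deleting everything that remains in a word:
  fun total(cost, w):
    for fold(acc from 0, c from w): acc + cost(c) end
  end
  if is-empty(s) and is-empty(t): 0
  else if is-empty(s): total(ins-cost, t)
  else if is-empty(t): total(del-cost, s)
  else:
    if s.first == t.first:
      weighted-levenshtein(del-cost, ins-cost, sub-cost, s.rest, t.rest)
    else:
      by-del = del-cost(s.first) + weighted-levenshtein(del-cost, ins-cost, sub-cost, s.rest, t)
      by-ins = ins-cost(t.first) + weighted-levenshtein(del-cost, ins-cost, sub-cost, s, t.rest)
      by-sub = sub-cost(s.first, t.first) + weighted-levenshtein(del-cost, ins-cost, sub-cost, s.rest, t.rest)
      min3(by-del, by-ins, by-sub)
    end
  end
end

Like the unmemoized levenshtein, this sketch is exponential, so in practice it would be memoized or converted to the dynamic-programming form in exactly the same way.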
13.3.4 Dynamic Programming
We have used memoization as our canonical means of saving the values of past computations to reuse later. There is another popular technique for doing this called dynamic programming. This technique is closely related to memoization; indeed, it can be viewed as the dual method for achieving the same end. First we will see dynamic programming at work, then discuss how it differs from memoization.
Dynamic programming also proceeds by building up a memory of answers, and looking them up instead of recomputing them. As such, it too is a process for turning a computation’s shape from a tree to a DAG of actual calls. The key difference is that instead of starting with the largest computation and recurring to smaller ones, it starts with the smallest computations and builds outward to larger ones.
We will revisit our previous examples in light of this approach.
13.3.4.1 Catalan Numbers with Dynamic Programming
MAX-CAT = 11

answers :: Array<Option<Number>> = array-of(none, MAX-CAT + 1)

fun catalan(n):
  cases (Option) array-get-now(answers, n):
    | none => raise("looking at uninitialized value")
    | some(v) => v
  end
end

fun fill-catalan(upper):
  array-set-now(answers, 0, some(1))
  when upper > 0:
    for map(n from range(1, upper + 1)):
      block:
        cat-at-n = for fold(acc from 0, k from range(0, n)):
          acc + (catalan(k) * catalan(n - 1 - k))
        end
        array-set-now(answers, n, some(cat-at-n))
      end
    end
  end
end

fill-catalan(MAX-CAT)
Notice that we have had to undo the natural recursive
definition, which starts from the value we want and works down to smaller ones, and instead explicitly build up answers from the smallest case upward, in an order we choose ourselves.
13.3.4.2 Levenshtein Distance and Dynamic Programming
fun levenshtein(s1 :: List<String>, s2 :: List<String>):
end

s1-len = s1.length()
s2-len = s2.length()
answers = array2d(s1-len + 1, s2-len + 1, none)

fun put(s1-idx :: Number, s2-idx :: Number, n :: Number):
  answers.set(s1-idx, s2-idx, some(n))
end

fun lookup(s1-idx :: Number, s2-idx :: Number) -> Number:
  a = answers.get(s1-idx, s2-idx)
  cases (Option) a:
    | none => raise("looking at uninitialized value")
    | some(v) => v
  end
end

for each(s1i from range(0, s1-len + 1)):
  put(s1i, 0, s1i)
end
for each(s2i from range(0, s2-len + 1)):
  put(0, s2i, s2i)
end

for each(s1i from range(0, s1-len)):
  for each(s2i from range(0, s2-len)):
  end
end
Is this strictly true?
No, it isn’t. We did first fill in values for the “borders” of the table. This is because doing so in the midst of <levenshtein-dp/compute-dist> would be much more annoying. By initializing all the known values, we keep the core computation cleaner. But it does mean the order in which we fill in the table is fairly complex.
dist =
  if index(s1, s1i) == index(s2, s2i):
    lookup(s1i, s2i)
  else:
    min3(
      1 + lookup(s1i, s2i + 1),
      1 + lookup(s1i + 1, s2i),
      1 + lookup(s1i, s2i))
  end
put(s1i + 1, s2i + 1, dist)

lookup(s1-len, s2-len)
fun levenshtein(s1 :: List<String>, s2 :: List<String>):
  s1-len = s1.length()
  s2-len = s2.length()
  answers = array2d(s1-len + 1, s2-len + 1, none)
  fun put(s1-idx :: Number, s2-idx :: Number, n :: Number):
    answers.set(s1-idx, s2-idx, some(n))
  end
  fun lookup(s1-idx :: Number, s2-idx :: Number) -> Number:
    a = answers.get(s1-idx, s2-idx)
    cases (Option) a:
      | none => raise("looking at uninitialized value")
      | some(v) => v
    end
  end
  for each(s1i from range(0, s1-len + 1)):
    put(s1i, 0, s1i)
  end
  for each(s2i from range(0, s2-len + 1)):
    put(0, s2i, s2i)
  end
  for each(s1i from range(0, s1-len)):
    for each(s2i from range(0, s2-len)):
      dist =
        if index(s1, s1i) == index(s2, s2i):
          lookup(s1i, s2i)
        else:
          min3(
            1 + lookup(s1i, s2i + 1),
            1 + lookup(s1i + 1, s2i),
            1 + lookup(s1i, s2i))
        end
      put(s1i + 1, s2i + 1, dist)
    end
  end
  lookup(s1-len, s2-len)
end
13.3.5 Contrasting Memoization and Dynamic Programming
Now that we’ve seen two very different techniques for avoiding recomputation, it’s worth contrasting them. The important thing to note is that memoization is a much simpler technique: write the natural recursive definition; determine its time complexity; decide whether this is problematic enough to warrant a space-time trade-off; and if it is, apply memoization. The code remains clean, and subsequent readers and maintainers will be grateful for that. In contrast, dynamic programming requires a reorganization of the algorithm to work bottom-up, which can often make the code harder to follow and full of subtle invariants about boundary conditions and computation order.
That said, the dynamic programming solution can sometimes be more computationally efficient. For instance, in the Levenshtein case, observe that at each table element, we (at most) only ever use the ones that are from the previous row and column. That means we never need to store the entire table; we can retain just the fringe of the table, which reduces space to being proportional to the sum, rather than product, of the length of the words. In a computational biology setting (when using Smith-Waterman), for instance, this saving can be substantial. This optimization is essentially impossible for memoization.
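Here is a minimal sketch of that idea (the function levenshtein-rows and its layout are our illustration, not code from the text): we keep only the previous row of the table and the row currently being built, so space is proportional to the length of the second word.

fun levenshtein-rows(s1 :: List<String>, s2 :: List<String>) -> Number:
  s2-len = s2.length()
  # Distances from the empty prefix of s1 to each prefix of s2: 0, 1, ..., s2-len.
  base-row = range(0, s2-len + 1)
  final-row = for fold(prev-row from base-row, c1 from s1):
    # Build the next row; acc holds the new row so far, most recent entry first.
    rev-row = for fold(acc from [list: prev-row.first + 1], j from range(0, s2-len)):
      above = prev-row.get(j + 1)
      diag = prev-row.get(j)
      left = acc.first
      cost = if c1 == s2.get(j): diag
        else: 1 + num-min(diag, num-min(above, left))
        end
      link(cost, acc)
    end
    rev-row.reverse()
  end
  final-row.get(s2-len)
end

Using lists and .get makes each lookup linear, so a serious version would store the two rows in arrays; the sketch is only meant to show the space saving.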
Memoization                                        | Dynamic Programming
Top-down                                           | Bottom-up
Depth-first                                        | Breadth-first
Black-box                                          | Requires code reorganization
All stored calls are necessary                     | May do unnecessary computation
Cannot easily get rid of unnecessary data          | Can more easily get rid of unnecessary data
Can never accidentally use an uninitialized answer | Can accidentally use an uninitialized answer
Needs to check for the presence of an answer       | Can be designed to not need to check for the presence of an answer
From a software design perspective, there are two more considerations.
First, the performance of a memoized solution can trail that of dynamic programming when the memoized solution uses a generic data structure to store the memo table, whereas a dynamic programming solution will invariably use a custom data structure (since the code needs to be rewritten against it anyway). Therefore, before switching to dynamic programming for performance reasons, it makes sense to try to create a custom memoizer for the problem: the same knowledge embodied in the dynamic programming version can often be encoded in this custom memoizer (e.g., using an array instead of list to improve access times). This way, the program can enjoy speed comparable to that of dynamic programming while retaining readability and maintainability.
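For example, when the inputs are small natural numbers (as with catalan), a custom memoizer might use an array instead of a list of MemoryCells; this is a hypothetical sketch (memoize-nat and max-in are our names), not code from the text:

fun memoize-nat(f :: (Number -> Number), max-in :: Number) -> (Number -> Number):
  # One slot per possible input, looked up in constant time.
  memory = array-of(none, max-in + 1)
  lam(n):
    cases (Option) array-get-now(memory, n):
      | none =>
        block:
          result = f(n)
          array-set-now(memory, n, some(result))
          result
        end
      | some(v) => v
    end
  end
end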
Second, suppose space is an important consideration and the dynamic programming version can make use of significantly less space. Then it does make sense to employ dynamic programming instead. Does this mean the memoized version is useless?
What do you think? Do we still have use for the memoized version?
Yes, of course we do! It can serve as an oracle [REF] for the dynamic
programming version, since the two are supposed to produce identical
answers anyway, and the memoized version, being the more direct transcription of the definition, is the one we should trust more.
In short, always first produce the memoized version. If you need more performance, consider customizing the memoizer’s data structure. If you need to also save space, and can arrive at a more space-efficient dynamic programming solution, then keep both versions around, using the former to test the latter (the person who inherits your code and needs to alter it will thank you!).
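For instance, if the two versions were bound to names like catalan-memo and catalan-dp (hypothetical names; the text calls both of them catalan), the oracle-style tests might be as simple as:

check:
  # The dynamic-programming version should agree with the memoized one,
  # which is the more direct transcription of the definition.
  catalan-dp(5) is catalan-memo(5)
  catalan-dp(10) is catalan-memo(10)
end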
We have characterized the fundamental difference between memoization and dynamic programming as that between top-down, depth-first and bottom-up, breadth-first computation. This should naturally raise the question, what about:
top-down, breadth-first
bottom-up, depth-first
orders of computation. Do they also have special names that we just happen to not know? Are they uninteresting? Or do they not get discussed for a reason?
14 [EMPTY]
15 Processing Programs: Parsing
15.1 Understanding Languages by Writing Programs About Them
An interpreter will consume programs in a language and produce the answers they are expected to produce.
A type checker will consume programs in a language and produce either true or false, depending on whether the program has consistent type annotations.
A pretty-printer will consume programs in a language and print them, prettified in some way.
A verifier will consume programs in a language and check whether they satisfy some stated property.
A transformer will consume programs in a language and produce related but different programs in the same language.
A transformer’s first cousin, a compiler, will consume programs in a language and produce related programs in a different language (which in turn can be interpreted, type-checked, pretty-printed, verified, transformed, even compiled...).
15.2 Everything (We Will Say) About Parsing
☛ Parsing is a very general activity whose difficulty depends both on how complex or ambiguous the input might be, and on how much structure we expect of the parser’s output. For our purposes, we would like the parser to be maximally helpful by providing later stages with as much structure as possible. This forces us to either write a very complex parser or limit the forms of legal input. We will choose the latter.
23 + 5 * 6
Ultimately, we would like to get rid of ambiguity once-and-for-all at the very beginning of processing the program, rather than deal with it repeatedly in each of the ways we might want to process it. Thus, if we follow the standard rules of arithmetic, we would want the above program to turn into a tree that has a (representation of) addition at its root, a (representation of) 23 as its left child, multiplication as its right child, and so on. This is called an abstract syntax tree: it is “abstract” because it represents the intent of the program rather than its literal syntactic structure (spaces, indentation, etc.); it is “syntax” because it represents the program that was given; and it is usually a “tree” but not always.
<plus>
  <args>
    <arg position="1">
      <number value="23"/>
    </arg>
    <arg position="2">
      <mult>
        <args>
          <arg position="1">
            <number value="5"/>
          </arg>
          <arg position="2">
            <number value="6"/>
          </arg>
        </args>
      </mult>
    </arg>
  </args>
</plus>

{plus:
  [{number: "23"},
   {mult:
     [{number: "5"},
      {number: "6"}]}]}
15.2.1 A Lightweight, Built-In First Half of a Parser
(+ 23 (* 5 6))

Load the s-expression library with
import s-exp as S
and then try the following:
S.read-s-exp("(+ 23 (* 5 6))")
Make sure you understand the output it produced and why it produced that.
check:
  S.read-s-exp("(+ 23 (* 5 6))") is S.s-list(
    [list: S.s-sym("+"), S.s-num(23),
      S.s-list([list: S.s-sym("*"), S.s-num(5), S.s-num(6)])])
end
In this book we will use s-expressions to represent concrete syntax. This is helpful because the syntax is so different from that of Pyret that we will virtually never be confused as to what language we are reading. Since we will be writing programs to process programs, it is especially helpful to keep apart the program being processed and the one doing the processing. For us, the former will be written in s-expressions and the latter in Pyret.
15.2.2 Completing the Parser
In principle, we can think of read-s-exp as a complete parser. However, its output is generic: it represents the token structure without offering any comment on its intent. We would instead prefer to have a representation that tells us something about the intended meaning of the terms in our language, just as we wrote at the very beginning: “(representation of) multiplication”, and so on.
import s-exp as S
import lists as L

data ArithC:
  | numC(n :: Number)
  | plusC(l :: ArithC, r :: ArithC)
  | multC(l :: ArithC, r :: ArithC)
end

fun parse(s :: S.S-Exp) -> ArithC:
  cases (S.S-Exp) s:
    | s-num(n) => numC(n)
    | s-list(shadow s) =>
      cases (List) s:
        | empty => raise("parse: unexpected empty list")
        | link(op, args) =>
          argL = L.index(args, 0)
          argR = L.index(args, 1)
          if op.s == "+":
            plusC(parse(argL), parse(argR))
          else if op.s == "*":
            multC(parse(argL), parse(argR))
          end
      end
    | else => raise("parse: not number or list")
  end
end

check:
  fun p(s): parse(S.read-s-exp(s)) end
  p("3") is numC(3)
  p("(+ 1 2)") is plusC(numC(1), numC(2))
  p("(* (+ 1 2) (* 2 5))") is multC(plusC(numC(1), numC(2)), multC(numC(2), numC(5)))
end
Congratulations! You have just completed your first representation of a program. From now on we can focus entirely on programs represented as recursive trees, ignoring the vagaries of surface syntax and how to get them into the tree form (though in practice, we will continue to use the s-expression notation because it’s easier to type than all those constructors). We’re finally ready to start studying programming languages!
If the test
p("3") is numC(3)
is instead written as
p(3) is numC(3)
what happens? Why?
15.2.3 Coda
The s-expression syntax dates back to 1960.“Recursive functions of symbolic expressions and their computation by machine, Part I” by John McCarthy in Communications of the ACM. This syntax is often controversial amongst programmers. Observe, however, something deeply valuable that it gives us. While parsing traditional languages can be very complex, parsing this syntax is virtually trivial. Given a sequence of tokens corresponding to the input, it is absolutely straightforward to turn parenthesized sequences into s-expressions; it is equally straightforward (as we see above) to turn s-expressions into proper syntax trees. I like to call such two-level languages bicameral, in loose analogy to government legislative houses: the lower-level does rudimentary well-formedness checking, while the upper-level does deeper validity checking. (We haven’t done any of the latter yet, but we will [REF].)
The virtues of this syntax are thus manifold. The amount of code it requires is small, and it can easily be embedded in many contexts. By integrating the syntax into the language, it becomes easy for programs to manipulate representations of programs (as we will see more of in [REF]). It’s therefore no surprise that even though many Lisp-based languages (Lisp itself, Scheme, Racket, Clojure, and more) differ from one another in countless ways, they have all retained essentially this syntax.
Of course, we could just use XML instead. That might be much nicer. Or JSON. Because that wouldn’t be anything like an s-expression at all.
16 Processing Programs: A First Look at Interpretation
Now we’re ready to write an evaluator, i.e., a program that consumes representations of programs and produces their answers. We will call such a program an interpreter.
16.1 Representing Arithmetic
Let’s first agree on how we will represent arithmetic expressions.
Let’s say we want to support only two operations, addition and multiplication, in addition to primitive numbers.
Why did we not include division? What impact does it have on the remarks above?
data ArithC:
  | numC(n :: Number)
  | plusC(l :: ArithC, r :: ArithC)
  | multC(l :: ArithC, r :: ArithC)
end
16.2 Writing an Interpreter
Now let’s write an interpreter for this arithmetic language. First,
we should think about what its type is. It clearly consumes an ArithC value. What does it produce? Well, an interpreter evaluates the expression, and in this language the result of evaluation is a number; so it should produce a Number.
Write your examples for the interpreter.
Templates are explained in detail in How to Design Programs.
fun interp(e :: ArithC) -> Number:
  cases (ArithC) e:
    | numC(n) => ...
    | plusC(l, r) => ...
    | multC(l, r) => ...
  end
end

fun interp(e :: ArithC) -> Number:
  cases (ArithC) e:
    | numC(n) => n
    | plusC(l, r) => l + r
    | multC(l, r) => l * r
  end
where:
  interp(numC(3)) is 3
end
Do you spot the errors?
fun interp(e :: ArithC) -> Number:
  cases (ArithC) e:
    | numC(n) => ...
    | plusC(l, r) => ... interp(l) ... interp(r) ...
    | multC(l, r) => ... interp(l) ... interp(r) ...
  end
end

fun interp(e :: ArithC) -> Number:
  cases (ArithC) e:
    | numC(n) => n
    | plusC(l, r) => interp(l) + interp(r)
    | multC(l, r) => interp(l) * interp(r)
  end
end
Later on (Functions Anywhere), we’re going to wish we had returned a more complex datatype than just numbers. But for now, this will do.
Congratulations: you’ve written your first interpreter! I know, it’s
very nearly an anticlimax. But they’ll get harder soon enough.
16.3 A First Taste of “Semantics”
I just slipped something by you:
What is the “meaning” of addition and multiplication in this new language?
That’s a pretty abstract question, isn’t it. Let’s make it concrete. I’ll pose the problem as follows.
1 + 2
1 + 2
’1’ + ’2’
’1’ + ’2’
First of all, there are many different kinds of numbers: fixed-width (e.g., 32-bit) integers, signed fixed-width (e.g., 31-bits plus a sign-bit) integers, arbitrary precision integers; in some languages, rationals; various formats of fixed- and floating-point numbers; in some languages, complex numbers; and so on. After the numbers have been chosen, addition may support only some combinations of them.
In addition, some languages permit the addition of datatypes such as matrices.
Furthermore, many languages support “addition” of strings (we use scare-quotes because we don’t really mean the mathematical concept of addition, but rather the operation performed by an operator with the syntax +). In some languages this always means concatenation; in some others, it can result in numeric results (or numbers stored in strings).
Returning to our interpreter, what semantics do we have? We’ve adopted whatever semantics Pyret provides, because we map + to Pyret’s +. In fact that’s not even quite true: Pyret may, for all we know, also enable + to apply to strings (which in fact it does), so we’ve chosen the restriction of Pyret’s semantics to numbers.
In what way have we restricted + to apply only to numbers? Where exactly is this restriction?
If we wanted a different semantics, we’d have to implement it explicitly.
What all would you have to change so that the number had signed 32-bit arithmetic?
In general, we have to be careful about too readily borrowing from the host language. We’ll return to this topic later [REF]. However, because we have lots of interesting things to study already, we will adopt Pyret’s numbers as our numbers for now.
16.4 Desugaring: Growing the Language Without Enlarging It
We’ve picked a very restricted first language, so there are many ways we can grow it. Some, such as representing data structures and functions, will clearly force us to add new features to the interpreter itself. Others, such as adding more of arithmetic itself, can possibly be done without disturbing the core language and hence its interpreter: this is known as adding syntactic sugar, or “sugar” for short. Let’s investigate.
16.4.1 Extension: Binary Subtraction
First, we’ll add subtraction. Because our language already has numbers, addition, and multiplication, it’s easy to define subtraction: \(a - b = a + -1 \times b\).
Okay, that was easy! But now we should turn this into concrete code. To do so, we face a decision: where does this new subtraction operator reside? It is tempting, and perhaps seems natural, to just add one more case to our existing ArithC datatype.
What are the negative consequences of modifying ArithC?
The first, obvious, one is that we now have to modify all programs that process ArithC. So far that’s only our interpreter, which is pretty simple, but in a more complex implementation, there could be many programs built around the datatype—a type-checker, compiler, etc.—which must all be changed, creating a heavy burden. Second, we were trying to add new constructs that we can define in terms of existing ones; it feels slightly self-defeating to do this in a way that isn’t modular.
Third, and most subtly, there’s something conceptually unnecessary about modifying ArithC. That’s because ArithC represents a perfectly good core language. Atop this, we might want to include any number of additional operations that make the user’s life more convenient, but there’s no need to put these in the core. Rather, it’s wise to record conceptually different ideas in distinct datatypes, rather than shoehorn them into one. The separation can look a little unwieldy sometimes, but it makes the program much easier for future developers to read and maintain. Besides, for different purposes you might want to layer on different extensions, and separating the core from the surface enables that.
data ArithExt:
  | numExt(n :: Number)
  | plusExt(l :: ArithExt, r :: ArithExt)
  | multExt(l :: ArithExt, r :: ArithExt)
  | bminusExt(l :: ArithExt, r :: ArithExt)
end
What happens if the children are declared to be ArithC rather than ArithExt?
Given this datatype, we should do two things. First, we should modify our parser to also parse - expressions, and always construct ArithExt terms (rather than any ArithC ones). Second, we should implement a desugar function that translates ArithExt values into ArithC ones.Desugaring is the act of removing syntactic sugar.
fun desugar(s :: ArithExt) -> ArithC:
  cases (ArithExt) s:
    | numExt(n) => numC(n)
    | plusExt(l, r) => plusC(desugar(l), desugar(r))
    | multExt(l, r) => multC(desugar(l), desugar(r))
  end
end

| bminusExt(l, r) =>
  plusC(desugar(l), multC(numC(-1), desugar(r)))
It’s a common mistake to forget the recursive calls to desugar on l and r. What happens when you forget them? Try for yourself and see.
16.4.2 Extension: Unary Negation
Now let’s consider another extension, which is a little more interesting: unary negation. This forces you to do a little more work in the parser because, depending on your surface syntax, you may need to look ahead to determine whether you’re in the unary or binary case. But that’s not even the interesting part!
Modify parse to handle unary subtraction.
There are many ways we can desugar unary negation. We can define it naturally as \(-b = 0 - b\), or we could abstract over the desugaring of binary subtraction with this expansion: \(-b = 0 + -1 \times b\).
Which one do you prefer? Why?
| uminusExt(e :: ArithExt)

| uminusExt(e) => desugar(bminusExt(numExt(0), e))
Suppose we choose the first option, turning unary negation into binary subtraction, as the code above does. This choice has two noteworthy properties.
- The first is that the recursion is generative, which forces us to take extra care. (If you haven’t heard of generative recursion before, read the section on it in How to Design Programs.) Essentially, in generative recursion the sub-problem is a computed function of the input, rather than a structural piece of it. This is an especially simple case of generative recursion, because the “function” is simple: it’s just the bminusExt constructor. We might be tempted to fix this by using a different rewrite:
| uminusExt(e) => bminusExt(numExt(0), desugar(e))
which does indeed eliminate the generativity.
Do Now!
Unfortunately, this desugaring transformation won’t work at all! Do you see why? If you don’t, try to run it.
The second is that we are implicitly depending on exactly what bminusExt means; if its meaning changes, so will that of uminusExt, even if we don’t want it to. In contrast, defining a functional abstraction that consumes two terms and generates one representing the addition of the first to -1 times the second, and using this to define the desugaring of both uminusExt and bminusExt, is a little more fault-tolerant.
You might say that the meaning of subtraction is never going to change, so why bother? Yes and no. Yes, its meaning is unlikely to change; but no, its implementation might. For instance, the developer may decide to log all uses of binary subtraction. In the first expansion all uses of unary negation would also get logged, but they would not in the second expansion.
Fortunately, in this particular case we have a much simpler option, which is to define \(-b = -1 \times b\). This expansion works with the primitives we have, and follows structural recursion. The reason we took the above detour, however, is to alert you to these problems, and warn that you might not always be so fortunate.
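In code, that simpler desugaring is just one more case, in the style of the bminusExt case above (a sketch, not shown in the text):

| uminusExt(e) => multC(numC(-1), desugar(e))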
16.5 A Three-Stage Pipeline
This concludes our first look at the standard pipeline we’re going to use. We will first parse programs to convert them to abstract syntax; we will then desugar them to eliminate unnecessary constructs. From now on, we will usually focus just on the resulting core language, which will be subject to not only interpretation but also type-checking and other actions.
17 Interpreting Conditionals
Now that we have the first stirrings of a programming language, let’s
grow it out a little. The heart of a programming language consists of
control: the constructs that determine which computations happen and in what order. We will start with the simplest of these, the conditional.
17.1 The Design Space of Conditionals
(if test-exp then-part else-part)
What kind of values can the test-exp be? In some languages they must be Boolean values (two values, one representing truth and the other falsehood). In other languages this expression can evaluate to just about any value, with some set—colloquially called truthy—representing truth (i.e., they result in execution of the then-part) while the remaining ones are falsy, meaning they cause else-part to run. Initially, it may seem attractive to design a language with several truthy and falsy values: after all, this appears to give the programmer more convenience, permitting non-Boolean-valued functions and expressions to be used in conditionals. However, this can lead to bewildering inconsistencies across languages:

Value                | JS     | Perl   | PHP    | Python | Ruby
0                    | falsy  | falsy  | falsy  | falsy  | truthy
""                   | falsy  | falsy  | falsy  | falsy  | truthy
NaN                  | falsy  | truthy | truthy | truthy | truthy
nil/null/None/undef  | falsy  | falsy  | falsy  | falsy  | falsy
"0"                  | truthy | falsy  | falsy  | truthy | truthy
-1                   | truthy | truthy | truthy | truthy | truthy
[]                   | truthy | truthy | falsy  | falsy  | truthy
empty map/object     | truthy | falsy  | falsy  | falsy  | truthy
Of course, it need not be so complex. Scheme, for instance, has only one value that is falsy: false itself (written as #f). Every other value is truthy. For those who value allowing non-Boolean values in conditionals, this represents an elegant trade-off: it means a function need not worry that a type-consistent value resulting from a computation might cause a conditional to reverse itself. (For instance, if a function returns strings, it need not worry that the empty string might be treated differently from every other string.)While writing this chapter, I stumbled on a strange bug in Pyret: all numeric s-expressions parsed as s-num values except 0, which parsed as a s-sym. Eventually Justin Pombrio reported: “It’s a silly bug with a if in JavaScript that’s getting 0 and thinking it’s false.” Note that Ruby and Lua have relatively few falsy values; it may not be coincidental that their creators were deeply influenced by Scheme.
What kind of terms are the branches? Some languages make a distinction between statements and expressions; in such languages, designers need to decide which of these are permitted. In some languages, there are even two syntactic forms of conditional to reflect these two choices: e.g., in C, if uses statements (and does not return any value) while the “ternary operator” ((...?...:...)) permits expressions and returns a value.
If the branches are expressions and hence allowed to evaluate to values, how do the values relate? Many (but not all) languages with static type systems expect the two branches to have the same type [REF]. Languages without static type systems usually place no restrictions.
For now, we will assume that the conditional expression can only be a Boolean value; the branches are expressions (because that is all we have in our language anyway); and the two branches can return values of different types.
17.2 The Game Plan for Conditionals
- First, we need to define syntax. We’ll use
true
false
(if test-exp then-exp else-exp)
to represent the two Boolean constants and the conditional expression.
- We need to modify the representation of programs to handle these new constructs. Here’s our new expression language (with the name adjusted to signal its growth beyond pure arithmetic):
data ExprC:
  | trueC
  | falseC
  | numC(n :: Number)
  | plusC(l :: ExprC, r :: ExprC)
  | multC(l :: ExprC, r :: ExprC)
  | ifC(c :: ExprC, t :: ExprC, e :: ExprC)
end
We need to adjust the pre-desugaring language (ExprExt) as well to account for the new constructs.
- We need to modify the parser and desugarer.
Exercise
Modify parse and desugar to work with the extended language. Adjust the datatypes as needed. Be sure to write tests.
There’s one more big change needed. Do you see what it is?
17.2.1 The Interpreter’s Type
What should the interpreter now return? Until this point it returned only numbers, and we might be tempted to keep doing so by encoding trueC and falseC as numbers. That would be unwise, for several reasons:
It precludes being able to have a language with pure Booleans. This will have consequences when we get to types [REF].
It means you can perform arithmetic on truth values. This might not sound so surprising: after all, conjunction (and) and disjunction (or) can be thought of in terms of arithmetic. But once you say truth values are numbers, you can no longer detect if a programmer accidentally subtracts one truth value from another, divides them, and so on.
It isn’t even clear which numbers should represent which truth values. Historically, some languages have made zero represent truth; others have even chosen to use non-negative numbers for truth and negative numbers for falsity. None of these choices is more clearly “correct” than other ones, which suggests we’re really just guessing our way around here.
Most of all, we can’t keep hacking our way out of this situation. How are we going to represent strings or lists? With Gödel numbering? What happens when we get to functions, coroutines, continuations?
The consequence of this decisionOf course, you’re welcome to experiment with different decisions. The beauty of writing little interpreters is you can change what you want and explore the consequences of those changes. is that we will need a way to represent all the possible outcomes from the interpreter.
Try to sketch a representation for yourself.
data Value:
  | numV(n :: Number)
  | boolV(b :: Boolean)
end
17.2.2 Updating Arithmetic
Finally, we’re ready to augment our interpreter. We can ignore the arithmetic lines, which should be unchanged (because we haven’t changed anything about how we will perform arithmetic), and focus on the new parts of the language.
Right?
Wrong. Because we’ve changed the type of value the interpreter produces, we have to update the rules for arithmetic, too, to reflect that. We can do this quickly, but we’ll do it in a few steps to illustrate a point.
fun interp(e :: ExprC) -> Value:
  cases (ExprC) e:
    | numC(n) => numV(n)
  end
end

| plusC(l, r) => numV(interp(l).n + interp(r).n)
17.2.3 Defensive Programming
Consider the expression interp(l).n above: it blithely assumes that the recursive call produced a numV, and extracts its n field without checking. We can instead program defensively, checking the values before using them:
fun arith-binop(op :: (Number, Number -> Number), l :: ExprC, r :: ExprC) -> Value:
  l-v = interp(l)
  r-v = interp(r)
  if is-numV(l-v) and is-numV(r-v):
    numV(op(l-v.n, r-v.n))
  else:
    raise('argument not a number')
  end
end
| plusC(l, r) => arith-binop(lam(x, y): x + y end, l, r)
| multC(l, r) => arith-binop(lam(x, y): x * y end, l, r)
Before we move on, let’s ponder one more question. Suppose we could be certain that no other variant would have a field named n. Then, is there any difference between the version that checks is-numV for each of the values and the version that does not?
The version that performs a check in arith-binop is providing the error at the level of the language being implemented. It does not depend on Pyret to perform any checks; furthermore, it can give an error in terms of the interpreted language, using terminology that makes sense to the programmer in that language.
In contrast, the version that delegates the check to Pyret is allowing a meta-error to percolate through. This requires being very certain of how Pyret works, whether it will perform the check at the right time and in the right place, and then halt program execution in the appropriate way. Furthermore, the error message it produces might make no sense to the programmer: Pyret might say “Field n not found”, but to a person using a language with only arithmetic and conditionals, the very term “field” might mean nothing.
17.2.4 Interpreting Conditionals
| trueC => boolV(true)
| falseC => boolV(false)
| ifC(cnd, thn, els) =>
  ic = interp(cnd)
  if is-boolV(ic):
    if ic.b:
      interp(thn)
    else:
      interp(els)
    end
  else:
    raise('not a boolean')
  end
17.3 Growing the Conditional Language
A way to compute Boolean values, not just write them as constants. For instance, we should add operations on numbers (such as numeric comparison). This is relatively easy, especially given that we already have arith-binop parameterized over the operation to perform and returning a Value (rather than a number). The bigger nuisance is pushing this through parsing and desugaring. It would instead be better to create generic unary and binary operations and look them up in a table.
It would also be useful to have a way to combine conditionals (negation, disjunction, conjunction).
Generalize the parser and desugarer to look up a table of unary and binary operations and represent them uniformly, instead of having a different variant for each one.
(and (not (= x 0)) (/ 1 x))

(if (not (= x 0))
  false
  (/ 1 0))
Implement negation, conjunction, and disjunction.
Define a multi-armed conditional expression that desugars into nested ifs.
18 Interpreting Functions
18.1 Adding Functions to the Language
Now that we have basic expressions and conditionals, let’s grow this into a more complete programming language by adding functions.
18.1.1 Defining Data Representations
A set of definitions suggests no ordering, which means, presumably, any definition can refer to any other. That’s what I intend here, but when you are designing your own language, be sure to think about this.
fun double(x): x + x end

fun quad(x): double(double(x)) end

fun const5(_): 5 end
When a function has multiple arguments, what simple but important criterion governs the names of those arguments?
data FunDefC:
  | fdC(name :: String, arg :: String, body :: ExprC)
end
What is the body? Clearly, it has the form of an arithmetic expression, and sometimes it can even be represented using the existing ArithC language: for instance, the body of const5 can be represented as numC(5). But representing the body of double requires something more: not just addition (which we have), but also “x”. You are probably used to calling this a variable, but we will not use that term for now. Instead, we will call it an identifier.We’ve discussed this terminological difference in From Identifiers to Variables.
Anything else?
Finally, let’s look at the body of quad. It has yet another new construct: a function application. Be very careful to distinguish between a function definition, which describes what the function is, and an application, which uses it. The argument (or actual parameter) in the inner application of double is x; the argument in the outer application is double(x). Thus, the argument can be any complex expression.
data ExprC:
  | numC(n :: Number)
  | plusC(l :: ExprC, r :: ExprC)
  | multC(l :: ExprC, r :: ExprC)
  | trueC
  | falseC
  | ifC(c :: ExprC, t :: ExprC, e :: ExprC)
  | <idC-dt>
end

| idC(s :: String)
| appC(f :: String, a :: ExprC)
fdC("double", "x", plusC(idC("x"), idC("x")))
fdC("quad", "x", appC("double", appC("double", idC("x"))))
fdC("const5", "_", numC(5))
Look out! Did you notice that we spoke of a set of function definitions, but chose a list representation? That means we’re using an ordered collection of data to represent an unordered entity. At the very least, then, when testing, we should use any and all permutations of definitions to ensure we haven’t subtly built in a dependence on the order.
Extend desugar with support for identifiers and applications.
18.1.2 Growing the Interpreter
fun interp(e :: ExprC, fds :: List<FunDefC>) -> Value:
  cases (ExprC) e:
  end
end

| numC(n) => numV(n)
| plusC(l, r) => arith-binop(lam(x, y): x + y end, l, r, fds)
| multC(l, r) => arith-binop(lam(x, y): x * y end, l, r, fds)
| trueC => boolV(true)
| falseC => boolV(false)
| ifC(cnd, thn, els) =>
  ic = interp(cnd, fds)
  if is-boolV(ic):
    if ic.b:
      interp(thn, fds)
    else:
      interp(els, fds)
    end
  else:
    raise('not a boolean')
  end
Modify arith-binop to pass along fds unchanged in recursive calls.
fun get-fundef(name :: String, fds :: List<FunDefC>)
  -> FunDefC:
end
18.1.3 Substitution
fun subst(w :: ExprC, at :: String, in :: ExprC) -> ExprC:
end
Suppose we want to substitute 3 for the identifier x in the bodies of the three example functions above. What should it produce?
A common mistake is to assume that the result of substituting, e.g., 3 for x in double is fun double(x): 3 + 3 end. This is incorrect. We only substitute at the point when we apply the function, at which point the function’s invocation is replaced by its body. The header enables us to find the function and ascertain the name of its parameter; but only its body participates in evaluation. Examine the use of substitution in the interpreter to see how returning a function definition would result in a type error.
These examples already tell us what to do in almost all the cases. Given a number, there’s nothing to substitute. If it’s an identifier, we have to replace the identifier if it’s the one we’re trying to substitute, otherwise leave it alone. In the other cases, descend into the sub-expressions, performing substitution.
Before we turn this into code, there’s an important case to consider. Suppose the name we are substituting happens to be the name of a function. Then what should happen?
What, indeed, should happen?
There are many ways to approach this question. One is from a design perspective: function names live in their own “world”, distinct from ordinary program identifiers. Some languages (such as C and Common Lisp, in slightly different ways) take this perspective, and partition identifiers into different namespaces depending on how they are used. In other languages, there is no such distinction; indeed, we will examine such languages soon (Functions Anywhere).
For now, we will take a pragmatic viewpoint. If we evaluate a function name, it would result in a number or Boolean. However, these cannot name functions. Therefore, it makes no sense to substitute in that position, and we should leave the function name unmolested irrespective of its relationship to the variable being substituted. (Thus, a function could have a parameter named x as well as refer to another function called x, and these would be kept distinct.)
cases (ExprC) in:
  | numC(n) => in
  | plusC(l, r) => plusC(subst(w, at, l), subst(w, at, r))
  | multC(l, r) => multC(subst(w, at, l), subst(w, at, r))
  | trueC => trueC
  | falseC => falseC
  | ifC(cnd, thn, els) =>
    ifC(subst(w, at, cnd), subst(w, at, thn), subst(w, at, els))
  | appC(f, a) => appC(f, subst(w, at, a))
  | idC(s) =>
    if s == at:
      w
    else:
      in
    end
end
Observe that, whereas in the numC case the interpreter returned numV(n), substitution returns in (i.e., the original expression, equivalent at that point to writing numC(n)). Why?
18.1.4 The Interpreter, Resumed
| appC(f, a) =>
  fd = get-fundef(f, fds)
  subst(a, fd.arg, fd.body)
Tempting, but wrong.
Do you see why?
| appC(f, a) =>
  fd = get-fundef(f, fds)
  interp(subst(a, fd.arg, fd.body), fds)
Okay, that leaves only one case: identifiers. What could possibly be complicated about them? They should be just about as simple as numbers! And yet we’ve put them off to the very end, suggesting something subtle or complex is afoot.
Work through some examples to understand what the interpreter should do in the identifier case.
fun double(x): x + y end
When we substitute 5 for x, this produces the expression 5 + y. So far so good, but what do we substitute for y? As a matter of fact, it should be clear from the very outset that this definition of double is erroneous. The identifier y is said to be free, an adjective that in this setting has negative connotations.
| idC(s) => raise("unbound identifier")
And that’s it!
cases (List<FunDefC>) fds:
  | empty => raise("couldn't find function")
  | link(f, r) =>
    if f.name == name:
      f
    else:
      get-fundef(name, r)
    end
end
18.1.5 Oh Wait, There’s More!
fun subst(w :: ExprC, at :: String, in :: ExprC) -> ExprC: ... end
Sticking to surface syntax for brevity, suppose we apply double
to 1 + 2. This would substitute 1 + 2 for each
x, resulting in the following
expression for the body: (1 + 2) + (1 + 2). Notice that the argument expression is now evaluated twice, once for each occurrence of x. This suggests substituting the value of the argument instead, which changes the signature of substitution:
fun subst(w :: Value, at :: String, in :: ExprC) -> ExprC: ... end
In fact, we don’t even have substitution quite right! The version of substitution we have doesn’t scale past this language due to a subtle problem known as “name capture”. Fixing substitution is complex, subtle, and an exciting intellectual endeavor, but it’s not the direction I want to go in here. We’ll instead sidestep this problem in this book. If you’re interested, however, read about the lambda calculus [CITE], which provides the tools for defining substitution correctly.
Modify your interpreter to substitute names with answers, not expressions.
We’ve actually stumbled on a profound distinction in programming
languages. The act of evaluating arguments before substituting them
in functions is called eager application, while that of
deferring evaluation is called lazy application. We will stick with eager application for now.
18.2 From Substitution to Environments
Though we have a working definition of functions, you may feel a slight unease about it. When the interpreter sees an identifier, you might have had a sense that it needs to “look it up”. Not only did it not look up anything, we defined its behavior to be an error! While absolutely correct, this is also a little surprising. More importantly, we write interpreters to understand and explain languages, and this implementation might strike you as not doing that, because it doesn’t match our intuition.
There’s another difficulty with using substitution, which is the
number of times we traverse the source program. It would be nice to
have to traverse only those parts of the program that are actually
evaluated, and then, only when necessary. But substitution traverses
everything, including parts that will never be evaluated, such as the untaken branch of a conditional.
Does substitution have implications for the time complexity of evaluation?
There’s yet another problem with substitution, which is that it is
defined in terms of representations of the program source. Obviously,
our interpreter has and needs access to the source, to interpret it.
However, other implementations, such as compilers, may not keep the program source around at run-time, so a definition that depends on rewriting the source does not directly describe them.
18.2.1 Introducing the Environment
The intuition that addresses the first concern is to have the
interpreter “look up” an identifier in some sort of directory. The
intuition that addresses the second concern is to defer the
substitution. Fortunately, these converge nicely in a way that also
addresses the third. The directory records the intent to
substitute, without actually rewriting the program source; by
recording the intent, rather than substituting immediately, we can
defer substitution; and the resulting data structure, which is called
an environment, avoids the need for source-to-source rewriting
and maps nicely to low-level machine representations. Each name
association in the environment is called a
binding. (This does not mean our study of substitution was useless; to the contrary, many tools that work over programs effectively perform substitution, and the environment-based interpreter is correct precisely insofar as it produces the same answers that substitution would have.)
One subtlety is in defining precisely what “the same” means, especially with regards to failure.
Let’s first define our environment data structure. An environment is a collection of names associated with...what?
A natural question to ask here might be what the environment maps names to. But a better, more fundamental, question is: How to determine the answer to the “natural” question?
Remember that our environment was created to defer substitutions. Therefore, the answer lies in substitution. We discussed earlier (Oh Wait, There’s More!) that we want substitution to map names to answers, corresponding to an eager function application strategy. Therefore, the environment should map names to answers.
data Binding:
  | bind(name :: String, value :: Value)
end

type Environment = List<Binding>

mt-env = empty
xtnd-env = link
18.2.2 Interpreting with Environments
Now we can tackle the interpreter. One case is easy, but we should revisit all the others:
fun interp(e :: ExprC, nv :: Environment, fds :: List<FunDefC>) -> Value:
  cases (ExprC) e:
  end
end

| numC(n) => numV(n)
| plusC(l, r) => arith-binop(lam(x, y): x + y end, l, r, nv, fds)
| multC(l, r) => arith-binop(lam(x, y): x * y end, l, r, nv, fds)
| trueC => boolV(true)
| falseC => boolV(false)
| ifC(cnd, thn, els) =>
  ic = interp(cnd, nv, fds)
  if is-boolV(ic):
    if ic.b:
      interp(thn, nv, fds)
    else:
      interp(els, nv, fds)
    end
  else:
    raise('not a boolean')
  end

| idC(s) => lookup(s, nv)
Implement lookup.
| appC(f, a) =>
  fd = get-fundef(f, fds)
  arg-val = interp(a, nv, fds)
  interp(fd.body, <fof-env-interp-appC-rest-xtnd>, fds)

xtnd-env(bind(fd.arg, arg-val), nv)
fun lookup(s :: String, nv :: Environment) -> Value:
  cases (List) nv:
    | empty => raise("unbound identifier: " + s)
    | link(f, r) =>
      if s == f.name:
        f.value
      else:
        lookup(s, r)
      end
  end
end
Observe that looking up a free identifier still produces an error, but
it has moved from the interpreter itself into the lookup helper function.
check:
  f1 = fdC("double", "x", plusC(idC("x"), idC("x")))
  f2 = fdC("quad", "x", appC("double", appC("double", idC("x"))))
  f3 = fdC("const5", "_", numC(5))
  f4 = fdC("f4", "x", s2p2d("(if x 1 0)"))
  funs = [list: f1, f2, f3, f4]
  fun i(e): interp(e, mt-env, funs) end

  i(plusC(numC(5), appC("quad", numC(3)))) is numV(17)
  i(multC(appC("const5", numC(3)), numC(4))) is numV(20)
  i(plusC(numC(10), appC("const5", numC(10)))) is numV(10 + 5)
  i(plusC(numC(10), appC("double", plusC(numC(1), numC(2))))) is numV(10 + 3 + 3)
  i(plusC(numC(10), appC("quad", plusC(numC(1), numC(2))))) is numV(10 + 3 + 3 + 3 + 3)
end
Spot the bug.
18.2.3 Deferring Correctly
interp(appC("f1", numC(3)), mt-env,
  [list: fdC("f1", "x", appC("f2", numC(4))),
    fdC("f2", "y", plusC(idC("x"), idC("y")))]) raises "unbound identifier: x"
fun f1(x): f2(4) end
fun f2(y): x + y end
f1(3)
In fact, so will our substitution-based interpreter!
Why does the substitution process result in an error? It’s because, when we replace the representation of x with the representation of 3 in the representation of f1, we do so in f1 only.This “the representation of” is getting a little annoying, isn’t it? Therefore, I’ll stop saying that, but do make sure you understand why I had to say it. It’s an important bit of pedantry. (Obviously: x is f1’s parameter; even if another function had a parameter named x, that’s a different x.) Thus, when we get to evaluating the body of f2, its x hasn’t been substituted, resulting in the error.
What went wrong when we switched to environments? Watch carefully: this is subtle. We can focus on applications, because only they affect the environment. When we substituted the formal for the value of the actual, we did so by extending the current environment. In terms of our example, we asked the interpreter to perform not only f2’s substitution in f2’s body, but also all the substitutions currently in effect (those for the caller, f1), and indeed all past ones as well. That is, the environment only grows; it never shrinks.
xtnd-env(bind(fd.arg, arg-val), mt-env)
18.2.4 Scope
The broken environment interpreter above implements what is known as dynamic scope. This means the environment accumulates bindings as the program executes. As a result, whether an identifier is even bound depends on the history of program execution. We should regard this unambiguously as a flaw of programming language design. It adversely affects all tools that read and process programs: compilers, IDEs, and humans.
In contrast, substitution gives us static (also called lexical) scope: which binding an identifier refers to can be determined from the program text alone, without running the program.
18.2.5 How Bad Is It?
To understand the binding structure of your program, you may need to look at the whole program. No matter how much you’ve decomposed your program into small, understandable fragments, it doesn’t matter if you have a free identifier anywhere.
Understanding the binding structure is not only a function of the size of the program but also of the complexity of its control flow. Imagine an interactive program with numerous callbacks; you’d have to track through every one of them, too, to know which binding governs an identifier.
if moon-visible(): f1(10) else: f2(10) end
What happens on cloudy nights?
18.2.6 The Top-Level Scope
(define y 1)
(define (f x) (+ x y))

(define y 1)
(define (f x) (+ x y))
(define y 2)

(define y 1)
(define f
  (let ((z y))
    (lambda (x) (+ x y z))))
(define y 2)
18.2.7 Exposing the Environment
If we were building the implementation for others to use, it would be wise and a courtesy for the exported interpreter to take only an expression and list of function definitions, and invoke our defined interp with the empty environment. This both spares users an implementation detail, and avoids the use of an interpreter with an incorrect environment. In some contexts, however, it can be useful to expose the environment parameter. For instance, the environment can represent a set of pre-defined bindings: e.g., if the language wishes to provide pi automatically bound to 3.2 (in Indiana).
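A minimal sketch of such an exported wrapper (the name run is ours) that hides the environment parameter:

fun run(e :: ExprC, fds :: List<FunDefC>) -> Value:
  interp(e, mt-env, fds)
end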
18.3 Functions Anywhere
Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. [REF]
One of the things we stayed coy about when introducing functions (Adding Functions to the Language) is exactly where functions go. We suggested we’re following the model of an idealized programming environment, with definitions and their uses kept separate. But, inspired by the Scheme design principle, let’s examine how necessary that is.
Why can’t function definitions be expressions? In our current arithmetic-centric language we face the uncomfortable question “What value does a function definition represent?”, to which we don’t really have a good answer. But a real programming language obviously computes more than numbers and Booleans, so we no longer need to confront the question in this form; indeed, the answer to the above can just as well be, “A function value”. Let’s see how that might work out.
(+ 2 ([deffun f x (* x 3)] 4))
18.3.1 Functions as Expressions and Values
data ExprC: |
| numC(n :: Number) |
| plusC(l :: ExprC, r :: ExprC) |
| multC(l :: ExprC, r :: ExprC) |
| trueC |
| falseC |
| ifC(c :: ExprC, t :: ExprC, e :: ExprC) |
| idC(s :: String) |
end |
| fdC(name :: String, arg :: String, body :: ExprC) |
| appC(f :: ExprC%(is-fdC), a :: ExprC) |
fun interp(e :: ExprC, nv :: Environment): |
# removed return annotation of Value because fdC is not a Value! |
cases (ExprC) e: |
| numC(n) => numV(n) |
| plusC(l, r) => arith-binop(lam(x, y): x + y end, l, r, nv) |
| multC(l, r) => arith-binop(lam(x, y): x * y end, l, r, nv) |
| trueC => boolV(true) |
| falseC => boolV(false) |
| ifC(cnd, thn, els) => |
ic = interp(cnd, nv) |
if is-boolV(ic): |
if ic.b: |
interp(thn, nv) |
else: |
interp(els, nv) |
end |
else: |
raise('not a boolean') |
end |
| idC(s) => lookup(s, nv) |
Observe that we’ve left out the return annotation on interp. Why do you think this is? Run some examples to figure it out.
| fdC(_, _, _) => e |
| appC(f, a) => |
fun-val = interp(f, nv) |
arg-val = interp(a, nv) |
interp(fun-val.body, xtnd-env(bind(fun-val.arg, arg-val), mt-env)) |
check:
  f1 = fdC("double", "x", plusC(idC("x"), idC("x")))
  f2 = fdC("quad", "x", appC(f1, appC(f1, idC("x"))))
  f3 = fdC("const5", "_", numC(5))
  f4 = fdC("f4", "x", s2p2d("(if x 1 0)"))
  fun i(e): interp(e, mt-env) end
  i(plusC(numC(5), appC(f2, numC(3)))) is numV(17)
  i(multC(appC(f3, numC(3)), numC(4))) is numV(20)
  i(plusC(numC(10), appC(f3, numC(10)))) is numV(10 + 5)
  i(plusC(numC(10), appC(f1, plusC(numC(1), numC(2))))) is numV(10 + 3 + 3)
  i(plusC(numC(10), appC(f2, plusC(numC(1), numC(2))))) is numV(10 + 3 + 3 + 3 + 3)
end
18.3.2 A Small Improvement
Is there any part of our interpreter definition that we never use?
| fdC(arg :: String, body :: ExprC) |
Do you see what else you need to change?
| fdC(_, _) => e |
18.3.3 Nesting Functions
inner-fun = fdC("x", plusC(idC("x"), idC("x")))
outer-fun = fdC("x", inner-fun)
fdC("x", fdC("x", plusC(idC("x"), idC("x"))))
fdC("x", plusC(idC("x"), idC("x")))
...
appC(fdC("x", fdC("y", plusC(idC("x"), idC("y")))), numC(4))
fdC("y", plusC(idC("x"), idC("y")))
18.3.4 Nested Functions and Substitution
appC(fdC("x", fdC("x", plusC(idC("x"), idC("x")))), numC(4))
fdC("x", plusC(idC("x"), idC("x")))
appC(fdC("x", fdC("y", plusC(idC("x"), idC("y")))), numC(4))
fdC("y", plusC(numC(4), idC("y")))
In other words, we’re again failing to faithfully capture what
substitution would have done. A function value needs to
remember the substitutions that have already been applied to
it. Because we’re representing substitutions using an environment, a
function value therefore needs to be bundled with an
environment. This resulting data structure is called a
closure.“Save the environment! Create a closure
today!”—
18.3.5 Updating Values
data Value: |
| numV(n :: Number) |
| boolV(b :: Boolean) |
| closV(f :: ExprC%(is-fdC), e :: Environment) |
end |
fun interp(e :: ExprC, nv :: Environment) -> Value: |
cases (ExprC) e: |
end |
end |
Write out these two cases.
| fdC(_, _) => closV(e, nv) |
| appC(f, a) => |
clos = interp(f, nv) |
arg-val = interp(a, nv) |
interp(clos.f.body, xtnd-env(bind(clos.f.arg, arg-val), clos.e)) |
Observe that the argument to interp is clos.e rather than mt-env. Write a program that illustrates the difference.
This now computes the same answer we would have gotten through substitution.
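For instance, here is a small sketch of a test of this claim, reusing the closure-based interp above (the names add and i are just for illustration): the inner function must remember the outer binding of x, which is exactly what the closure’s stored environment provides.
check:
  add = fdC("x", fdC("y", plusC(idC("x"), idC("y"))))
  i = lam(e): interp(e, mt-env) end
  # applying add to 4 yields a closure that remembers x = 4; applying that
  # closure to 5 then finds both bindings, just as substitution would
  i(appC(appC(add, numC(4)), numC(5))) is numV(9)
end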
If we now switch back to using substitution, will we encounter any problems?
Yes, we will. We’ve defined substitution to replace program text in other program text. Strictly speaking we can no longer do this, because Value terms cannot be contained inside ExprC ones. That is, substitution is predicated on the assumption that the type of answers is a form of syntax. It is actually possible to carry through a study of programming under this assumption, but we won’t take that path here.
18.3.6 Sugaring Over Anonymity
Now let’s get back to the idea of naming functions, which has evident value for program understanding. Observe that we do have a way of naming things: by passing them to functions, where they acquire a local name (that of the formal parameter). Anywhere within that function’s body, we can refer to that entity using the formal parameter name.
fun double(x): x + x end double(10)
double = lam(x): x + x end double(10)
(let ([double (lambda (x) (+ x x))]) (double 10))
fun something():
  double = lam(x): x + x end
  double(10)
end
(define (double x) (+ x x)) (define (quad x) (double (double x))) (quad 10)
(let ([double (lambda (x) (+ x x))])
  (let ([quad (lambda (x) (double (double x)))])
    (quad 10)))
(let ([quad (lambda (x) (double (double x)))])
  (let ([double (lambda (x) (+ x x))])
    (quad 10)))
18.4 Recursion and Non-Termination
Hopefully you can convince yourself that our pure expression
languages—those with only arithmetic and conditionals—cannot express non-termination: every such program halts. That is no longer true once we add functions, even the first-order functions of Adding Functions to the Language.
Construct a non-terminating program for that interpreter.
il = fdC("inf-loop", "x", appC("inf-loop", numC(0)))
interp(appC("inf-loop", numC(0)), [list: il])
Precisely identify the generative recursion that enables this.
Why does this work? Why is this an infinite loop?
What’s happening here is actually somewhat subtle. The initial call to interp results in the interpreter finding a function and interpreting its body, which results in another call to interp: which finds the function and interprets its body, which results...and so on. If for some reason Pyret did not support recursion (which, historically, some languages did not!), then this would not work. Indeed, there is still something we are leaving to Pyret:
Does this program truly run for “ever” (meaning, as long as the computer is functioning properly), or does it run out of stack space?
Okay, that was easy. Now let’s consider our most recent interpreter. What can it do?
fun loop-forever(): loop-forever() end loop-forever()
loop-forever = lam(): loop-forever() end loop-forever()
(lam(loop-forever): loop-forever() end)(lam(): loop-forever() end)
Therefore, Pyret’s = is clearly doing something more than just textual substitution: it is also “tying the loop” for recursive definitions.
Can we try anything else that might succeed?
little-omega = lam(x): x(x) end
omega = little-omega(little-omega)
Why does this run forever? Consider using substitution to explain why.
(lam(x): x(x) end)(lam(x): x(x) end)
18.5 Functions and Predictability
We began (Adding Functions to the Language) with a language where at all application points, we knew exactly which function was going to be invoked (because we knew its name, and the name referred to one of a fixed global set). These are known as first-order functions. In contrast, we later moved to a language (Functions Anywhere) with first-class functions: those that had the same status as any other value in the language.
This transition gave us a great deal of new flexibility. For instance, we saw (Sugaring Over Anonymity) that some seemingly necessary language features could instead be implemented just as syntactic sugar; indeed, with true first-class functions, we can define all of computation ([EMPTY]). So what’s not to like?
The subtle problem is that whenever we increase our expressive power, we correspondingly weaken our predictive power. In particular, when confronted with a particular function application in a program, the question is, can we tell precisely which function is going to be invoked at this point? With first-order functions, yes; with higher-order functions, this is undecidable. Having this predictive power has many important consequences: a compiler can choose to inline (almost) every function application; a programming environment can give substantial help about which function is being called at that point; a security analyzer can definitively rule out known bad functions, thereby reducing the number of useless alerts it generates. Of course, with higher-order functions, all these operations are still sometimes possible; but they are not always possible, and how possible they are depends on the structure of the program and the cleverness of tools.
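As a tiny illustration of why the prediction becomes hard, consider a sketch like the following (the function name is hypothetical): the function applied inside the body depends on a value computed at run time, so no static analysis can always name it.
fun apply-at-ten(f :: (Number -> Number)) -> Number:
  # which function runs here depends on whatever value flows into f
  f(10)
end

check:
  apply-at-ten(lam(n): n + 1 end) is 11
  apply-at-ten(lam(n): n * n end) is 100
end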
With higher-order functions, why is determining the precise function at an application undecidable?
Why does the above reference to inlining say “almost”?
19 Reasoning about Programs: A First Look at Types
One of the themes of this book is predictability
(Predictability as a Theme). One of our key tools in reasoning about
program behavior before we run it is the static checking of
types. For example, when we write
x :: Number, we mean that x will always hold a
Number, and that all parts of the program that depend on
x can rely on this statement being enforced. As we will see,
types are just one point in a spectrum of invariants we might wish to
state, and static type checking—itself a family of techniques—is one powerful way of enforcing them.
19.1 Types as a Static Discipline
In this chapter, we will focus especially on static type checking: that is, checking (declared) types before the program even executes.This is an extremely rich and active subject. For further study, I strongly recommend reading Pierce’s Types and Programming Languages. We will explore some of the design space of types and their trade-offs. Finally, though static typing is an especially powerful and important form of invariant enforcement, we will also examine some other techniques that we have available [REF].
fun f(n :: Number) -> Number: n + 3 end f("x")
fun f(n): n + 3 end f("x")
How would you test the assertions that one fails before the program executes while the other fails during execution?
fun f n: n + 3 end
We will begin by introducing a traditional core language of types. Later, we will explore both extensions [REF] and variations [REF].
19.2 The Principle of Substitutability
The essence of any typing mechanism is usually the principle of substitutability: two types A and B “match” when values of one can be used in place of values of the other. Therefore, the design of a type system implicitly forces us to consider when such substitutions are safe (in the sense given by The Central Theorem: Type Soundness).
Of course, the simplest notion of substitutability is simply identity: a type can only be substituted with itself, and nothing else. For instance, if the declared type of a function’s parameter is String, then you can only call it with String-typed values, nothing else. This is known as invariance: the set of values that can be passed into a type cannot “vary” from the set expected by that type. This is so obvious that it might seem to hardly warrant a name! However, it is useful to name because it sets up a contrast with later type systems when we will have richer, non-trivial notions of substitutability (see Subtyping).
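In Pyret-flavored terms, here is a minimal sketch of invariance (the function name is hypothetical): only String-typed values may be substituted for the parameter.
fun shout(s :: String) -> String:
  s + "!"
end

check:
  shout("hi") is "hi!"
end
# Under invariance, only String values may flow into s; a call such as
# shout(42) is simply precluded.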
19.3 A Type(d) Language and Type Errors
Before we can define a type checker, we have to fix two things: the syntax of our typed core language and, hand-in-hand with that, the syntax of types themselves.
We’ll begin with our language with functions-as-values (Functions Anywhere). To this language we have to add type annotations. Conventionally, we don’t impose type annotations on constants or on primitive operations such as addition, because this would be unbearably tedious; instead, we impose them on the boundaries of functions or methods. Over the course of this study we will explore why this is a good locus for annotations.
data TyExprC:
  | numC(n :: Number)
  | plusC(l :: TyExprC, r :: TyExprC)
  | multC(l :: TyExprC, r :: TyExprC)
  | trueC
  | falseC
  | ifC(c :: TyExprC, t :: TyExprC, e :: TyExprC)
  | idC(s :: String)
  | appC(f :: TyExprC, a :: TyExprC)
  | fdC(arg :: String, at :: Type, rt :: Type, body :: TyExprC)
end
Now we have to decide on a language of types. To do so, we follow the tradition that the types abstract over the set of values. In our language, we have three kinds of values. It follows that we should have three kinds of types: one each for numbers, Booleans, and functions.
What information does a number type need to record? In most
languages, there are actually many numeric types, and indeed
there may not even be a single one that represents “numbers”.
However, we have ignored these gradations between numbers
((part "change-rep")), so it’s sufficient for us to have just one.
Having decided that, do we record additional information about
which number? We could in principle, but that would mean for
types to check, we would have to be able to decide whether two
expressions compute the same number—a problem that, as the exercise below suggests, is equivalent to solving the Halting Problem. Therefore, our number type will record only that a value is a number, and nothing more about which one.
We treat Booleans just like numbers: we ignore which Boolean it is. Here, we perhaps have more value in being precise, because there are only two values we need to track, not an infinite number. That means in some cases, we even know which branch of a conditional we will take, and can examine only that branch (though that may miss a type-error lurking in the other branch: what should we do about that?). However, even the problem of knowing precisely which Boolean we have reduces to the Halting Problem [REF].
Construct an argument for why determining which number or Boolean an arbitrary expression evaluates to is equivalent to solving the Halting Problem.
data Type:
  | numT
  | boolT
  | funT(a :: Type, r :: Type)
end
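With this definition, types are themselves just values of the Type datatype. For instance (the names below are ours, purely for illustration):
num-to-bool = funT(numT, boolT)           # the type of a function from Numbers to Booleans
consumes-that = funT(num-to-bool, numT)   # consumes such a function, returns a Number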
One or both arguments of + is not a number, i.e., does not have type numT.
One or both arguments of * is not a number.
The expression in the function position of an application is not a function, i.e., does not have type funT.
Any more?
The expression in the function position of an application is a function but the type of the actual argument does not match the type of the formal argument expected by the function.
Any more?
The expression in the function position of an application is a function but its return type does not match the type expected by the expression that invokes the function?
| numC(n :: Number) | plusC(l :: TyExprC, r :: TyExprC) | multC(l :: TyExprC, r :: TyExprC)
| trueC | falseC | ifC(c :: TyExprC, t :: TyExprC, e :: TyExprC)
The conditional expression must have type Boolean.
Both branches must have the same type (whatever it may be).Implicit is the idea that we can easily determine when two types are the “same”. We’ll return to this in Subtyping.
| idC(s :: String) | appC(f :: TyExprC, a :: TyExprC) | fdC(arg :: String, at :: Type, rt :: Type, body :: TyExprC)
The function position (f) must have a function type (funT).
The type of the actual argument expression (a) must match the argument type (.arg) of the function position.
The type of the body—assuming the formal argument (arg) has been given a value of the declared type (at)—matches the type declared (rt) as the return type.
19.3.1 Assume-Guarantee Reasoning
The last few cases we just saw had a very interesting structure. Did you spot it?
fun f(x :: String) -> Number:
  if x == "pi":
    3.14
  else:
    2.78
  end
end
2 + f("pi")
Similarly, when type-checking the application, having looked up the
type of f, we assume that it will indeed return a value
of type Number. We can assume this because that is the
return type annotation of f. We do assume it because
the type-checker will ensure that the body of f—given an argument of the declared type String—really does produce a Number.
In short, the treatment of function definition and application are complementary. They are joined together by a method called assume-guarantee reasoning, whereby each side’s assumptions are guaranteed by the other side, and the two stitch together perfectly to give us the desired safe execution (which we elaborate on later: The Central Theorem: Type Soundness).
19.4 A Type Checker for Expressions and Functions
19.4.1 A Pure Checker
tc :: TyExprC -> Boolean
Define the types and functions associated with type environments.
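This is deliberately left as the exercise above. If you want to compare notes afterwards, here is one possible sketch, mirroring the value environment from earlier. The names tbind, xtend-t-env, and ty-lookup are the ones the checker below assumes; mt-t-env is our own name for the empty type environment.
data TyBinding:
  | tbind(name :: String, ty :: Type)
end

type TyEnv = List<TyBinding>
mt-t-env = empty

fun xtend-t-env(b :: TyBinding, tnv :: TyEnv) -> TyEnv:
  link(b, tnv)
end

fun ty-lookup(s :: String, tnv :: TyEnv) -> Type:
  cases (List) tnv:
    | empty => raise("unbound identifier: " + s)
    | link(f, r) =>
      if f.name == s: f.ty else: ty-lookup(s, r) end
  end
end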
fun tc(e :: TyExprC, tnv :: TyEnv) -> Boolean: |
cases (TyExprC) e: |
end |
end |
| numC(_) => true |
| idC(s) => ty-lookup(s, tnv) |
This should make you a little uncomfortable: we seem to be throwing away valuable information about the type of the identifier. Of course, types do throw away information (e.g., which specific number an expression computes). However, the kind of information we’re throwing away here is much more significant: it’s not about a specific value within a type, but the type itself. Nevertheless, let’s push on.It might also bother you that, by only returning a Boolean, we have no means to express what type error occurred. But you might assuage yourself by saying that’s only because we have too weak a return type.
| appC(f, a) => |
f-t = tc(f, tnv) |
a-t = tc(a, tnv) |
... |
| appC(f, a) =>
  f-t = tc(f, tnv)
  a-t = tc(a, tnv)
  if is-funT(f-t):
    if a-t == f-t.arg:
In other words, what we need is something that will calculate the type of an expression, no matter how complex it is. Of course, such a procedure could only succeed if the expression is well-typed; otherwise it would not be able to provide a coherent answer. In other words, a type “calculator” has type “checking” as a special case!
That was subtle. Read it again.
We should therefore strengthen the inductive invariant on tc: that it not only tells us whether an expression is typed, but also what its type is. Indeed, by giving any type at all it confirms that the expression types, and otherwise it signals an error.
19.4.2 A Calculator and Checker
fun tc(e :: TyExprC, tnv :: TyEnv) -> Type: |
cases (TyExprC) e: |
end |
end |
| numC(_) => numT |
| idC(s) => ty-lookup(s, tnv) |
| plusC(l, r) => tc-arith-binop(l, r, tnv) |
fun tc-arith-binop(l :: TyExprC, r :: TyExprC, tnv :: TyEnv) -> Type:
  if (tc(l, tnv) == numT) and (tc(r, tnv) == numT):
    numT
  else:
    raise('type error in arithmetic')
  end
end
| multC(l, r) => tc-arith-binop(l, r, tnv) |
Did you see what’s different?
That’s right: nothing! That’s because, from the perspective of type-checking (in this type language), there is no difference between addition and multiplication, or indeed between any two operations that consume two numbers and return one. Because we are ignoring the actual numbers, we don’t even need to bother passing tc-arith-binop a function that reflects what to do with the pair of numbers.
Observe another difference between interpreting and type-checking. Both care that the arguments be numbers. The interpreter then returns a precise sum or product, but the type-checker is indifferent to the differences between them: therefore the expression that computes what it returns (numT) is a constant, and the same constant in both cases.
| trueC => boolT |
| falseC => boolT |
| ifC(cnd, thn, els) => |
cnd-t = tc(cnd, tnv) |
if cnd-t == boolT: |
thn-t = tc(thn, tnv) |
els-t = tc(els, tnv) |
if thn-t == els-t: |
thn-t |
else: |
raise("conditional branches don't match") |
end |
else: |
raise("conditional isn't Boolean") |
end |
Consider each of the three earlier decisions. Change each one, and explain the consequences it has for the type-checker.
| appC(f, a) => |
f-t = tc(f, tnv) |
a-t = tc(a, tnv) |
if is-funT(f-t): |
if a-t == f-t.arg: |
f-t.ret |
else: |
raise("argument type doesn't match declared type") |
end |
else: |
raise("not a function in application position") |
end |
| fdC(a, at, rt, b) => |
bt = tc(b, xtend-t-env(tbind(a, at), tnv)) |
if bt == rt: |
funT(at, rt) |
else: |
raise("body type doesn't match declared type") |
end |
19.4.3 Type-Checking Versus Interpretation
When confronted with a first-class function, our interpreter created a closure. However, we don’t seem to have any notion of a “closure” in our type-checker, even though we’re using an (type) environment. Why not? In particular, recall that the absence of closures resulted in violation of static scope. Is that happening here? Write some tests to investigate.
Observe a curious difference between the interpreter and type-checker. In the interpreter, application was responsible for evaluating the argument expression, extending the environment, and evaluating the body. Here, the application case does check the argument expression, but leaves the environment alone, and simply returns the type of the body without traversing it. Instead, the body is actually traversed by the checker when checking a function definition, so this is the point at which the environment actually extends.
Why is the time of traversal different between interpretation and type-checking?
p = lam(x :: Number) -> (Number -> Number):
  lam(y :: Number) -> Number:
    x + y
  end
end
When we simply define p, the interpreter does not traverse the interior of these expressions, in particular the x + y. Instead, these are suspended waiting for later use (a feature we actually exploit ((part "laziness"))). Furthermore, when we apply p to some argument, this evaluates the outer function, resulting in a closure (that closes over the binding of x). Now instead consider the type-checker. As soon as we are given this definition, it traverses the entire expression, including the innermost sub-expression. Because it knows everything it needs to know about x and y—their types—it can immediately type-check the entire expression. This is why it does not need to create a closure: there is nothing to be put off until application time (indeed, we don’t want to put type-checking off until execution). Another way to think about it is that it behaves like substitution does—and substitution did not need closures to provide static scoping, either—but even more eagerly: it can perform substitution with just the program text, without any values at all, because it is substituting types, which are already given. The fact that we use a type environment makes this harder to see, because we may have come to associate environments with closures. However, what matters is when the necessary value is available. Put differently, we used an environment primarily out of convention: here, we could have used (type) substitution just as well.
Exercise
Write examples to study this. Consider converting the above example as a starting point. Also convert your examples from earlier.
- Consider the following expression:
lam(f :: (Number -> String), n :: Number) -> String: f(n) end
When evaluating the inner f(n), the interpreter has access to actual values for f and n. In contrast, when type-checking it, it does not know which function will be passed in as f. How, then, can it type-check the use?The answer is that the annotation tells the type-checker everything it needs to know. The annotation says that f must accept numbers; since n is annotated to be a number, the application works. It also says that f will return strings; because that is what the overall function returns, this also passes.
In other words, the annotation (Number -> String) represents not one but an infinite family of all functions of that type, without committing to any one of them. The type checker then checks that any such function will work in this setting. Once it has done its job, it doesn’t matter which function we actually pass in, provided it has this type. Checking that is, of course, the heart of Assume-Guarantee Reasoning.
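Here is a small sketch of that idea in Pyret itself. The name apply-to-n is hypothetical; num-to-string is Pyret’s built-in conversion, and any other function of type (Number -> String) would do just as well.
apply-to-n = lam(f :: (Number -> String), n :: Number) -> String: f(n) end

check:
  apply-to-n(num-to-string, 3) is "3"
  apply-to-n(lam(x :: Number) -> String: "constant" end, 3) is "constant"
end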
19.5 Type-Checking, Testing, and Coverage
Instead of using concrete values, the type checker uses only types. Therefore, it cannot check fine gradations inside values.
In return, it works statically: that is, it’s like running a lightweight testing procedure before ever running the program. (We should not underestimate the value of this: programs that depend on interactive or other external input, on specialized hardware, on timing, and so on, can be quite difficult to test. For such programs, especially, obtaining a lightweight form of testing that does not require being able to run it at all is invaluable.)
Testing only covers the parts of a program that are exercised by test cases. In contrast, the type-checker exercises the whole program. Therefore, it can catch lurking errors. Of course, this also means that the entire program has to be type-conformant: you can’t have some parts (e.g., conditional branches) that are not yet conformant, the way untested parts can be broken yet ignored by tests that simply don’t exercise them.
Finally, types provide another very important property: quantification. Recall our earlier example: the type checker has established something about an infinite number of functions!
19.6 Recursion in Code
Now that we’ve obtained a basic programming language, let’s add recursion to it. We saw earlier (Recursion and Non-Termination) that this could be done quite easily. It’ll prove to be a more complex story here.
19.6.1 A First Attempt at Typing Recursion
Let’s now try to express a simple recursive function. We’ve already seen how to write infinite loops for first-order functions. Annotating them introduces no complications.
Confirm that adding types to recursive and non-terminating first-order functions causes no additional problems.
(fun(x): x(x) end)(fun(x): x(x) end)
Recall that this program is formed by applying ω to itself. Of
course, it is not a given that identical terms must have precisely the
same type, because it depends on the context of use. However, the
specific structure of ω means that it is the same term that
ends up in both contexts—so whatever type we assign to it must work in both.
Therefore, let’s try to type ω; let’s call this type T. It’s clearly a function type, and the function takes one argument, so it must be of the form A -> B. Now what is that argument? It’s ω itself. That is, the type of the value going into A is itself T. Thus, the type of ω is T, which is A -> B, which is the same as T -> B. This expands into (A -> B) -> B, which is the same as (T -> B) -> B. Therefore, this further expands to ((A -> B) -> B) -> B, and so on. In other words, this type cannot be written as any finite string!
Did you notice the subtle but important leap we just made?
We have just argued that we can’t type ω. But why does it follow that we can’t type Ω?
19.6.2 Program Termination
Because type-checking follows by recurring on sub-terms, to type Ω, we have to be able to type ω and then combine its type to obtain one for Ω. But, as we’ve seen, typing ω seems to run into serious problems. From that, however, we jumped to the conclusion that ω’s type cannot be written as any finite string, for which we’ve given only an intuition, not a proof. In fact, something even stranger is true: in the type system we’ve defined so far, we cannot type Ω at all!
This is a strong statement, but it follows from something even
stronger. The typed language we have so far has a property
called strong normalization: every expression that has a type
will terminate computation after a finite number of steps. In other
words, this special (and peculiar) infinite loop program isn’t the
only one we can’t type; we can’t type any infinite loop (or
even potential infinite loop). A rough intuition that might help is
that any type—which is a finite string—can contain only a finite number of ->’s, and each function application discharges one, so the computation must eventually run out of arrows and stop.
Why is this not true when we have named first-order functions?
If our language permitted only straight-line programs, this would be unsurprising. However, we have conditionals and even functions being passed around as values, and with those we can encode almost every program we’ve written so far. Yet, we still get this guarantee! That makes this a somewhat astonishing result.
Try to encode lists using functions in the untyped and then in the typed language (see [REF] if you aren’t sure how). What do you see? And what does that tell you about the impact of this type system on the encoding?
This result also says something deeper. It shows that, contrary to
what you may believe—namely, that a type system can only reject programs—a type system can also confer positive guarantees: here, that every program that type-checks will terminate. Here are a few settings where such a guarantee is genuinely valuable:
A complex scheduling algorithm (the guarantee would ensure that the scheduler completes and that the tasks being scheduled will actually run).
A packet-filter in a router. (Network elements that go into infinite loops put a crimp on utility.)
A compiler. (The program it generates may or may not terminate, but it ought to at least finish generating the program.)
A device initializer. (Modern electronics—such as smartphones and photocopiers—have complex initialization routines. These have to finish so that the device can actually be put to use.)
The callbacks in JavaScript. (Because the language is single-threaded, not relinquishing control means the event loop starves. When this happens in a Web page, the browser usually intervenes after a while and asks whether to kill the page—because otherwise the rest of the page (or even browser) becomes unresponsive.)
A configuration system, such as a build system or a linker. In the Standard ML language, the language for linking modules uses essentially this typed language for writing module linking specifications. This means developers can write quite sophisticated abstractions—they have functions-as-values, after all!—while still being guaranteed that linking will always terminate, producing a program.
Notice also an important difference between types and tests (Type-Checking, Testing, and Coverage): you can’t test for termination!
19.6.3 Typing Recursion
(rec (S num (n num) |
(if0 n |
0 |
(n + (S (n + -1))))) |
(S 10)) |
How do we type such an expression? Clearly, we must have n bound in the body of the function as we type it (but not, of course, in the use of the function, due to static scope); this much we know from typing functions. But what about S? Obviously it must be bound in the type environment when checking the use (S 10), and its type must be num -> num. But it must also be bound, to the same type, when checking the body of the function. (Observe, too, that the type returned by the body must match its declared return type.)
Now we can see how to break the shackles of the finiteness of the type. It is certainly true that we can write only a finite number of ->’s in types in the program source. However, this rule for typing recursion duplicates the -> in the body that refers to itself, thereby ensuring that there is an inexhaustible supply of applications.It’s our infinite quiver of arrows.
| recC(f, v, at, rt, b, c) => |
extended-env = xtend-t-env(tbind(f, funT(at, rt)), tnv) |
if not(rt == tc(b, xtend-t-env(tbind(v, at), extended-env))): |
raise("rec: function return type not correct") |
else: |
tc(c, extended-env); |
19.7 Recursion in Data
We have seen how to type recursive programs, but this doesn’t yet
enable us to create recursive data. We already have one kind of
recursive datum—the list—but it is built into the language; we have not yet seen how a programmer can define new recursive types of their own.
19.7.1 Recursive Datatype Definitions
Creating a new type.
Letting instances of the new type have one or more fields.
Letting some of these fields refer to instances of the same type.
Allowing non-recursive base-cases for the type.
data BinTree:
  | leaf
  | node(value :: Number, left :: BinTree, right :: BinTree)
end
This style of data definition is sometimes also known as a sum of products. At the outer level, the datatype offers a set of choices (a value can be a leaf or a node). This corresponds to disjunction (“or”), which is sometimes written as a sum (the truth table is suggestive). Inside each sum is a set of fields, all of which must be present. These correspond to a conjunction (“and”), which is sometimes written as a product (ditto).
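To see the new type in action, here is a small sketch of a function over BinTrees (tree-sum is our name, not part of the introduced operations):
fun tree-sum(t :: BinTree) -> Number:
  cases (BinTree) t:
    | leaf => 0
    | node(v, l, r) => v + tree-sum(l) + tree-sum(r)
  end
end

check:
  tree-sum(node(5, node(3, leaf, leaf), leaf)) is 8
end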
That covers the notation, but we have not explained where this new
type, BinTree, comes from. It is obviously impractical to
pretend that it is baked into our type-checker, because we can’t keep
changing it for each new recursive type definition—rather, the data definition itself must introduce the new type. What exactly it introduces is the subject of the next subsection.
19.7.2 Introduced Types
leaf :: BinTree  # a constant, so no arrow
node :: Number, BinTree, BinTree -> BinTree
is-leaf :: BinTree -> Bool
is-node :: BinTree -> Bool
.value :: BinTree%(is-node) -> Number
.left :: BinTree%(is-node) -> BinTree
.right :: BinTree%(is-node) -> BinTree
In what two ways are the last three entries above fictitious?
Both the constructors create instances of BinTree, not something more refined. We will discuss this design tradeoff later [REF].
Both predicates consume values of type BinTree, not “any” value. This is because the type system can already tell us what type a value is. Thus, we only need to distinguish between the variants of that one type.
The selectors really only work on instances of the relevant variant—
e.g., .value can work only on instances of node, not on instances of leaf— but we don’t have a way to express this in the static type system for lack of a suitable static type. Thus, applying these can only result in a dynamic error, not a static one caught by the type system.
19.7.3 Selectors
.value, .left, and .right are selectors: they select parts of the record by name. But here are the two ways in which they are fictitious. First, syntactically: in most languages with “dotted field access”, there is no such stand-alone operator as .value: e.g., you cannot write .value(...). But even setting aside this syntactic matter (which could be addressed by arguing that writing v.value is just an obscure syntax for applying this operator) the more interesting subtlety is the semantic one.
data Payment:
  | cash(value :: Number)
  | card(number :: Number, value :: Number)
end
.value :: Payment%(is-cash) -> Number
.value :: Payment%(is-card) -> Number
.value :: Payment -> Number
A characteristic of scripting languages is that objects are merely hash tables, and all field access is turned into a hash-table reference on the string representing the field-name. Hence, o.f is just syntactic sugar for looking up the value indexed by "f" in the dictionary associated with o.
- In Racket, the structure definitions such as
(struct cash (value)) (struct card (number value)) generate distinct selectors: in this case, cash-value and card-value, respectively. Now there is no longer any potential for confusion, because they have different names that can each have distinct types.
(define (->value v)
  (cond
    [(node? v) (node-value v)]
    [(cash? v) (cash-value v)]
    [(card? v) (card-value v)]))
19.7.4 Pattern-Matching and Desugaring
cases (BinTree) t:
  | leaf => e1
  | node(v, l, r) => e2
end
if is-leaf(t):
  e1
else if is-node(t):
  v = t.value
  l = t.left
  r = t.right
  e2
end
Except, that’s not quite so easy. Somehow, the desugaring that generates the code above in terms of if needs to know that the three positional selectors for a node are value, left, and right, respectively. This information is explicit in the type definition but only implicitly present in the use of the pattern-matcher (that, indeed, being the point). Somehow this information must be communicated from definition to use. Thus, the desugarer needs something akin to the type environment to accomplish its task.
Observe, furthermore, that expressions such as e1 and e2
cannot be type-checked—in the node branch, for instance, v, l, and r are not even bound until the desugaring introduces them—so desugaring must happen before (or as part of) type-checking.
20 Safety and Soundness
Now that we’ve had a first look at a type system,A type system usually has three components: a language of types, a set of type rules, and an algorithm that enforces these rules. By presenting types via a checking function we have blurred the distinction between the second and third of these, but they should still be thought of as intellectually distinct: the former provides a declarative description while the latter an executable one. The distinction becomes relevant in (part "impl-subtyp"). we’re ready to talk about the way in which types offer some notion of predictability, a notion called type soundness. Intertwined with this are terms you have probably heard like safety (or type safety) as well as others such as strong typing (and conversely weak typing) and memory safety. We should understand all of them.
20.1 Safety
Many operations in a language are partial: given some domain
over which they are defined, they accept some but not all elements of
the domain. For instance, while addition—over our numbers—accepts any arguments, the square-root operation (over the reals) does not accept negative numbers. Given -1, an implementation could:
halt with an error,
return a special value called “not-a-number” (NaN), or
return an imaginary number (e.g., 0+1i in Scheme).
What matters, then, is whether an operation precludes any values at all or not. If it does, then we can ask whether the language prevents it from being used with any precluded values. If the language does prevent it, then we call the language safe. If it does not, we call it unsafe. Of course, a language may be safe for some operations and unsafe for others; usually we apply the term “safe” to a language as a whole if all of its operations are safe.
A safe language offers a very important guarantee to programmers: that no operation will be performed on meaningless data. Sticking with numeric addition, in an unsafe language we might be permitted to add a number to a string, which will produce some value dependent on the precise representation of strings and might change if the representation of strings changes. (For instance, the string might be zero-terminated or might record its length, which alters what the first word of the string will be.) We might even be able to add a string to a function, a result that is certainly nonsensical (what meaningful number does the first word of the machine representation of a function represent?). Therefore, though safety does not at all ensure that computations will be correct, at least it ensures they will be meaningful: no nonsensical operations will have been performed.
Observe that we have not imposed any requirement whatsoever on how a language achieves safety. Usually, safety can be achieved dynamically through run-time checks: for instance, addition would check that its arguments are numeric, division would also ensure its second argument is not zero, and so on. In addition, a static type system can also ensure safety. Because this is a common source of confusion, we should be clear: safety only means that operations are not applied to meaningless values. It does not fix any particular implementation strategy for ensuring the property.
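As a sketch of the dynamic route, recall how our own interpreter guards arithmetic. The stand-alone helper below is hypothetical (essentially what arith-binop did for us earlier), and it assumes the numV and boolV variants and the generated is-numV predicate:
fun safe-plus(l :: Value, r :: Value) -> Value:
  # check the arguments before operating on their representations
  if is-numV(l) and is-numV(r):
    numV(l.n + r.n)
  else:
    raise("+: argument is not a number")
  end
end

check:
  safe-plus(numV(1), numV(2)) is numV(3)
  safe-plus(numV(1), boolV(true)) raises "not a number"
end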
We will return to the issue of implementations below (Types, Time, and Space). But first, we have some important foundational material to address.
20.2 “Untyped” Languages
The term “untyped” is commonly used in two quite different senses:
A language with no types at all. Of course all data have some representation that gives them meaning, but it means there is only one type in the language, and all data belong to that type. Furthermore, this datatype has no variants, because that would introduce type-based discrimination. For instance, all data might be a byte, or a number, or a string, or some other single, distinctive value. Typically, no operation can fail to take a particular kind of value, because that might imply some kind of type distinction, which by definition can’t exist. Note that this is a semantic notion of untypedness.
A language with a partitioning of its run-time values—e.g., numbers are distinct from functions—but without static annotations or checking. Note that this is a syntactic notion of untypedness.
Because the two meanings are mutually contradictory, it would be useful to have two different names for these. Some people use the terms latently typed or dynamically typed for the latter category, to tell these apart.
Following modern convention, we will use the latter term, while recognizing that some others consider the term typed to only apply to languages that have static disciplines, so the phrase “dynamically typed” is regarded as an oxymoron. Note that our preceding discussion gives us a way out of this linguistic mess. A dynamically typed language that does not check types at run-time is not very interesting (the “types” may as well not exist). In contrast, one that does check at run-time already has a perfectly good name: safe (Safety). Therefore, it makes more sense to use the name dynamically safe for a language where all safety-checks are performed at run-time, and (with a little loss of precision) statically safe for one where as many safety-checks as possible are performed statically, with only the undecidable ones relegated to run-time.
20.3 The Central Theorem: Type Soundness
We have seen earlier (Program Termination) that certain
type languages can offer very strong theorems about their programs:
for instance, that all programs in the language terminate. In
general, of course, we cannot obtain such a guarantee (indeed, we
added general recursion precisely to let ourselves write unbounded
loops). However, a meaningful type system—indeed, anything deserving that name—ought to provide some meaningful guarantee that all typed programs enjoy.
What theorem might we want of a type system? Remember that the type checker runs over the static program, before execution. In doing so, it is essentially making a prediction about the program’s behavior: for instance, when it states that a particular complex term has type Number, it is predicting that when run, that term will produce a numeric value. How do we know this prediction is sound, i.e., that the type checker never lies? Every type system should be accompanied by a theorem that proves this.
Stating such a theorem requires care, because the type checker and the interpreter operate in quite different ways:
The type checker sees only program text, whereas the interpreter runs over actual data.
The type environment binds identifiers to types, whereas the interpreter’s environment binds identifiers to values or locations.
The type checker compresses (even infinite) sets of values into types, whereas the interpreter treats the elements of these sets distinctly.
The type checker always terminates, whereas the interpreter might not.
The type checker passes over the body of each expression only once, whereas the interpreter might pass over each body anywhere from zero to infinite times.
The central result we wish to have for a given type-system is called soundness. It says this. Suppose we are given an expression (or program) e. We type-check it and conclude that its type is t. When we run e, let us say we obtain the value v. Then v will also have type t.
The standard way of proving this theorem is to divide it in two parts, known as progress and preservation. Progress says that if a term passes the type-checker, it will be able to make a step of evaluation (unless it is already a value); preservation says that the result of this step will have the same type as the original. If we interleave these steps (first progress, then preservation; rinse and repeat), we can conclude that the final answer will indeed have the same type as the original, so the type system is indeed sound.
For instance, consider this expression: 5 + (2 * 3). It has
the type Number. In a sound type system, progress offers a
proof that, because this term types, and is not already a value, it
can take a step of execution—and indeed it can, reducing to 5 + 6. Preservation then tells us that the result of this step has the same type, Number. Iterating the two, we eventually arrive at the value 11, which does indeed have type Number, just as the checker predicted. There are, however, two caveats:
The program may not produce an answer at all; it might loop forever. In this case, the theorem strictly speaking does not apply. However, we can still observe that every intermediate representation of the program has the same type as the whole expression, so the program is computing meaningfully even if it isn’t producing a value.
Any rich enough language has properties that cannot be decided statically (and others that perhaps could be, but the language designer chose to put off until run-time to reduce the burden on the programmer to make programs pass the type-checker). When one of these properties fails—
e.g., the array index being within bounds— there is no meaningful type for the program. Thus, implicit in every type soundness theorem is some set of published, permitted exceptions or error conditions that may occur. The developer who uses a type system implicitly signs on to accepting this set.
The latter caveat looks like a cop-out. However, it is actually a
strongly positive statement, in that it says any exception not in this
set will provably not be raised. Of course, in languages
designed with static types in the first place, it is not clear (except
by loose analogy) what these exceptions might be, because there would
be no need to define them. But when we retrofit a type system onto an
existing programming language—one that already has a rich set of run-time errors—the set of permitted exceptions must be spelled out explicitly as part of the soundness claim.
20.4 Types, Time, and Space
data Tree:
  | base
  | node(v :: Number, l :: Tree, r :: Tree)
end
fun size(t :: Tree) -> Number:
  cases (Tree) t:
    | base => 0
    | node(_, l, r) => 1 + size(l) + size(r)
  end
end
Assume instead we are in a typed language. The type-checker will have ensured that no non-Tree value could have been substituted for a Tree-typed identifier. Therefore, there is no need for the type tag at all. (Type tags would, however, still be needed by the garbage collector, though other representations such as BIBOP [REF] can greatly reduce their space impact.) However, the variant tags are still needed, and will be used to dispatch between the branches. In the example, only one bit is needed to tell apart base and node values. This same bit position can be reused to tell apart variants in some other type without causing any confusion, because the type checker is responsible for keeping the types from mixing.
In other words, if there are two different datatypes that each have two variants, in the dynamically-typed world all these four variants require distinct representations. In contrast, in the typed world their representations can overlap across types, because the static type system will ensure one type’s variants are never confused for those of another. Thus, types provide a genuine performance benefit to programs, in both space (smaller representations) and time (fewer run-time checks).
It is conventional in computer science to have a ☛ space-time tradeoff. Instead, here we have a situation where we improve both space and time. This seems almost paradoxical! How is this possible?
This dual benefit comes at some cost to the developer, who must convince the static type system that their program does not induce type errors; due to the limitations of decidability, even programs that might have run without error might run afoul of the type system. Nevertheless, for programs for which this can be done, types provide a notable saving.
20.5 Types Versus Safety
There are two different questions to ask of a language here:
Whether or not a language is typed, i.e., has static type checks.
Whether or not a language’s run-time system is safe, i.e., performs the residual checks not done by the static system (of which there might not even be one).
            | Safe            | Unsafe
Typed       | ML, Java        | C, C++
Not Typed   | Python, Racket  | machine code
The entry for machine code is a little questionable because the language isn’t even typed, so there’s no classification to check statically. Similarly, there is arguably nothing to check for in the run-time system, so it must best be described as “not even unsafe”. However, in practice we do end up with genuine problems, such as security vulnerabilities that arise from being able to jump and execute from arbitrary locations that hold data.
That leaves the truly insidious corner, which languages like C and C++ inhabit. Here, the static type system gives the impression that values are actually segregated by type and checked for membership. And indeed they are, in the static world. However, once a programmer passes the type-checker there are no run-time checks. To compound the problem, the language offers primitives like arbitrary pointer arithmetic, making it possible to interpret data of one kind as data of another. As a result, we should have a special place of shame for languages that actively mislead programmers.
Construct examples of C or C++ interpreting data of one kind as data of another kind.
Historically, people have sometimes used the phrase strong typing to reflect the kind of type-checking that ML and Java use, and weak typing for the other kinds. However, these phrases are at best poorly defined.
If you have ever used the phrases “strong typing” or “weak typing”, define them.
That’s what I thought. But thank you for playing.
Indeed, the phrases are not only poorly defined, they are also wrong, because the problem is not with the “strength” of the type checker but rather with the nature of the run-time system that backs them. The phrases are even more wrong because they fail to account for whether or not a theorem backs the type system.
It is therefore better to express our intent by sticking to these concepts: safety, typedness, and soundness. Indeed, we should think of this as a continuum. With rare exceptions, we want a language that is safe. Often, we want a language that is also typed. If it is typed, we would like it to be sound, so that we know that the types are not lying. In all these cases, “strong” and “weak” typing do not have any useful meaning.
21 Parametric Polymorphism
List<String>
List<String>
List<String>
Actually, none of these is quite the same. But the first and third are very alike, because the first is in Java and the third in ML, whereas the second, in C++, is different. All clear? No? Good, read on!
21.1 Parameterized Types
((A -> B), List<A> -> List<B>)
((Number -> String), List<Number> -> List<String>)
((Number -> (Number -> Number)), List<Number> -> List<(Number -> Number)>)
((String -> String), List<String> -> List<String>)
Obviously, it is impossible to load all these functions into our standard library: there’s an infinite number of these! We’d rather have a way to obtain each of these functions on demand. Our naming convention offers a hint: it is as if map takes two type parameters in addition to its two regular value ones. Given the pair of types as arguments, we can then obtain a map that is customized to that particular type. This kind of parameterization over types is called parametric polymorphism.Not to be confused with the “polymorphism” of objects, which we will discuss separately [REF].
21.2 Making Parameters Explicit
fun map(A :: ???, B :: ???, f :: (A -> B), l :: List<A>) -> List<B>: ...;
What goes in place of the ???? These are the types that are going to take the place of A and B on an actual use. But if A and B are bound to types, then what is their type?
Do we really want to call map with four arguments every time we invoke it?
Do we want to be passing types—
which are static— at the same time as dynamic values? If these are types but they are only provided at run-time invocation, how can we type-check clients, who need to know what kind of list they are getting?
Observe that once we start parameterizing, more code than we expect
ends up being parameterized. For instance, consider the type of the
humble link. Its type really is parametric over the type of
values in the list (even though it doesn’t actually depend on those
values!—it merely stores them and hands them back).
21.3 Rank-1 Polymorphism
Instead, we will limit ourselves to one particularly useful and tractable point in this space, which is the type system of Standard ML, of earlier versions of Haskell, roughly that of Java and C# with generics, and roughly that obtained using templates in C++. This language defines what is called predicative, rank-1, or prenex polymorphism.
∀ A, B : ((A -> B), List<A> -> List<B>)
In rank-1 polymorphism, the type variables can only be substituted with monotypes. (Furthermore, these can only be concrete types, because there would be nothing left to substitute any remaining type variables.) As a result, we obtain a clear separation between the type variable-parameters and regular parameters. We don’t need to provide a “type annotation” for the type variables because we know precisely what kind of thing they can be. This produces a relatively clean language that still offers considerable expressive power.Impredicative languages erase the distinction between monotypes and polytypes, so a type variable can be instantiated with another polymorphic type.
fun<T> id(x :: T) -> T: x; |
21.4 Interpreting Rank-1 Polymorphism as Desugaring
id-num = id<Number>
id-str = id<String>

check:
  id-num(5) is 5
  id-str("x") is "x"
end

id-num("x")
id-str(5)
However, this approach has two important limitations.
- Let’s try to define a recursive polymorphic function, such as filter. Earlier we have said that we ought to instantiate every single polymorphic value (such as even cons and empty) with types, but to keep our code concise we’ll focus just on type parameters for filter. Here’s the code:
fun<T> filter(pred :: (T -> Bool), l :: List<T>) -> List<T>:
  cases (List) l:
    | empty => empty
    | link(f, r) =>
      if pred(f):
        link(f, filter<T>(pred, r))
      else:
        filter<T>(pred, r)
      end
  end
end
Observe that at the recursive uses of filter, we must instantiate it with the appropriate type. This is a perfectly good definition. There’s just one problem. If we try to use it—e.g.,
filter-num = filter<Number>
—the implementation will not terminate. This is because the desugarer is repeatedly trying to make new copies of the code of filter at each recursive call.
Exercise
If, in contrast, we define a local helper function that performs the recursion, this problem can be made to disappear. Can you figure out that version?
Consider two instantiations of the identity function. They would necessarily be different because they are two different pieces of code residing at different locations in memory.Indeed, the use of parametric polymorphism in C++ is notorious for creating code bloat. However, all this duplication is unnecessary! There’s absolutely nothing in the body of id, for instance, that actually depends on the type of the argument. Indeed, the entire infinite family of id functions can share just one implementation. The simple desugaring strategy fails to provide this.
In other words, the desugaring based strategy, which is essentially an implementation by substitution, has largely the same problems we saw earlier with regards to substitution as an implementation of parameter instantiation (From Substitution to Environments). However, in other cases substitution also gives us a ground truth for what we expect as the program’s behavior. The same will be true with polymorphism, as we will soon see.
Observe that one virtue to the desugaring strategy is that it does not
require our type checker to “know” about polymorphism. Rather, the
core type language can continue to be monomorphic, and all the
(rank-1) polymorphism is handled entirely through expansion. This
offers a cheap strategy for adding polymorphism to a language,
though—as the problems above illustrate—not one without costs.
Finally, though we have only focused on functions, the preceding discussion applies equally well to data structures.
21.5 Alternate Implementations
There are other implementation strategies that don’t suffer from these problems. We won’t go into them here, but the essence is to memoize ([REF]) expansion. Because we can be certain that, for a given set of type parameters, we will always get the same typed body, we never need to instantiate a polymorphic function at the same type twice. This avoids the infinite loop. If we type-check the instantiated body once, we can avoid checking at other instantiations of the same type (because the body will not have changed). Furthermore, we do not need to retain the instantiated sources: once we have checked the expanded program, we can dispose of the expanded terms and retain just one copy at run-time. This avoids all the problems discussed in the pure desugaring strategy shown above, while retaining the benefits.
Actually, we are being a little too glib. One of the benefits of
static types is that they enable us to pick more precise run-time
representations. For instance, in most languages a static type can
tell us whether we have a 32-bit or 64-bit number, or for that matter
a 32-bit value or a 1-bit value (effectively, a boolean). A compiler
can then generate specialized code for each representation, taking
advantage of how the bits are laid out (for example, 32 booleans can
use a ☛ packed representation to fit into a single
32-bit word). Thus, after type-checking at each used type, the
polymorphic instantiator may keep track of all the special types at
which a function or data structure was used, and provide this
information to the compiler for code-generation. This will then
result in several copies of the function, but only as many as those
for which the compiler can generate distinct, efficient
representations—typically a small, fixed number, rather than one copy per use in the program.
21.6 Relational Parametricity
There’s one last detail we must address regarding polymorphism.
We earlier said that a function like cons doesn’t depend on the specific values of its arguments. This is also true of map, filter, and so on. When map and filter want to operate on individual elements, they take as a parameter another function which in turn is responsible for making decisions about how to treat the elements; map and filter themselves simply obey their parameter functions.
One way to “test” whether this is true is to substitute some different values in the argument list, and a correspondingly different parameter function. That is, imagine we have a relation between two sets of values; we convert the list elements according to the relation, and the parameter function as well. The question is, will the output from map and filter also be predictable by the relation? If, for some input, this was not true of the output of map, then it must be that map somehow affected the value itself, not just letting the function do it. But in fact this won’t happen for map, or indeed most of the standard polymorphic functions.
Functions that obey this relational rule are called relationally parametric. (Read Wadler’s Theorems for Free! and Reynolds’s Types, Abstraction and Parametric Polymorphism.) This is another very powerful property that types give us, because they tell us there is a strong limit on the kinds of operations such polymorphic functions can perform: essentially, that they can drop, duplicate, and rearrange elements, but not directly inspect and make decisions on them.
At first this sounds very impressive (and it is!), but on inspection
you might realize this doesn’t square with your experience. In Java,
for instance, a polymorphic method can still use instanceof to
check which particular kind of value it obtained at run-time, and
change its behavior accordingly. Such a method would not be
relationally parametric! On the Web, you will often find
this property described as the inability of a function to inspect the
argument—
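The same escape hatch exists in any language with run-time type tests. As a small sketch of our own (not from the text), here is a Pyret function that accepts any value and yet is not relationally parametric, because it inspects its argument:
fun sneaky(x):
  # branches on what kind of value it received
  if is-number(x):
    0
  else:
    1
  end
end

check:
  sneaky(17) is 0
  sneaky("seventeen") is 1
end
A map built from such a test could treat numeric elements specially, so its output would no longer be predicted by an arbitrary relation on its inputs.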
22 Type Inference
Consider a function definition that carries no type annotations at all:
fun f(x, y): if x: y + 1 else: y - 1 end end
From its body alone we can deduce what the annotations must have been: x is used as a conditional, so it must be a Boolean, while y is used in arithmetic, so it must be a Number. Type inference fills these in automatically:
fun f(x :: Boolean, y :: Number): ...
Newer languages like Scala and Typed Racket have this in more limited measure: a feature called local type inference. Here, however, we will study the more traditional and powerful form.
22.1 Type Inference as Type Annotation Insertion
First, let’s understand what type inference is doing. Some people mistakenly think of languages with inference as having no type declarations, with inference taking their place. This is confused at multiple levels. For one thing, even in languages with inference, programmers are free (and for documentation purposes, are often encouraged) to annotate types. Furthermore, in the absence of such declarations, it is not quite clear what inference actually means. (Sometimes, inference is also undecidable and programmers have no choice but to declare some of the types.) Finally, writing explicit annotations can greatly reduce indecipherable error messages.
Instead, it is more helpful to think of inference as a procedure that fills in the blanks where annotations could have gone:
fun f(x :: ___, y :: ___): ...
22.2 Understanding Inference
For worked examples and more details, see Chapter 30 of Programming Languages: Application and Interpretation.
Suppose we have an expression (or program) e written in an explicitly typed language: i.e., e has type annotations everywhere they are required. Now suppose we erase all annotations in e, and use a procedure infer to deduce them back.
What property do we expect of infer?
We could demand many things of it. One might be that it produces precisely those annotations that e originally had. This is problematic for many reasons, not least that e might not even type-check, in which case how could infer possibly know what they were (and hence should be)? This might strike you as a pedantic trifle: after all, if e didn’t type-check, how can erasing its annotations and filling them back in make it do so? Since neither program type-checks, who cares?
Is this reasoning correct?
For instance, consider this annotated expression, which does not type-check:
lam(x :: Number) -> String: x end
Erasing its annotations leaves the ordinary identity function:
lam(x): x end
A more useful property to demand, then, is this: if e type-checks, then the program obtained by erasing its annotations and inferring types for the result must also type-check. Note that this property does not say what must happen if e fails to type-check, i.e., it does not preclude a type inference algorithm that makes the faultily-typed identity function above typeable.
More importantly, it assures us that we lose nothing by employing type inference: no program that was previously typeable will now cease to be so. That means we can focus on using explicit annotations where we want to, but will not be forced to do so. (Of course, this only holds if inference is decidable.)
Indeed, the erased identity function above is perfectly typeable; for instance, it can be annotated as:
lam(x :: Number) -> Number: x end
With these preliminaries out of the way, we are now ready to delve into the mechanics of type inference. The most important thing to note is that our simple, recursive-descent type-checking algorithm (A Type Checker for Expressions and Functions) will no longer work. That was possible because we already had annotations on all function boundaries, so we could descend into function bodies carrying information about those annotations in the type environment. Sans these annotations, it is not clear how to descend. In fact, it is not clear that there is any particular direction that makes more sense than another.
All this information is in the function. But how do we extract it systematically and in an algorithm that terminates and enjoys the property we have stated above? We do this in two steps. First we generate constraints, based on program terms, on what the types must be. Then we solve constraints to identify inconsistencies and join together constraints spread across the function body. Each step is relatively simple, but the combination creates magic.
22.2.1 Constraint Generation
Our goal, ultimately, is to find a type to fill into every type annotation position. It will prove to be just as well to find a type for every expression. A moment’s thought will show that this is likely necessary anyway: for instance, how can we determine the type to put on a function without knowing the type of its body? It is also sufficient, in that if every expression has had its type calculated, this will include the ones that need annotations.
First, we must generate constraints to (later) solve. Constraint
generation walks the program source, emitting appropriate constraints
on each expression, and returns this set of constraints. It works by
recursive descent mainly for simplicity; it really computes a
set of constraints, so the order of traversal and generation
really does not matter in principle—what matters is the complete set of constraints that results. For the type of each expression, a constraint can assert one of the following:
- That it is related to the type of some identifier.
- That it is related to the type of some other expression.
- That it is a base type, such as numbers and Booleans.
- That it is a constructed type such as a function, whose domain and range types are presumably further constrained.
data TyCon:
  tyeq(l :: TyCHS, r :: TyCHS)
end

data TyCHS:
  | t-expr(e :: TyExprC)
  | t-con(name :: String, fields :: List<TyCHS>)
end

numeric-t-con = t-con("num", empty)
boolean-t-con = t-con("bool", empty)
fun mk-fun-t-con(a, r):
  t-con("fun", [list: a, r])
end
fun generate(e :: TyExprC) -> List<TyCon>:
  cases (TyExprC) e:
  end
end

  | numC(_) =>
    [list: tyeq(t-expr(e), numeric-t-con)]
  | idC(s) =>
    empty
  | plusC(l, r) => generate-arith-binop(e, l, r)
  | multC(l, r) => generate-arith-binop(e, l, r)

fun generate-arith-binop(e :: TyExprC, l :: TyExprC, r :: TyExprC) -> List<TyCon>:
  [list: tyeq(t-expr(e), numeric-t-con),
    tyeq(t-expr(l), numeric-t-con),
    tyeq(t-expr(r), numeric-t-con)] +
  generate(l) + generate(r)
end

  | trueC =>
    [list: tyeq(t-expr(e), boolean-t-con)]
  | falseC =>
    [list: tyeq(t-expr(e), boolean-t-con)]
  | ifC(cnd, thn, els) =>
    [list: tyeq(t-expr(cnd), boolean-t-con),
      tyeq(t-expr(e), t-expr(thn)),
      tyeq(t-expr(thn), t-expr(els))] +
    generate(cnd) + generate(thn) + generate(els)
Now we get to the other two interesting cases, function declaration and application. In both cases, we must remember to generate and return constraints of the sub-expressions.
  | fdC(a, b) =>
    [list: tyeq(t-expr(e), mk-fun-t-con(t-expr(a), t-expr(b)))] +
    generate(b)
  | appC(f, a) =>
    [list: tyeq(t-expr(f), mk-fun-t-con(t-expr(a), t-expr(e)))] +
    generate(f) +
    generate(a)
And that’s it! We have finished generating constraints; now we just have to solve them.
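To see what these constraints look like, here is a small worked example; it is a sketch that assumes the TyExprC datatype from the type checker and the fragments above assembled into a single generate function. Note that the sub-expressions get constrained twice, once by the arithmetic rule and once by their own numC rule:
check:
  one = numC(1)
  two = numC(2)
  sum = plusC(one, two)
  generate(sum) is [list:
    tyeq(t-expr(sum), numeric-t-con),
    tyeq(t-expr(one), numeric-t-con),
    tyeq(t-expr(two), numeric-t-con),
    tyeq(t-expr(one), numeric-t-con),
    tyeq(t-expr(two), numeric-t-con)]
end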
22.2.2 Constraint Solving Using Unification
The process used to solve constraints is known as unification. A unifier is given a set of equations. Each equation maps a variable to a term, whose datatype is above.
For our purposes, the goal of unification is to generate a substitution, or mapping from variables to terms that do not contain any variables. This should sound familiar: we have a set of simultaneous equations in which each variable is used linearly; such equations are solved using Gaussian elimination. In that context, we know that we can end up with systems that are both under- and over-constrained. The same thing can happen here, as we will soon see.
The unification algorithm works iteratively over the set of constraints. Because each constraint equation has two terms and each term can be one of two kinds, there are four cases to cover.
The algorithm begins with the set of all constraints, and the empty substitution. Each constraint is considered once and removed from the set, so in principle the termination argument should be utterly simple, but it will prove to be slightly more tricky. As constraints are disposed, the substitution set tends to grow. When all constraints have been disposed, unification returns the final substitution set.
For a given constraint, the unifier examines the left-hand-side of the equation. If it is a variable, it is now ripe for elimination. The unifier adds the variable’s right-hand-side to the substitution and, to truly eliminate it, replaces all occurrences of the variable in the substitution with this right-hand-side. (It is worth noting that because the constraints are equalities, eliminating a variable is tantamount to associating it with the same set as whatever replaces it. In other words, we can use union-find [REF] to implement this process efficiently, though if we need to backtrack during unification (as we do for logic programming [REF]), this becomes much more tricky.)
Did you notice the subtle error above?
The subtle error is this. We said that the unifier eliminates the variable by replacing all instances of it in the substitution. However, that assumes that the right-hand-side does not contain any instances of the same variable. Otherwise we have a circular definition, and it becomes impossible to perform this particular substitution. For this reason, unifiers include an occurs check: a check for whether the same variable occurs on both sides and, if it does, decline to unify. For simplicity we will ignore this here.
Construct a term whose constraints would trigger the occurs check.
Do you remember ω (Recursion and Non-Termination)?
Let us now implement unification. For simplicity, we will use a list of type constraints as the representation of the substitution. (As you read this, keep in mind that unification is a generic procedure, completely independent of type-inference: indeed, the unification algorithm was invented before and spurred the creation of the type-inference process.)
If we use type constraints to represent the substitution, what invariant would we expect the computed set of constraints to have?
It will be convenient to have a helper function that takes the current substitution as an accumulated parameter. Let’s therefore include it, and get the easy case out of the way:
fun unify(cs :: List<TyCon>) -> List<TyCon>:
  fun help(shadow cs :: List<TyCon>, sub :: List<TyCon>) -> List<TyCon>:
    cases (List) cs:
      | empty => sub
      | link(f, r) =>
    end
  end
  help(cs, empty)
end
- If both sides are t-expr’s, then we simply replace one with the other (this is the “variable elimination” case of the Gaussian procedure). We must perform this replacement everywhere: in the remaining terms but also in the substitution already performed.
Exercise
What happens if we miss doing this replacement in one or the other?
- If one side is a t-expr and the other a t-con, then we have resolved that expression’s type to a concrete type. Record this and substitute. There are two cases of a t-expr and t-con: for simplicity, we handle one case and in the other case, rewrite the problem to the former case and recur. (This swapping of sides is legal because these are equational constraints.)
- If we have to unify two constructors, then they had better be the same constructor! If they are not, we have a type error. If they are, then we recur on their parameters.
        lhs = f.l
        rhs = f.r
        ask:
          | is-t-expr(lhs) and is-t-expr(rhs) then:
            help(subst(lhs, rhs, r), link(f, subst(lhs, rhs, sub)))
          | is-t-expr(lhs) and is-t-con(rhs) then:
            help(subst(lhs, rhs, r), link(f, subst(lhs, rhs, sub)))
          | is-t-con(lhs) and is-t-expr(rhs) then:
            help(link(tyeq(rhs, lhs), r), sub)
          | is-t-con(lhs) and is-t-con(rhs) then:
            if lhs.name == rhs.name:
              help(map2(tyeq, lhs.fields, rhs.fields) + r, sub)
            else:
              raise('type error: ' + lhs.name + ' vs. ' + rhs.name)
            end
        end
In terms of proving termination, note that the last two cases do not shrink the input: the third keeps it the same, while the fourth in some cases grows it.
fun subst(to-rep :: TyCHS%(is-t-expr), rep-with :: TyCHS, in :: List<TyCon>) -> List<TyCon>:
  cases (List) in:
    | empty => empty
    | link(f, r) =>
      lhs = f.l
      rhs = f.r
      link(
        tyeq(
          if lhs == to-rep: rep-with else: lhs end,
          if rhs == to-rep: rep-with else: rhs end),
        subst(to-rep, rep-with, r))
  end
end
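Before turning to the exercises, it may help to run the pieces together on the constraints from the earlier example (again assuming the fragments above have been assembled; the exact order of the resulting substitution is an artifact of this particular implementation):
check:
  one = numC(1)
  two = numC(2)
  sum = plusC(one, two)
  # every expression in the little program resolves to the numeric type
  unify(generate(sum)) is [list:
    tyeq(t-expr(two), numeric-t-con),
    tyeq(t-expr(one), numeric-t-con),
    tyeq(t-expr(sum), numeric-t-con)]
end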
There is a subtle bug in the above implementation of unification. It assumes that two textually identical expressions must have the same type. Construct a counter-example to show that this is not true. Then fix the implementation (consider using reference rather than structural equality [REF]).
The algorithm above is rather naive. Given a choice, we would rather see the types of identifiers rather than those of expressions. Modify the algorithm to bias in this direction.
The output of the above algorithm is unsatisfying: a set of (solved) constraints rather than an “answer”. Extract the type of the top-level expression, and “pretty-print” it in terms of only type constants, referring to expressions only when necessary (Over- and Under-Constrained Solutions).
Prove the termination of this algorithm. Make an argument based on the size of the constraint set and on the size of the substitution.
Augment this implementation with the occurs check.
Use union-find to optimize this implementation. Measure the performance gain.
With this, we are done. Unification produces a substitution. We can now traverse the substitution and find the types of all the expressions in the program, then insert the type annotations accordingly.
22.3 Type Checking and Type Errors
A theorem, which we will not prove here, dictates that the success of the above process implies that the program would have type-checked, so we need not explicitly run the type-checker over this program.
Observe, however, that the nature of a type error has now changed
dramatically. Previously, we had a recursive-descent algorithm that
walked the expression using a type environment. The bindings in the
type environment were programmer-declared types, and could hence be
taken as (intended) authoritative specifications of types. As a
result, any mismatch was blamed on the expressions, and reporting type
errors was simple (and easy to understand). Here, however, a type
error is a failure to unify. The unification failure is based
on events that occur at the confluence of two smart
algorithms—
22.4 Over- and Under-Constrained Solutions
Remember that the constraints may not precisely dictate the type of all variables. If the system of equations is over-constrained, then we get clashes, resulting in type errors. If instead the system is under-constrained, that means we don’t have enough information to make definitive statements about all expressions. For instance, in the expression (fun (x) x) we do not have enough constraints to indicate what the type of x, and hence of the entire expression, must be. This is not an error; it simply means that x is free to be any type at all. In other words, its type is “the type of x -> the type of x” with no other constraints. The types of these underconstrained identifiers are presented as type variables, so the above expression’s type might be reported as (A -> A).
The unification algorithm actually has a wonderful property: it automatically computes the most general types for an expression, also known as principal types. That is, any actual type the expression can have can be obtained by instantiating the inferred type variables with actual types. This is a remarkable result: in another example of computers beating humans, it says that no human can generate a more general type than the above algorithm can!
22.5 Let-Polymorphism
Consider the following program, which uses the identity function at two different types:
(let ([id (fun (x) x)])
  (if (id true)
      (id 5)
      (id 6)))
With only the machinery developed above, this program fails to type: the first use of id unifies the type of its parameter with Boolean, so the later numeric uses clash. What we would actually like is for each use of id to behave as if it were separately instantiated, as if the program had been written:
(if (id<Boolean> true)
    (id<Number> 5)
    (id<Number> 6))
The reason is that the types we have inferred through unification are not actually polymorphic. This is important to remember: just because you have type variables, you don’t necessarily have polymorphism! The type variables could be unified at the next use, at which point you end up with a mere monomorphic function. Rather, true polymorphism only obtains when you can instantiate type variables.
In languages with true polymorphism, then, constraint generation and unification are not enough. Instead, languages like ML and Haskell implement something colloquially called let-polymorphism. In this strategy, when a term with type variables is bound in a lexical context, the type is automatically promoted to be a quantified one. At each use, the term is effectively automatically instantiated.
There are many implementation strategies that will accomplish this. The most naive (and unsatisfying) is to merely copy the code of the bound identifier; thus, each use of id above gets its own copy of (fun (x) x), so each gets its own type variables. The first might get the type (A -> A), the second (B -> B), the third (C -> C), and so on. None of these type variables clash, so we get the effect of polymorphism. Obviously, this not only increases program size, it also does not work in the presence of recursion. However, it gives us insight into a better solution: instead of copying the code, why not just copy the type? Thus at each use, we create a renamed copy of the inferred type: id’s (A -> A) becomes (B -> B) at the first use, and so on, thus achieving the same effect as copying code but without its burdens. Because all these strategies effectively mimic copying code, however, they only work within a lexical context.
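To make the “copy the type” idea concrete, here is a small, self-contained sketch; the PolyType datatype and its names are ours, not the book’s. Instantiation consistently renames the type variables in an inferred type, with the suffix standing in for a supply of fresh names:
data PolyType:
  | t-var(name :: String)
  | t-arrow(arg :: PolyType, ret :: PolyType)
  | t-num
end

# Rename every type variable by attaching a suffix; calling this with a
# fresh suffix at each use site plays the role of instantiation.
fun instantiate(t :: PolyType, suffix :: String) -> PolyType:
  cases (PolyType) t:
    | t-var(name) => t-var(name + suffix)
    | t-arrow(a, r) => t-arrow(instantiate(a, suffix), instantiate(r, suffix))
    | t-num => t-num
  end
end

check:
  id-type = t-arrow(t-var("A"), t-var("A"))
  instantiate(id-type, "1") is t-arrow(t-var("A1"), t-var("A1"))
  instantiate(id-type, "2") is t-arrow(t-var("A2"), t-var("A2"))
end
The two instantiations now carry unrelated variables (A1 versus A2), so unifying one against Boolean and the other against Number no longer clashes.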
23 Mutation: Structures and Variables
23.1 Separating Meaning from Notation
f = 3
o.f = 3
f = 3
Assuming all three are in Java, the first and third could behave exactly like each other or exactly like the second: it all depends on whether f is a local identifier (such as a parameter) or a field of the object (i.e., the code is really this.f = 3).
In either case, we are asking the evaluator to permanently change the value bound to f. This has important implications for other observers. Until now, for a given set of inputs, a computation always returned the same value. Now, the answer depends on when it was invoked: above, it depends on whether it was invoked before or after the value of f was changed. The introduction of time has profound effects on predicting the behavior of programs.
However, there are really two quite different notions of change buried in the uniform syntax above. Changing the value of a field (o.f = 3 or this.f = 3) is extremely different from changing that of an identifier (f = 3 where f is bound as a parameter or a local inside the method, not by the object). We will explore these in turn. We’ll tackle fields below, and return to identifiers in Variables.
To study both these features, we will as usual write interpreters. However, to make sure we expose their essence, we will write these interpreters without the use of state. That is, we will do something quite remarkable: write mutation-free interpreters that faithfully mimic languages with mutation. The key to this will be a special pattern of passing information around in a computation.
23.2 Mutation and Closures
Before we proceed, make sure you’ve understood boxes (A Canonical Mutable Structure), and especially the interaction between mutation and closures (Interaction of Mutation with Closures: Counters).
for(var i = 0; i < 10; i++) { |
button[i] = function() { return i; } |
} |
for(var i = 0; i < 10; i++) { |
println(button[i]()) |
} |
We might have liked this to produce the sequence of values 0, 1, 2, and so on through to 9. In fact, however, it produces ten outputs, all the same: 10.
Do you see why? How would you fix this?
The problem is that i in the for loop is allocated only once. Therefore, all the closures share the same i. Because that value had to become 10 for the for loop to terminate, all the closures report the value 10.
With traditional for loops, there is no obvious way out of this problem. This seemingly confusing behavior often confounds programmers new to languages that make extensive use of closures. Because they cannot change the behavior of for loops, many languages have introduced new versions of for (or new keywords inside for) to address this problem. The solution is always to allocate a fresh i on each iteration, so that each closure is over a different variable; the looping construct copies the previous value of i as the initial value of the new one before applying the updater (in this case, i++) and then performing the comparison and loop body.
In Pyret, for instance, each iteration of a for map expression closes over a fresh i, so the resulting functions behave as we wanted:
funs = for map(i from range(0, 10)):
  lam(): i end
end

check:
  map(lam(c): c() end, funs) is range(0, 10)
end
The for form above is just a convenient way of writing the corresponding use of map:
funs = map(
  lam(i): lam(): i end end,
  range(0, 10))
23.3 Mutable Structures
Equipped with these examples, let us now return to adding mutation to the language in the form of mutable structures (which are also a good basis for mutable objects [REF]). Besides mutable structures themselves, note that we must sometimes perform mutation in groups (e.g., removing money from one bank account and depositing it in another). Therefore, it is useful to be able to sequence a group of mutable operations. We will call this begin: it evaluates its sub-terms in order and returns the value of the last one.
Why does it matter whether begin evaluates its sub-terms in some particular, specified order?
Does it matter what this order is?
Define begin by desugaring into let (and hence into anonymous functions).
This is an excellent illustration of the non-canonical
nature of desugaring. We’ve chosen to add to the core a construct
that is certainly not necessary. If our goal was to shrink the size
of the interpreter—
23.3.1 Extending the Language Representation
data ExprC:
  | numC(n :: Number)
  | plusC(l :: ExprC, r :: ExprC)
  | multC(l :: ExprC, r :: ExprC)
  | idC(s :: String)
  | appC(f :: ExprC, a :: ExprC)
  | fdC(arg :: String, body :: ExprC)
  | boxC(v :: ExprC)
  | unboxC(b :: ExprC)
  | setboxC(b :: ExprC, v :: ExprC)
  | seqC(b1 :: ExprC, b2 :: ExprC)
end
fun make-box-list():
  b0 = box(0)
  b1 = box(1)
  l = [list: b0, b1]
  index(l, 0)!{v : 1}
  index(l, 1)!{v : 2}
  l
where:
  l = make-box-list()
  index(l, 0)!v is 1
  index(l, 1)!v is 2
end
public static void main (String[] args) { |
Box<Integer> b0 = new Box<Integer>(0); |
Box<Integer> b1 = new Box<Integer>(1); |
|
ArrayList<Box<Integer>> l = new ArrayList<Box<Integer>>(); |
l.add(b0); |
l.add(b1); |
|
l.get(0).set(1); |
l.get(1).set(2); |
} |
For convenience, we will assume that we have implemented desugaring to
provide us with (a) let and (b) if necessary, more than two
terms in a sequence (which can be desugared into nested sequences).
We will also sometimes write expressions in the original Pyret syntax,
both for brevity (because the core language terms can grow quite large
and unwieldy) and so that you can run these same terms in Pyret and
observe what answers they produce. As this implies, we are taking the
behavior in Pyret—
23.3.2 The Interpretation of Boxes
data Value: |
| numV(n :: Number) |
| closV(f :: ExprC%(is-fdC), e :: Env) |
| boxV(v :: Value) |
end |
fun interp(e :: ExprC, nv :: Env) -> Value: |
cases (ExprC) e: |
| numC(n) => numV(n) |
| plusC(l, r) => plus-v(interp(l, nv), interp(r, nv)) |
| multC(l, r) => mult-v(interp(l, nv), interp(r, nv)) |
| idC(s) => lookup(s, nv) |
| fdC(_, _) => closV(e, nv) |
| appC(f, a) => |
clos = interp(f, nv) |
arg-val = interp(a, nv) |
interp(clos.f.body, |
xtnd-env(bind(clos.f.arg, arg-val), clos.e)) |
end |
end |
| boxC(v) => boxV(interp(v, nv)) |
| unboxC(b) => interp(b, nv).v |
Of course, we haven’t done any hard work yet. All the interesting behavior is, presumably, hidden in the treatment of setboxC. It may therefore surprise you that we’re going to look at seqC first instead (and you’ll see why we included it in the core).
| seqC(b1, b2) => |
b1-value = interp(b1, nv) |
b2-value = interp(b2, nv) |
b2-value |
| seqC(b1, b2) => |
interp(b1, nv) |
interp(b2, nv) |
23.3.3 Can the Environment Help?
(let ([b (box 0)]) |
(begin (begin (set-box! b (+ 1 (unbox b))) |
(set-box! b (+ 1 (unbox b)))) |
(unbox b))) |
Represent this expression in ExprC.
(let ([b (box 0)]) |
(+ (begin (set-box! b (+ 1 (unbox b))) |
(unbox b)) |
(begin (set-box! b (+ 1 (unbox b))) |
(unbox b)))) |
If the interpreter is being given precisely the same expression, how
can it possibly avoid producing precisely the same answer? The most
obvious way is if the interpreter’s other parameter, the environment,
were somehow different. As of now the exact same environment is sent
to both branches of the sequence and both arms of the addition,
so our interpreter—
We must somehow make sure the interpreter is fed different arguments on calls that are expected to potentially produce different results.
We must return from the interpreter some record of the mutations made when evaluating its argument expression.
(+ (let ([b (box 0)]) |
1) |
b) |
Work out the above problem in detail and make sure you understand it.
You could try various other related proposals, but they are likely to all have similar failings. For instance, you may decide that, because the problem has to do with additional bindings in the environment, you will instead remove all added bindings in the returned environment. Sounds attractive? Did you remember we have closures?
Consider the representation of the following program:
(let ([a (box 1)])
(let ([f (fun x (+ x (unbox a)))])
(begin
(set-box! a 2)
(f 10))))
What problems does this example cause?
Rather, we should note that while the constraints described above are all valid, the solution we proposed is not the only one. Observe that neither condition actually requires the environment to be the responsible agent. Indeed, it is quite evident that the environment cannot be the principal agent. We need something else.
23.3.4 Welcome to the Store
The preceding discussion tells us that we need two repositories to accompany the expression, not one. One of them, the environment, continues to be responsible for maintaining lexical scope. But the environment cannot directly map identifiers to their value, because the value might change. Instead, something else needs to be responsible for maintaining the dynamic state of mutated boxes. This latter data structure is called the store.
data Binding:
  | bind(name :: String, location :: Number)
end
type Env = List<Binding>
mt-env = empty
xtnd-env = link

data Storage:
  | cell(location :: Number, value :: Value)
end
type Store = List<Storage>
mt-sto = empty
xtnd-sto = link

fun lookup(s :: String, nv :: Env) -> Number: ...
fun fetch(n :: Number, st :: Store) -> Value: ...
Fill in the bodies of lookup and fetch.
data Value: |
| numV(n :: Number) |
| closV(f :: ExprC, e :: Env) |
| boxV(l :: Number) |
end |
23.3.5 Interpreting Boxes
fun ret(v :: Value, st :: Store): {v : v, st : st} end
Why do we say “effectively” and “potentially” above?
Hint for “effectively”: look at closures.
fun interp(e :: ExprC, nv :: Env, st :: Store): |
cases (ExprC) e: |
end |
end |
| numC(n) => ret(numV(n), st) |
| fdC(_, _) => ret(closV(e, nv), st) |
| idC(s) => ret(fetch(lookup(s, nv), st), st) |
Now things get interesting.
| seqC(b1, b2) => |
interp(b1, nv, st) |
interp(b2, nv, st) |
| seqC(b1, b2) => |
b1-value = interp(b1, nv, st) |
interp(b2, nv, b1-value.st) |
Spend a moment contemplating the code above. You’ll soon need to adjust your eyes to read this pattern fluently.
| plusC(l, r) => |
lv = interp(l, nv, st) |
rv = interp(r, nv, lv.st) |
ret(plus-v(lv.v, rv.v), rv.st) |
Here’s an important distinction. When we evaluate a term, we usually use the same environment for all its sub-terms in accordance with the scoping rules of the language. The environment thus flows in a recursive-descent pattern. In contrast, the store is threaded: rather than using the same store in all branches, we take the store from one branch and pass it on to the next, and take the result and send it back out. This pattern is called store-passing style.
Now the penny drops. We see that store-passing style is our secret ingredient: it enables the environment to preserve lexical scope while still giving a binding structure that can reflect changes. Our intuition told us that the environment had to somehow participate in obtaining different results for the same syntactic expression, and we can now see how it does: not directly, by itself changing, but indirectly, by referring to the store, which updates. Now we only need to see how the store itself “changes”.
new-loc = mk-counter()
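Here new-loc hands out a previously unused store location on each call. A minimal sketch of mk-counter, written with a Pyret variable in the style of Interaction of Mutation with Closures: Counters (we will question this use of state in a moment):
fun mk-counter():
  var ctr = 0
  lam():
    # each call returns the next unused number
    ctr := ctr + 1
    ctr
  end
end

check:
  c = mk-counter()
  c() is 1
  c() is 2
end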
| boxC(v) =>
  val = interp(v, nv, st)
  loc = new-loc()
  ret(boxV(loc),
    xtnd-sto(cell(loc, val.v), st))
Observe that we have relied above on new-loc, which is itself implemented in terms of boxes! This is outright cheating. How would you modify the interpreter so that we no longer need mutation for this little bit of state?
To eliminate new-loc, the simplest option would be to add another parameter to and return value from the interpreter, representing the largest address used so far. Every operation that allocates in the store would return an incremented address, while all others would return it unchanged. In other words, this is precisely another application of the store-passing pattern. Writing the interpreter this way would make it extremely unwieldy and might obscure the more important use of store-passing for the store itself, which is why we have not done so. However, it is important to make sure that we can: that’s what tells us that we are not reliant on state to add state to the language.
| unboxC(b) => |
val = interp(b, nv, st) |
ret(fetch(val.v.l, val.st), val.st) |
Let’s now see how to update the value held in a box. First we have to evaluate the box expression to obtain a box, and the value expression to obtain the new value to store in it. The box’s value is going to be a boxV holding a location.
One is to traverse the store, find the old binding for that location, and replace it with the new one, copying all the other store bindings unchanged.
The other, lazier, option is to simply extend the store with a new binding for that location, which works provided we always obtain the most recent binding for a location (which is how lookup works in the environment, so fetch can do the same in the store). (Observe that this latter option forces us to commit to lists rather than to sets.)
| setboxC(b, v) => |
b-val = interp(b, nv, st) |
v-val = interp(v, nv, b-val.st) |
ret(v-val.v, |
xtnd-sto(cell(b-val.v.l, v-val.v), v-val.st)) |
Implement the other version of store alteration, whereby we update an existing binding and thereby avoid multiple bindings for a location in the store.
When we look for a location to override the value stored at it, can the location fail to be present? If so, write a program that demonstrates this. If not, explain what invariant of the interpreter prevents this from happening.
| appC(f, a) => |
clos = interp(f, nv, st) |
clos-v :: Value = clos.v |
clos-st :: Store = clos.st |
arg-val = interp(a, nv, clos-st) |
new-loc = new-loc() |
interp(clos-v.f.body, |
xtnd-env(bind(clos-v.f.arg, new-loc), clos-v.e), |
xtnd-sto(cell(new-loc, arg-val.v), arg-val.st)) |
Because we have not said the function parameter is mutable, there is no real need to have implemented procedure calls this way. We could instead have followed the same strategy as before. Indeed, observe that the mutability of this location will never be used: only setboxC changes what’s in an existing store location (the xtnd-sto above is technically a store initialization), and then only when they are referred to by boxVs, but no box is being allocated above. (You could call this the useless app store.) However, we have chosen to implement application this way for uniformity, and to reduce the number of cases we’d have to handle.
It’s a useful exercise to try to limit the use of store locations only to boxes. How many changes would you need to make?
23.3.6 Implementing Mutation: Subtleties and Variations
Even though we’ve finished the implementation, there are still many subtleties and insights to discuss.
Implicit in our implementation is a subtle and important decision: the order of evaluation. For instance, why did we not implement addition thus?
| plusC(l, r) =>
rv = interp(r, nv, st)
lv = interp(l, nv, rv.st)
ret(plus-v(lv.v, rv.v), lv.st)
It would have been perfectly consistent to do so. Similarly, embodied in the pattern of store-passing is the decision to evaluate the function position before the argument. Observe that:
Previously, we delegated such decisions to the underlying language implementation. Now, store-passing has forced us to sequentialize the computation, and hence make this decision ourselves (whether we realized it or not).
Even more importantly, this decision is now a semantic one. Before there were mutations, one branch of an addition, for instance, could not affect the value produced by the other branch. The only effect they could have was halting with an error or failing to terminate—
which, to be sure, are certainly observable effects, but at a much more gross level. A program would not terminate with two different answers depending on the order of evaluation. Because each branch can have mutations that impact the value of the other, we must choose some order so that programmers can predict what their program is going to do! Being forced to write a store-passing interpreter has made this clear.
Observe that in the application rule, we are passing along the dynamic store, i.e., the one resulting from evaluating both function and argument. This is precisely the opposite of what we said to do with the environment. This distinction is critical. The store is, in effect, “dynamically scoped”, in that it reflects the history of the computation, not its lexical shape. Because we are already using the term “scope” to refer to the bindings of identifiers, however, it would be confusing to say “dynamically scoped” to refer to the store. Instead, we simply say that it is persistent.
Languages sometimes dangerously conflate these two. In C, for instance, values bound to local identifiers are allocated (by default) on the stack. However, the stack matches the environment, and hence disappears upon completion of the call. If the call, however, returned references to any of these values, these references are now pointing to unused or even overridden memory: a genuine source of serious errors in C programs. The problem is that programmers want the values themselves to persist; but the storage for those values has been conflated with that for identifiers, which come and go with lexical scope.
We have already discussed how there are two strategies for overriding the store: to simply extend it (and rely on fetch to extract the newest one) or to “search-and-replace”. The latter strategy has the virtue of not holding on to useless store bindings that can never be obtained again.
However, this does not cover all the wasted memory. Over time, we cease to be able to access some boxes entirely: e.g., if they are bound to only one identifier, and that identifier is no longer in scope. These locations are called garbage. Thinking more conceptually, garbage locations are those whose elimination does not have any impact on the value produced by a program. There are many strategies for automatically identifying and reclaiming garbage locations, usually called garbage collection [REF].
- It’s very important to evaluate every expression position and thread the store that results from it. Consider, for instance, this alternate implementation of unboxC (compare with the correct version above):
| unboxC(b) =>
  val = interp(b, nv, st)
  ret(fetch(val.v.l, st), val.st)
Did you notice? We fetched the location from st, not val.st. But st reflects mutations up to but before the evaluation of the unboxC expression, not any within it. Could there possibly be any? Mais oui!
(let ([b (box 0)])
  (unbox (begin (set-box! b 1)
                b)))
With the incorrect code above, this would evaluate to 0 rather than 1.
- Here’s another, similar, error (again compare with the correct version above):
| unboxC(b) =>
  val = interp(b, nv, st)
  ret(fetch(val.v.l, val.st), st)
How do we break this? In the end we’re returning the old store, the one before any mutations in the unboxC happened. Thus, we just need the outside context to depend on one of them.
(let ([b (box 0)])
  (+ (unbox (begin (set-box! b 1)
                   b))
     (unbox b)))
This should evaluate to 2, but because the store being returned is one where b’s location is bound to the representation of 0, the result is 1. If we combined both bugs above—i.e., using st in both positions on the last line instead of val.st—this expression would evaluate to 0 rather than 2.
Exercise
Go through the interpreter; replace every reference to an updated store with a reference to one before update; make sure your test cases catch all the introduced errors!
Observe that these uses of “old” stores enable us to perform a kind of time travel: because mutation introduces a notion of time, these enable us to go back in time to when the mutation had not yet occurred. This sounds both interesting and perverse; does it have any use?
It does! Imagine that instead of directly mutating the store, we introduce the idea of a journal of intended updates to the store. The journal flows in a threaded manner just like the real store itself. Some instruction creates a new journal; after that, all lookups first check the journal, and only if the journal cannot find a binding for a location is it looked for in the actual store. There are two other new instructions: one to discard the journal (i.e., travel back in time), and the other to commit it (i.e., all of its edits get applied to the real store).
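A sketch of the lookup rule just described, reusing the Storage cells and the fetch function from above; the journal is simply another list of cells consulted before the actual store:
fun fetch-with-journal(n :: Number, journal :: List<Storage>, st :: Store) -> Value:
  cases (List) journal:
    | empty => fetch(n, st)  # not in the journal: fall back to the real store
    | link(f, r) =>
      if f.location == n:
        f.value
      else:
        fetch-with-journal(n, r, st)
      end
  end
end
Committing the journal then amounts to appending its cells in front of the store, while discarding it means simply dropping the list.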
This is the essence of software transactional memory. Each thread maintains its own journal. Thus, one thread does not see the edits made by the other before committing (because each thread sees only its own journal and the global store, but not the journals of other threads). At the same time, each thread gets its own consistent view of the world (it sees edits it made, because these are recorded in the journal). If the transaction ends successfully, all threads atomically see the updated global store. If the transaction aborts, the discarded journal takes with it all changes and the state of the thread reverts (modulo global changes committed by other threads).
Software transactional memory offers one of the most sensible approaches to tackling the difficulties of multi-threaded programming, if we insist on programming with shared mutable state. Because most computers have only one global store, however, maintaining the journals can be expensive, and much effort goes into optimizing them. As an alternative, some hardware architectures have begun to provide direct support for transactional memory by making the creation, maintenance, and commitment of journals as efficient as using the global store, removing one important barrier to the adoption of this idea.
Exercise
Augment the language with the journal features of software transactional memory.
An alternate implementation strategy is to have the environment map names to boxed Values. We don’t do it here because it: (a) would be cheating, (b) wouldn’t tell us how to implement the same feature in a language without boxes, (c) doesn’t necessarily carry over to other mutation operations, and (d) most of all, doesn’t really give us insight into what is happening here.
It is nevertheless useful to understand, not least because you may find it a useful strategy to adopt when implementing your own language. Therefore, alter the implementation to obey this strategy. Do you still need store-passing style? Why or why not?
23.4 Variables
Now that we’ve got structure mutation worked out, let’s consider the other case: variable mutation. We have already discussed (From Identifiers to Variables) our choice of terminology, and seen examples of their use in Pyret. Whereas other languages overload the mutation syntax, as we have seen (Separating Meaning from Notation), in Pyret the two are kept distinct: ! mutates fields of objects while := mutates variables. This forces Pyret programmers to confront the distinction we introduced at the beginning of Separating Meaning from Notation. We will, of course, sidestep these syntactic issues in our core language by using different constructs for boxes and for variables.
23.4.1 The Syntax of Variable Assignment
x = 3; |
1 = 3; |
o = new String("an initial string"); |
o = new String("a new string"); |
23.4.2 Interpreting Variables
data ExprC:
  | numC(n :: Number)
  | plusC(l :: ExprC, r :: ExprC)
  | multC(l :: ExprC, r :: ExprC)
  | varC(s :: String)
  | appC(f :: ExprC, a :: ExprC)
  | fdC(arg :: String, body :: ExprC)
  | setC(v :: String, b :: ExprC)
  | seqC(b1 :: ExprC, b2 :: ExprC)
end

data Value:
  | numV(n :: Number)
  | closV(f :: ExprC, e :: List<Binding>)
end
As you might imagine, to support variables we need the same store-passing style that we’ve seen before (Interpreting Boxes), and for the same reasons. What differs is in precisely how we use it. Because sequencing is interpreted in just the same way (observe that its interpretation does not depend on boxes versus variables), that leaves us just the variable mutation case to handle.
| setC(v, b) => |
new-val = interp(b, nv, st) |
var-loc = lookup(v, nv) |
ret(new-val.v, |
xtnd-sto(cell(var-loc, new-val.v), new-val.st)) |
And we’re done! We did all the hard work when we implemented store-passing style (and also in the fact that application allocates a new location for each variable).
23.4.3 Reference Parameter Passing
Let’s return to the parenthetical statement above: that every application allocates a fresh location in the store for the parameter.
Why does this matter? Consider the following Pyret program:
fun f(x):
  x := 3
end
var y = 5
f(y)
After this runs, what do we expect to be the value of y?
In the example above, y evaluates to 5, not 3. That is because the value of the formal parameter x is held at a different location than that of the actual parameter y, so the mutation affects the location of x, leaving y unscathed.
Now suppose, instead, that application behaved as follows. When the actual parameter is a variable, and hence has a location in memory, instead of allocating a new location for the value, it simply passes along the existing one for the variable. Now the formal parameter is referring to the same store location as the actual: i.e., they are variable aliases. Thus any mutation on the formal will leak back out into the calling context; the above program would evaluate to 3 rather than 5. This is called a call-by-reference parameter-passing strategy. (Instead, our interpreter implements call-by-value, and this is the same strategy followed by languages like Java. This causes confusion because when the value is itself mutable, changes made to the value in the callee are observed by the caller. However, that is simply an artifact of mutable values, not of the calling strategy. Please avoid this confusion!)
For some years, this power was considered a good idea. It was useful because programmers could write abstractions such as swap, which swaps the value of two variables in the caller. However, the disadvantages greatly outweigh the advantages:
A careless programmer can alias a variable in the caller and modify it without realizing they have done so, and the caller may not even realize this has happened until some obscure condition triggers it.
Some people thought this was necessary for efficiency: they assumed the alternative was to copy large data structures. However, call-by-value is compatible with passing just the address of the data structure. You only need make a copy if (a) the data structure is mutable, (b) you do not want the caller to be able to mutate it, and (c) the language does not itself provide immutability annotations or other mechanisms.
- It can force non-local and hence non-modular reasoning. For instance, suppose we have the procedure:
fun f(g):
  var x = 10
  g(x)
  ...
end
If the language were to permit by-reference parameter passing, then the programmer cannot locally—i.e., just from the above code—determine what the value of x will be in the ellipses, because it depends on precisely who the callee (which is being passed in as a parameter) will be, and what it might do, which in turn may depend on dynamic conditions (including the phase of the moon).
At the very least, then, if the language is going to permit
by-reference parameters, it should let the caller determine
whether to pass the reference—
At some point, therefore, we should consider whether any of this fuss
is worthwhile. Instead, callers who want the callee to perform a
mutation could simply send a boxed value to the callee. The box
signals that the caller accepts—
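For example, using the box datatype from A Canonical Mutable Structure, a caller that wants swapping can opt in explicitly by passing boxes; this sketch is ours, not part of the core language:
fun swap(b1, b2):
  # exchange the contents of two boxes supplied by the caller
  t = b1!v
  b1!{v: b2!v}
  b2!{v: t}
end

check:
  a = box(1)
  b = box(2)
  swap(a, b)
  a!v is 2
  b!v is 1
end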
23.5 The Design of Stateful Language Operations
Though most programming languages include one or both kinds of state we have studied, their admission should not be regarded as a trivial or foregone matter. On the one hand, state brings some vital benefits:
State provides a form of modularity. As our very interpreter demonstrates, without explicit stateful operations, to achieve the same effect:
We would need to add explicit parameters and return values that pass the equivalent of the store around.
These changes would have to be made to all procedures that may be involved in a communication path between producers and consumers of state.
Thus, a different way to think of state in a programming language is that it is an implicit parameter already passed to and returned from all procedures, without imposing that burden on the programmer. This enables procedures to communicate “at a distance” without all the intermediaries having to be aware of the communication.
State makes it possible to construct dynamic, cyclic data structures, or at least to do so in a relatively straightforward manner (Graphs).
State gives procedures memory, such as new-loc above. If a procedure could not remember things for itself, the callers would need to perform the remembering on its behalf, employing the moral equivalent of (at least local) store-passing. This is not only unwieldy, it creates the potential for a caller to interfere with the memory for its own nefarious purposes (e.g., a caller might purposely send back an old store, thereby obtaining a reference already granted to some other party, through which it might launch a correctness or security attack).
On the other hand, state imposes real costs on programmers as well as on programs that process programs (such as compilers). One is “aliasing”, which we discuss later [REF]. Another is “referential transparency”, which too I hope to return to [REF]. Finally, we have described above how state provides a form of modularity. However, this same description could be viewed as that of a back-channel of communication that the intermediaries did not know and could not monitor. In some (especially security and distributed system) settings, such back-channels can lead to collusion, and can hence be extremely dangerous and undesirable.
Because there is no optimal answer, it is probably wise to include mutation operators but to carefully delineate them. In Standard ML, for instance, there is no variable mutation, because it is considered unnecessary. Instead, the language has the equivalent of boxes (called refs). One can easily simulate variables using boxes, so no expressive power is lost, though it does create more potential for aliasing than variables alone would have ([REF aliasing]) if the boxes are not used carefully.
In return, however, developers obtain expressive types: every data structure is considered immutable unless it contains a ref, and the presence of a ref is a warning to both developers and programs (such as compilers) that the underlying value may keep changing. (This same argument applies to Pyret, where the absence of a ref declaration means that a field is immutable, and the absence of a var declaration means an identifier is immutable, i.e., not a variable.) Thus, for instance, suppose b is a box and v is bound to the unboxing of b. A developer should be aware that replacing all instances of the unboxing of b with references to v is not safe, because the former always fetches the current value in the box while the latter holds the value only at the instant when v was computed, and may now be inconsistent. The declaration that a field is mutable provides this information to both the developer and to programming tools (such as compilers); in particular, the absence of such a declaration permits caching of values, thereby trading computation time for space.
23.6 Typing State
Adding stateful operations to a type-checker is easy: the only safe thing to do is make sure the type of the new value is exactly the same as that of the old one. If that is true, the behavior of the program will be indistinguishable to the type system before and after the mutation. That is, it is safest to follow invariance (The Principle of Substitutability).
23.6.1 Mutation and Polymorphism
box<T> :: T -> Box(T)
unbox<T> :: Box(T) -> T
set-box<T> :: Box(T), T -> Box(T)
Implement the above three functions.
let f = box(lam(x): x end):
  set-box(f, lam(x): x + 5 end)
  unbox(f)(true)
end
There are many ways to try to understand this problem, which is beyond
the scope of this study. The simplest way is that polymorphism and
mutation do not work together well. The essence of polymorphism is to
imagine a quantified type is instantiated at each use; however,
the essence of mutation is to silently transmit values from one part
of the program to the other. Thus, the values being unified at two
different sites are only guaranteed to be compatible with the
let-bound identifier—
23.6.2 Typing the Initial Value
Recall that implementing recursion using mutation requires placing some initial value into the mutable location before the cycle is closed (Recursion and Cycles from Mutation). What type should that initial value have? There are several options, none entirely satisfactory:
Using a fixed initial value of a standard type means the value subsequently mutated into place may not be type-compatible, thereby failing invariance.
Using a different initial value of the type that will eventually be put into the mutable has the problem that prematurely observing it is even more deadly, because it may not be distinguishable from the eventual value.
Using a new value just for this case works provided there is one of each type. Otherwise, again, we violate invariance. But having one of each type is a problem in itself, because now the run-time system has to check for all of them.
Syntactically restricting recursion to functions is the safest, because the initial value is never seen. As a result, there is no need to provide any meaningful type for it.
24 Objects: Interpretation and Types
When a language admits functions as values, it provides developers the
most natural way to represent a unit of computation. Suppose a
developer wants to parameterize some function f. Any language
lets f be parameterized by passive data, such as numbers
and strings. But it is often attractive to parameterize it over
active data: a datum that can compute an answer, perhaps
in response to some information. Furthermore, the function passed to
f can—
While a function is a splendid thing, it suffers from excessive terseness. Sometimes we might want multiple functions to all close over the same shared data; the sharing especially matters if some of the functions mutate it and expect the others to see the result of those mutations. In such cases, it becomes unwieldy to send just a single function as a parameter; it is more useful to send a group of functions. The recipient then needs a way to choose between the different functions in the group. This grouping of functions, and the means to select one from the group, is the essence of an object. We are therefore perfectly placed to study objects having covered functions (Functions Anywhere), mutation (Mutation: Structures and Variables), and recursion (Recursion and Cycles from Mutation).
I cannot hope to do justice to the enormous space of object systems. Please read Object-Oriented Programming Languages: Application and Interpretation by Éric Tanter, which goes into more detail and covers topics ignored here. Let’s add this notion of objects to our language. Then we’ll flesh it out and grow it, and explore the many dimensions in the design space of objects. We’ll first show how to add objects to the core language, but because we’ll want to prototype many different ideas quickly, we’ll soon shift to a desugaring-based strategy. Which one you use depends on whether you think understanding them is critical to understanding the essence of your language. One way to measure this is how complex your desugaring strategy becomes, and whether by adding some key core language enhancements, you can greatly reduce the complexity of desugaring.
24.1 Interpreting Objects
At its simplest, an object is a value that maps names to stuff: either other values or “methods”.
data Value:
  | numV(n :: Number)
  | closV(f :: ExprC, e :: List<Binding>)
  | objV(ns :: List<String>, vs :: List<Value>)
end
| objC(ns :: List<String>, vs :: List<ExprC>)
| objC(ns, vs) =>
  obj-vs = eval-obj-vs(vs, nv, st)
  ret(objV(ns, obj-vs.exprs), obj-vs.final-store)
Write eval-obj-vs, which evaluates each expression in vs while threading the store. Assume it returns an object with two fields: exprs is the list of evaluated expressions, while final-store is the final store ensuing from these evaluations.
| msgC(o :: ExprC, n :: String) |
| msgC(o, n) =>
  o-val = interp(o, nv, st)
  msg = lookup-msg(n, o-val.v)
  ret(msg, o-val.st)
Implement lookup-msg.
In principle, msgC can be used to obtain any kind of
member but for simplicity, we need only assume that we have
functions. To use them, we must apply them to values. This is
cumbersome to write directly, so let’s assume desugaring
has taken care of it for us: that is, the user can write
(msg o m v)—
(let o (obj (add1 (lambda x (+ x 1))) |
(sub1 (lambda x (+ x -1)))) |
(msg o sub1 2)) |
24.2 Objects by Desugaring
While defining objects in the core language is good to really understand their essence, it’s an unwieldy way to go about studying them. Instead, we’ll use Pyret to represent objects, sticking to the parts of the language we already know how to implement in our interpreter. That is, we’ll assume that we are looking at the output of desugaring. (For this reason, we’ll also stick to stylized code, potentially writing unnecessary expressions on the grounds that this is what a simple program generator would produce.)
The code that follows largely drops type annotations. Go back in and add these annotations wherever possible; where you can’t, explain what problems you encounter. See Types for Objects.
24.2.1 Objects as Named Collections
o-1 = |
lam(m): |
if m == "add1": |
lam(x): x + 1 end |
else if m == "sub1": |
lam(x): x - 1 end |
else: |
raise("message not found: " + m) |
end |
end |
check: o-1("add1")(5) is 6 end
fun msg(o, m, a): o(m)(a) end
check: msg(o-1, "add1", 5) is 6 end
Something very important changed when we switched to the desugaring strategy. Do you see what it is?
check: msg(o-1, "add" + "1", 5) is 6 end
This is a general problem with desugaring: the target language may allow computations that have no counterpart in the source, and hence cannot be mapped back to it. Fortunately we don’t often need to perform this inverse mapping, though it does arise in some debugging and program comprehension tools. More subtly, however, we must ensure that the target language does not produce values that have no corresponding equivalent in the source.
o-1-1 = mk-object(
  [list:
    mtd("add1", lam(x): x + 1 end),
    mtd("sub1", lam(x): x - 1 end)
  ])

data Mtd:
  | mtd(name :: String, value)
end

fun mk-object(n-vs):
  lam(m):
    fun lookup(locals):
      cases (List) locals:
        | empty => raise("message not found: " + m)
        | link(f, r) =>
          if f.name == m:
            f.value
          else:
            lookup(r)
          end
      end
    end
    lookup(n-vs)
  end
end
With this much simpler notation—
24.2.2 Constructors
o-constr-1 = lam(x):
  mk-object(
    [list:
      mtd("addX", lam(y): x + y end)
    ])
end

check:
  msg(o-constr-1(5), "addX", 3) is 8
  msg(o-constr-1(2), "addX", 3) is 5
end
24.2.3 State
o-state-1 = lam(count):
  var mut-count = count
  mk-object(
    [list:
      mtd("inc", lam(n): mut-count := mut-count + n end),
      mtd("dec", lam(n): mut-count := mut-count - n end),
      mtd("get", lam(_): mut-count end)
    ])
end

check:
  o = o-state-1(5)
  msg(o, "inc", 1)
  msg(o, "dec", 1)
  msg(o, "get", "dummy") is 5
end

check:
  o1 = o-state-1(5)
  o2 = o-state-1(5)
  msg(o1, "inc", 1)
  msg(o1, "inc", 1)
  msg(o1, "get", "dummy") is 7
  msg(o2, "get", "dummy") is 5
end
24.2.4 Private Members
Another common object language feature is private members: ones that are visible only inside the object, not outside it. (Except that, in Java, instances of other classes of the same type are privy to “private” members. Otherwise, you would simply never be able to implement an approximation to an Abstract Data Type.) These may seem like an additional feature we need to implement, but we already have the necessary mechanism in the form of locally-scoped, lexically-bound variables, such as mut-count above: there is no way for surrounding code to access mut-count directly, because lexical scoping ensures that it remains hidden to the world.
24.2.5 Static Members
mk-bank-account = block: var counter = 0 lam(amount): var balance = amount counter := counter + 1 mk-object( [list: mtd("deposit", lam(m): balance := balance + m end), mtd("withdraw", lam(m): balance := balance - m end), mtd("balance", lam(_): balance end), mtd("how-many-accounts", lam(_): counter end) ]) end end
check:
  acc-1 = mk-bank-account(0)
  msg(acc-1, "how-many-accounts", "dummy") is 1
  acc-2 = mk-bank-account(100)
  msg(acc-1, "how-many-accounts", "dummy") is 2
  msg(acc-2, "how-many-accounts", "dummy") is 2
  msg(acc-1, "deposit", 100)
  msg(acc-1, "withdraw", 50)
  msg(acc-2, "deposit", 10)
  msg(acc-1, "balance", "dummy") is 50
  msg(acc-2, "balance", "dummy") is 110
  msg(acc-1, "how-many-accounts", "dummy") is 2
  msg(acc-2, "how-many-accounts", "dummy") is 2
end
24.2.6 Objects with Self-Reference
Until now, our objects have simply been packages of named functions
grouped together and hence given different, named entry-points. We’ve
seen that many of the features considered important in object systems
are actually simple patterns over functions and scope, and have indeed
been used—
One characteristic that actually distinguishes object systems is that each object is automatically equipped with a reference to the same object, often called self or this.I prefer this slightly dry way of putting it to the anthropomorphic “knows about itself” terminology often adopted by object advocates. Indeed, note that we have gotten this far into object system properties without ever needing to resort to anthropomorphism. Can we implement this easily?
24.2.6.1 Self-Reference Using Mutation
o-self = block:
  var self = "dummy"
  self := mk-object(
    [list:
      mtd("first", lam(v): msg(self, "second", v + 1) end),
      mtd("second", lam(v): v + 1 end)
    ])
  self
end
check: msg(o-self, "first", 5) is 7 end
24.2.6.2 Self-Reference Without Mutation
o-self-no-state = mk-object(
  [list:
    mtd("first", lam(self, v): smsg(self, "second", v + 1) end),
    mtd("second", lam(self, v): v + 1 end)
  ])
fun smsg(o, m, a): o(m)(o, a) end
check: smsg(o-self-no-state, "first", 5) is 7 end
24.2.7 Dynamic Dispatch
Finally, we should make sure our objects can handle a characteristic
attribute of object systems, which is the ability to invoke a method
without the caller having to know or decide which object will handle
the invocation. Suppose we have a binary tree data structure, where a
tree consists of either empty nodes or leaves that hold a value. In
traditional functions, we are forced to implement the equivalent using some
form of conditional that exhaustively lists and selects between the
different kinds of trees. If the definition of a tree grows to
include new kinds of trees, each of these code fragments must be
modified. Dynamic dispatch solves this problem by eliminating this
conditional branch from the user’s program and instead
handling it by the method selection code built into the language.
The key feature that this provides is an extensible conditional.
This is one dimension of the extensibility that objects
provide. This property—
mt = lam():
  mk-object(
    [list: mtd("add", lam(self, _): 0 end)])
end

node = lam(v, l, r):
  mk-object(
    [list:
      mtd("add",
        lam(self, _):
          v + smsg(l, "add", "dummy") + smsg(r, "add", "dummy")
        end)
    ])
end
a-tree = node(10, node(5, mt(), mt()), node(15, node(6, mt(), mt()), mt()))
check: smsg(a-tree, "add", "dummy") is (10 + 5 + 15 + 6) end
24.3 Member Access Design Space
                        | Name is Static                                 | Name is Computed
Fixed Set of Members    | As in base Java.                               | As in Java with reflection to compute the name.
Variable Set of Members | Difficult to envision (what use would it be?). | Most scripting languages.
The lower-right quadrant corresponds closely with languages that use hash-tables to represent objects. Then the name is simply the index into the hash-table. Some languages carry this to an extreme and use the same representation even for numeric indices, thereby (for instance) conflating objects with dictionaries and even arrays. Even when the object only handles “member names”, this style of object creates significant difficulty for type-checking [REF] and is hence not automatically desirable.
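To make the lower-right quadrant concrete, here is a minimal sketch of an object represented as a hash table, using Pyret’s string-dict library; the name o-dict is invented purely for illustration. The member name is just a key, so it can be the result of an arbitrary computation:

import string-dict as SD

o-dict = [SD.string-dict:
  "add1", lam(x): x + 1 end,
  "sub1", lam(x): x - 1 end]

check:
  # the member name is computed at run-time
  o-dict.get-value("add" + "1")(5) is 6
end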
Therefore, in the rest of this section, we will stick with “traditional” objects that have a fixed set of names and even static member name references (the top-left quadrant). Even then, we will find there is much, much more to study.
24.4 What (Goes In) Else?
So far, the “else clause” of method lookup (which is currently
implemented by mk-object)—
Let’s return to our model of desugared objects above. To implement inheritance, the object must be given “something” to which it can delegate method invocations that it does not recognize. A great deal will depend on what that “something” is.
| empty => raise("message not found: " + m)
| empty => parent-object(m)
Observe that the application parent-object(m) is like “half a msg”, just like an l-value was “half” a variable’s evaluation (Interpreting Variables). Is there any connection?
Let’s try this by extending our trees to implement another method, "size". We’ll write an “extension” (you may be tempted to say “sub-class”, but hold off for now!) for each node and mt to implement the size method. We intend these to extend the existing definitions of node and mt, so we’ll use the extension pattern described above.We’re not editing the existing definitions because that is supposed to be the whole point of object inheritance: to reuse code in a black-box fashion. This also means different parties, who do not know one another, can each extend the same base code. If they had to edit the base, first they have to find out about each other, and in addition, one might dislike the edits of the other. Inheritance is meant to sidestep these issues entirely.
24.4.1 Classes
node-size-ext = fun(parent-object, v, l, r): ...
node-size-ext = lam(parent-maker, v, l, r):
  parent-object = parent-maker(v, l, r)
  mk-ext-object(parent-object,
    [list:
      mtd("size",
        lam(self, _):
          1 + smsg(l, "size", "dummy") + smsg(r, "size", "dummy")
        end)
    ])
end
fun node-size(v, l, r): node-size-ext(node, v, l, r) end
Did you notice that instead of mk-object we’ve used mk-ext-object above? Do you see that it takes one extra parameter? Try to define it for yourself.
fun mk-ext-object(parent, n-vs):
  lam(m):
    fun lookup(locals):
      cases (List) locals:
        | empty => parent(m)
        | link(f, r) =>
          if f.name == m: f.value else: lookup(r) end
      end
    end
    lookup(n-vs)
  end
end
mt-size-ext = lam(parent-maker):
  parent-object = parent-maker()
  mk-ext-object(parent-object,
    [list: mtd("size", lam(self, _): 0 end)])
end

fun mt-size(): mt-size-ext(mt) end
a-tree-size = node-size(10, node-size(5, mt-size(), mt-size()), node-size(15, node-size(6, mt-size(), mt-size()), mt-size()))
check:
  smsg(a-tree-size, "add", "dummy") is (10 + 5 + 15 + 6)
  smsg(a-tree-size, "size", "dummy") is 4
end
Earlier, we commented that chaining method-lookup to parents presumably bottoms out at some sort of “empty object”, which might look like this:

fun empty-object(m): raise("message not found: " + m) end
However, we haven’t needed to define or use this despite the use of mk-ext-object. Why is that, and how would you fix that?
class NodeSize extends Node { ... }
So why are we going out of the way to not call it a “class”?
When a developer invokes a Java class’s constructor, it in effect
constructs objects all the way up the inheritance chain (in practice,
a compiler might optimize this to require only one constructor
invocation and one object allocation). These are private copies of
the objects corresponding to the parent classes (private, that is,
up to the presence of static members). There is, however, a question
of how much of these objects is visible. Java chooses that—
In the implementation above, we have relied on the self-application semantics for recursive access to an object, rather than using state. The reason is that the behavior of inheritance would be subtly wrong if we used state naively, as we have shown above. Can you construct an example that illustrates this?
By examining values carefully, you will notice that the self reference is to the most refined object at all times. This demonstrates the other form of extensibility we get from traditional objects: extensible recursion. The extensible conditional can be viewed as free extension across “space”, namely, the different variants of data, whereas extensible recursion can be viewed as free extension across “time”, namely, the different extensions to the code. Nevertheless, as this paper points out, there’s no free lunch.
24.4.2 Prototypes
In our description above, we’ve supplied each class with a description of its parent class. Object construction then makes instances of each as it goes up the inheritance chain. There is another way to think of the parent: not as a class to be instantiated but, instead, directly as an object itself. Then all children with the same parent would observe the very same object, which means changes to it from one child object would be visible to another child. The shared parent object is known as a prototype.
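Here is a minimal sketch of this sharing, reusing the mk-object, mk-ext-object, and msg helpers from above; the names proto-parent, mk-child, and shared-count are invented for illustration. Because every child delegates to the very same parent object, a change made through one child is visible through another:

var shared-count = 0

# a single parent object, shared by every child
proto-parent = mk-object(
  [list:
    mtd("inc", lam(_): shared-count := shared-count + 1 end),
    mtd("get", lam(_): shared-count end)
  ])

fun mk-child(name):
  mk-ext-object(proto-parent,
    [list: mtd("name", lam(_): name end)])
end

check:
  c1 = mk-child("a")
  c2 = mk-child("b")
  msg(c1, "inc", "dummy")
  msg(c2, "get", "dummy") is 1   # the change made through c1 is visible through c2
end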
The archetypal prototype-based language is
Self.
Though you may have read that languages like JavaScript are “based
on” Self, there is value to studying the idea from its source,
especially because Self presents these ideas in their purest form.
Some language designers have argued that prototypes are more primitive
than classes in that, with other basic mechanisms such as functions,
one can recover classes from prototypes—
Modify the inheritance pattern above to implement a Self-like, prototype-based language, instead of a class-based language. Because classes provide each object with distinct copies of their parent objects, a prototype-language might provide a clone operation to simplify creation of the operation that simulates classes atop prototypes.
24.4.3 Multiple Inheritance
Now you might ask, why is there only one fall-through option?
It’s easy to generalize this to there being many, which leads naturally
to multiple inheritance. In effect, we have multiple objects to
which we can chain the lookup, which of course raises the question of
the order in which we should do so. It would be bad enough if the
ascendants were arranged in a tree, because even a tree does not have
a canonical order of traversal: take just breadth-first and
depth-first traversal, for instance (each of which has compelling
uses). Worse, suppose a blob A extends B and C;
but now suppose B and C each extend
D.This infamous situation is called
diamond inheritance. If you choose to include multiple
inheritance in your language you can lose yourself for days in design
decisions on this. Because it is highly unlikely you will find a
canonical answer, your pain will have only begun. Now we have to
confront this question: will there be one or two D objects in
the instance of A? Having only one saves space and might
interact better with our expectations, but then, will we visit this
object once or twice? Visiting it twice should not make any
difference, so it seems unnecessary. But visiting it once means the
behavior of one of B or C might change. And so on. As
a result, virtually every multiple-inheritance language is accompanied
by a subtle algorithm merely to define the lookup order—
Multiple inheritance is only attractive until you’ve thought it through.
24.4.4 Super-Duper!
Many languages have a notion of super-invocations, i.e., the ability to invoke a method or access a field higher up in the inheritance chain.Note that I say “the” and “chain”. When we switch to multiple inheritance, these concepts are replaced with something much more complex. This includes doing so at the point of object construction, where there is often a requirement that all constructors be invoked, to make sure the object is properly defined.
We have become so accustomed to thinking of these calls as going “up” the chain that we may have forgotten to ask whether this is the most natural direction. Keep in mind that constructors and methods are expected to enforce invariants. Whom should we trust more: the super-class or the sub-class? One argument would say that the sub-class is most refined, so it has the most global view of the object. Conversely, each super-class has a vested interest in protecting its invariants against violation by ignorant sub-classes.
These are two fundamentally opposed views of what inheritance means. Going up the chain means we view the extension as replacing the parent. Going down the chain means we view the extension as refining the parent. Because we normally associate sub-classing with refinement, why do our languages choose the “wrong” order of calling? Some languages have, therefore, explored invocation in the downward direction by default. gbeta is a modern programming language that supports inner, as well as many other interesting features. It is also interesting to consider combining both directions.
24.4.5 Mixins and Traits
Let’s return to our “blobs”.
When we write a class in Java, what are we really defining between the opening and closing braces? It is not the entire class: that depends on the parent that it extends, and so on recursively. Rather, what we define inside the braces is a class extension. It only becomes a full-blown class because we also identify the parent class in the same place.
class C extends B { ... }
classext E { ... }
class C = E(B);
class C1 = E(B1);
class C2 = E(B2);
Mixins make class definition more compositional. They provide many of the benefits of multiple-inheritance (reusing multiple fragments of functionality) but within the aegis of a single-inheritance language (i.e., no complicated rules about lookup order). Observe that when desugaring, it’s actually quite easy to add mixins to the language. A mixin is primarily a “function over classes”; because we have already determined how to desugar classes, and our target language for desugaring also has functions, and classes desugar to expressions that can be nested inside functions, it becomes almost trivial to implement a simple model of mixins.This is a case where the greater generality of the target language of desugaring can lead us to a better construct, if we reflect it back into the source language.
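As a minimal sketch of this idea, here is a “mixin” written as a function from one object-maker to another, reusing mk-object, mk-ext-object, and smsg from above; the names loud-mixin and quiet are invented for illustration:

fun loud-mixin(parent-maker):
  lam(name):
    parent-object = parent-maker(name)
    mk-ext-object(parent-object,
      [list: mtd("shout", lam(self, _): smsg(self, "speak", "dummy") + "!" end)])
  end
end

quiet = lam(name): mk-object([list: mtd("speak", lam(self, _): name end)]) end
loud = loud-mixin(quiet)

check:
  smsg(loud("hi"), "shout", "dummy") is "hi!"
end

Because loud-mixin is just a function, it can be applied to any object-maker that supplies a "speak" method, which is exactly the kind of compositional reuse described above.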
In a typed language, a good design for mixins can actually improve
object-oriented programming practice. Suppose we’re defining a
mixin-based version of Java. If a mixin is effectively a
class-to-class function, what is the “type” of this “function”?
Clearly, a mixin ought to use interfaces to describe what it
expects and provides. Java already enables (but does not require) the
latter, but it does not enable the former: a class (extension) extends
another class—
mixin M extends I { ... }
A good design for mixins can go even further. A class can only be used once in an inheritance chain, by definition (if a class eventually referred back to itself, there would be a cycle in the inheritance chain, causing potential infinite loops). In contrast, when we compose functions, we have no qualms about using the same function twice (e.g.: (map ... (filter ... (map ...)))). Is there value to using a mixin twice?There certainly is! See sections 3 and 4 of Classes and Mixins.
Mixins solve an important problem that arises in the design of libraries. Suppose we have a dozen features that can be combined in different ways. How many classes should we provide? It is obviously impractical to generate the entire combinatorial explosion of classes. It would be better if the developer could pick and choose the features they care about. This is precisely the problem that mixins solve: they provide class extensions that the developers can combine, in an interface-preserving way, to create just the classes they need. Mixins are used extensively in the Racket GUI library. For instance, color:text-mixin consumes basic text editor interfaces and implements the colored text editor interface. The latter is itself a basic text editor interface, so additional basic text mixins can be applied to the result.
How does your favorite object-oriented library solve this problem?
Mixins do have one limitation: they enforce a linearity of composition. This strictness is sometimes misplaced, because it puts a burden on programmers that may not be necessary. A generalization of mixins called traits says that instead of extending a single mixin, we can extend a set of them. Of course, the moment we extend more than one, we must again contend with potential name-clashes. Thus traits must be equipped with mechanisms for resolving name clashes, often in the form of some name-combination algebra. Traits thus offer a nice complement to mixins, enabling programmers to choose the mechanism that best fits their needs. A handful of languages, such as Racket, therefore provide both traits and mixins.
24.5 Object Classification and Object Equality
Previously [A Family of Equality Predicates], we have seen three different kinds of equality operations. For the purpose of this discussion, we will ignore the distinction between equal-now and equal-always, focusing on the fact that both are primarily structural (equal-now being purely so). Extended to objects, this would check each member recursively, perhaps ignoring methods in languages that cannot compare them for equality, or comparing them using reference equality.
This leaves us with the very fine-grained and unforgiving identical, and the very coarse-grained and perhaps overly forgiving equal-now. Why is structural equality overly forgiving? Because two completely unrelated objects that just happened to have the same member names and types could end up being regarded equal: as a famous example in the objects community has it, draw is a meaningful method of both user interfaces and cowhands.
Therefore, some systems provide an equality predicate “in the middle”: it is still fundamentally structural, but it discriminates between objects that were not “made the same way”. The typical notion of construction is associated with a class: all objects made from a certain class are considered to be candidates for (structural) equality, but objects made from different classes (for some notion of “different”) are immediately ruled unequal independent of their structure (which may in fact be identical).
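Pyret’s data definitions give a rough feel for this middle ground, if we treat each data definition as a stand-in for a class; the names Pt2D and Vec2D below are invented for illustration. Structural equality applies within a single definition, but values made from different definitions are unequal even when their structure coincides:

data Pt2D: | pt2d(x, y) end
data Vec2D: | vec2d(x, y) end

check:
  pt2d(1, 2) is pt2d(1, 2)        # made the same way, same structure
  pt2d(1, 2) is-not vec2d(1, 2)   # same structure, but made differently
end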
In the special case where classes are named, first-order entities,
this is called nominal equality: an equality based on
names. However, it does not have to depend on names, nor even on
first-order classes. Some languages have dynamic tag creators—
24.6 Types for Objects
Having studied various programming mechanisms, we now turn our focus to types for them. First (Subtyping) we will relax the notion of invariance for substitutability (The Principle of Substitutability). Then, we will discuss how new notions of equality (Object Classification and Object Equality) can impact subtyping to create a new class of types (Nominal Types).
24.6.1 Subtyping
type Add1Sub1 = { add1 :: (Number -> Number), sub1 :: (Number -> Number) }
type Arith = { add1 :: (Number -> Number), sub1 :: (Number -> Number), plus :: (Number, Number -> Number), mult :: (Number, Number -> Number) }
fun f(a :: Arith) -> Number: a.plus(2, 3) end
But how about in the other direction? This is entirely reasonable: the
context is expecting an Add1Sub1—
This is our first example of subtyping. We say that
Arith is a subtype of Add1Sub1 because we can supply an
Arith value in any context that expected an Add1Sub1
value. Specifically, because this involves dropping some
members, this particular form of subtyping is called width subtyping.
The essence of subtyping is a relation, conventionally written as <:, between pairs of types. We say S <: T if a value of type S can be given where a value of type T is expected, and call S the subtype and T the supertype. Therefore, in the above example, Arith <: Add1Sub1 and Arith is a subtype of Add1Sub1. Later [Nominal Types], we will talk about how subtypes correspond to subclasses. But for now observe that we’re talking only about objects, without any reference to the existence of classes. It is useful (and usually accurate) to take a subset interpretation: if the values of S are a subset of T, then an expression expecting T values will not be unpleasantly surprised to receive only S values.
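As a small sketch of width subtyping in use (assuming the structural checks accept the extra members), an Arith-shaped value can be supplied where only an Add1Sub1 is demanded; the names g and my-arith are invented for illustration:

fun g(a :: Add1Sub1) -> Number: a.add1(3) end

my-arith = {
  add1: lam(n): n + 1 end,
  sub1: lam(n): n - 1 end,
  plus: lam(m, n): m + n end,
  mult: lam(m, n): m * n end
}

check:
  g(my-arith) is 4   # the extra plus and mult members are simply ignored
end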
Why is subtyping a relation and not a function?
{ add1 : (Number -> Number),
  sub1 : (Number -> Number),
  plus : (Number, Number -> Number),
  mult : (Number, Number -> Number) }
        <:
{ add1 : (Number -> Number),
  sub1 : (Number -> Number) }
To understand why this is sound, it helps to develop the intuition that the “larger” the type, the fewer values it can have. Every object that has the four members on the left clearly also has the two members on the right. However, there are many objects that have the two members on the right but fail to have all four on the left. If we think of a type as a constraint on acceptable value shapes, the “bigger” type imposes more constraints and hence admits fewer values. Thus, though the types may appear to be of the wrong sizes, everything is well because the sets of values they describe are of the expected sizes.
As you might expect, there is another important form of subtyping, which is within a given member. This simply says that any particular member can be subsumed to a supertype in its corresponding position. For obvious reasons, this form is called depth subtyping.
Construct two examples of depth subtyping. In one, give the field itself an object type, and use width subtyping to subtype that field. In the other, give the field a function type.
The combination of width and depth subtyping covers the most interesting cases of object subtyping. A type system that implemented only these two would, however, needlessly annoy programmers. Other convenient rules include the ability to permute names, reflexivity (every type is a subtype of itself, which gives us invariance for free, and lets us interpret the subtype relationship as subset), and transitivity.
Subtyping has a pervasive effect on the type system. We have to
reexamine every kind of type and understand its interaction with
subtyping. For base types, this is usually quite obvious: disjoint
types like Number, String, etc., are all unrelated to
each other. (In languages where one base type is used to represent
another—
In fact, even our very diction about types has to change. Suppose we have an expression of type T. Normally, we would say that it produces values of type T. Now, we should be careful to say that it produces values of up to or at most T, because it may only produce values of a subtype of T. Thus every reference to a type should implicitly be cloaked in a reference to the potential for subtyping. To avoid pestering you I will refrain from doing this, but be wary that it is possible to make reasoning errors by not keeping this implicit interpretation in mind.
24.6.1.1 Subtyping Functions
Our examples above have been carefully chosen to mask an important detail: the subtyping of functions. To understand this, we will build up an example.
fun b2n(b :: Boolean01) -> Number:
  if b == 0:        # alias for false
    1
  else if b == 1:   # alias for true
    0
  else:
    raise('not valid number as Boolean01')
  end
end

fun n2b(n :: Number) -> Boolean01:
  if n == 0:
    false           # alias for 0
  else if n == 1:
    true            # alias for 1
  else:
    raise('no valid Boolean01 for number')
  end
end

fun n2n(n :: Number) -> Number: n + 1 end

fun b2b(b :: Boolean01) -> Boolean01:
  if b == 0:        # alias for false
    true            # alias for 1
  else if b == 1:   # alias for true
    false           # alias for 0
  else:
    raise('not valid number as Boolean01')
  end
end
type N2N = (Number -> Number)
type B2B = (Boolean01 -> Boolean01)
type N2B = (Number -> Boolean01)
type B2N = (Boolean01 -> Number)
We might expect a rule as follows. Because Boolean01 <: Number (in our imaginary system), a (Boolean01 -> Boolean01) function is a subtype of a (Number -> Number) function. This is a natural conclusion to arrive at...but wrong, as we will soon see.
fun p(op :: (A -> B)) -> B: op(a-value) end
Stop and try to fill out this table first.
     | N2N             | N2B                    | B2N                    | B2B
n2n  | yes (identical) | no (range)             | yes (domain)           | no (range)
n2b  | yes (range)     | yes (identical)        | yes (domain and range) | yes (domain)
b2n  | no (domain)     | no (domain and range)  | yes (identical)        | no (range)
b2b  | no (domain)     | no (domain)            | yes (range)            | yes (identical)
In each cell, “yes” means the function on the left can be passed in when the type at the top is expected, while “no” means it cannot. Parentheses give the reason: “identical” means they are the same type (so of course they can be passed in); in the “yes” case it says where subtyping needed to apply, while in the “no” case it says where the type error is.
Let us consider trying to pass n2n to an N2B annotation (for op). Because the return type of p is Boolean01, whatever uses p(n2n) assumes that it gets only Boolean01 values back. However, the function n2n is free to return any numeric value it wants: in particular, given 1 it returns 2, which does not correspond to either Boolean01 value. Therefore, allowing this parameter can result in an unsound program execution. To prevent that, we must flag this as a type error.
More generally, if the type of the formal parameter promises Boolean01, the actual function passed had better return only Boolean01; but if the type of the formal is Number, the actual can safely return Boolean01 without causing trouble. Thus, in general, for (A -> B) <: (C -> D), we must have that B <: D. In other words, the subtyping of the range parallels the subtyping of the function itself, so we say the range position is covariant (“co-” meaning “together”).
Now we get to the more interesting case: the domain. Consider why we can pass n2n where a B2N is expected. Inside the body of op, a-value can only be a Boolean01, because that is all the type permits. Because every Boolean01 is a Number, the function n2n has no trouble accepting it.
In contrast, consider passing b2n where an N2N is expected. Inside op, a-value can evaluate to any number, because op is expected (by the type annotation on p) to be able to accept it. However, b2n can accept only two numbers; everything else results in an error. Hence, if the type-checker were to allow this, we could get a run-time error even though the program passed the type-checker.
From this, the moral we derive is that for the domain position, the formal must be a subtype of the actual. The formal parameter bounds what values op can expect; so long as the actual can take a set of values at least as large, there will be no problem. Thus, for (A -> B) <: (C -> D), we must have that C <: A. The subtyping of the domain goes in the direction opposite to that of the subtyping of the function itself, so we say the domain position is contravariant (“contra-” meaning “opposite”).
Putting together these two rules, (A -> B) <: (C -> D) when C <: A and B <: D.
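For instance, two worked instances of this rule (using the types above and the assumption Boolean01 <: Number) agree with the table:

N2B <: B2N, i.e., (Number -> Boolean01) <: (Boolean01 -> Number):
  domain: Boolean01 <: Number (contravariant); range: Boolean01 <: Number (covariant)

B2N <: N2N, i.e., (Boolean01 -> Number) <: (Number -> Number)? No:
  the domain would require Number <: Boolean01, which does not hold.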
24.6.1.2 Subtyping and Information Hiding
o :: Add1Sub1 = ...
In a strictly dynamic interpretation—
crypto = {
  private-key: ...,
  public-key: ...,
  decrypt: fun(msg): ... end,
  encrypt: fun(plain-text): ... end
}
type PK = { public-key: Number, encrypt: (String -> String) }
for-dist :: PK = crypto
fun proxy-for-crypto(c):
  { public-key: c.public-key,
    encrypt: c.encrypt }
end

proxy-dist = proxy-for-crypto(for-dist)
24.6.1.3 Implementing Subtyping
Until now all of our type rules have been syntax-driven, which is what enabled us to write a recursive-descent type-checker. Now, however, we have a rule that applies to all expressions, so we can no longer be sure when to apply it.
There could be many levels of subtyping. As a result, it is no longer obvious when to “stop” subtyping. In particular, whereas before type-checking was able to calculate the type of an expression, now we have many possible types for each expression; if we return the “wrong” one, we might get a type error (due to that not being the type expected by the context) even though there exists some other type that was the one expected by the context.
24.6.2 Types for Self-Reference
Remember that one of the essential features of many object systems is having a reference, inside a method, to the object on which it was invoked: i.e., a self-reference [Objects with Self-Reference]. What is the type of this self identifier?
Consider the type Add1Sub1 we described earlier. To be entirely
honest, the implementation of add1 and sub1—
You see where this is going.
type Add1Sub1 = μ T . { add1 :: (T, Number -> Number), sub1 :: (T, Number -> Number) }
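Unfolding the recursive type once or twice shows what the μ abbreviates: each unfolding substitutes the entire type for T, so the process never bottoms out.

Add1Sub1
  = { add1 :: (Add1Sub1, Number -> Number), sub1 :: (Add1Sub1, Number -> Number) }
  = { add1 :: ({ add1 :: (Add1Sub1, Number -> Number), ... }, Number -> Number), ... }
  = ...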
Unfortunately, recursive types are not as simple as they look. Note that the above type does not have a “base case”; thus, it is a finite representation of an infinite type (which is exactly what we want, because we can write an infinite number of self applications). Therefore, when it comes to checking for the equality of two recursive types, we encounter complications, which are beyond the scope of this study.See Pierce’s Types and Programming Languages for details.
24.6.3 Nominal Types
Earlier [Object Classification and Object Equality] we read about nominal equality,
where classes are made to aid in equality comparisons. In some typed
languages—
The basic idea is that each class (or other nominal entity) defines an entirely new type, even if the type-structure of its members is exactly the same as that of some other type. Then, type equality mirrors nominal equality, but trivially: if two values have the same type they must have the same structure, and if they have different types then their structure doesn’t matter (even if it’s identical). Thus, type equality reduces to a constant-time check of whether the classes are the same.
class Add1Sub1 {
  public int add1(int n) { ... }
  public int sub1(int n) { ... }
}
It is worth noting that in Java, inheritance (unfortunately) corresponds to subtyping. As we go up the inheritance chain a class has fewer and fewer members (width subtyping), until we reach Object, the supertype of all classes, which has the fewest. Thus for all class types C in Java, C <: Object.Somewhat confusingly, the terms narrowing and widening are sometimes used, but with what some might consider the opposite meaning. To widen is to go from subtype to supertype, because it goes from a “narrower” (smaller) to a “wider” (bigger) set. These terms evolved independently, but unfortunately not consistently. The interpretation of subtyping as subsets holds: every object that has a type lower in an inheritance hierarchy also has a type higher in the hierarchy, but not vice versa. When it comes to depth subtyping, however, Java prefers types to be invariant down the object hierarchy because this is a safe option for conventional mutation.
25 Control Operations
The term control refers to any programming language instruction that causes evaluation to proceed, because it “controls” the program counter of the machine. In that sense, sequential execution of instructions is “control”, as is even an arithmetic expression (and in the presence of state, this control is laid bare through the order in which effects occur); other forms of control found in all ordinary programming languages include function calls and returns. However, in practice we use the term to refer primarily to those operations that cause non-local transfer of control beyond that of mere functions and procedures, usually starting with exceptions. We will study such operations in this chapter.
As we study the following control operators, it’s worth remembering that even without them, we still have languages that are Turing-complete, so these control operations provide no more “power”. Therefore, what control operators do is change and potentially improve the way we express our intent, and therefore enhance the structure of programs. Thus, it pays to begin our study by focusing on program structure.
25.1 Control on the Web
print(read-number("First number") + read-number("Second number"))
Now suppose we want to run this on a Web server. We immediately
encounter a difficulty: the structure of server-side Web programs is
such that they generate a single Web page—
Why do Web servers behave in such a strange way?
There are at least two reasons for this behavior: one perhaps historical, and the other technical. The historical reason is that Web servers were initially designed to serve pages, i.e., static content. Any program that ran had to generate its output to a file, from which a server could offer it. Naturally, developers wondered why that same program couldn’t run on demand. This made Web content dynamic. Terminating the program after generating a single piece of output was the simplest incremental step in transitioning the Web from “pages” to “programs”.
The more important reason—
Conceptually, therefore, the Web protocol was designed to be stateless: it would not store state on the server associated with intermediate computations. Instead, Web program developers would be forced to maintain all necessary state elsewhere, and each request would need to be able to resume the computation in full. In practice the Web has not proven to be stateless, but it still hews in this direction, and studying the structure of such programs is very instructive.
Now consider client-side Web programs: those that run inside the browser, written in or compiled to JavaScript. Suppose such a computation needs to communicate with a server. The primitive for this is called XMLHttpRequest. The user makes an instance of this primitive and invokes its send method to send a message to the server.
Communicating with a server is not, however, instantaneous: it takes some time; if the server faces a heavy load, it could take a long time; and indeed, it may never complete at all, depending on the state of the network and the server. (These are the same problems faced above by read-number, with a user taking the place of a server: the user may take a long time to enter a number, or may never do so at all.) If the send method suspended program execution, the entire (client-side) application would be blocked, indefinitely. You would not want to use such a program.
To keep the application responsive, the designers of XMLHttpRequest therefore had a choice. They could make JavaScript multi-threaded, but because the language also has state, programmers would have to confront all the problems of combining state with concurrency. In particular, beginners would have to wrestle with a combination of features that even experienced programmers do not use well, probably resulting in numerous deadlocked Web sites.
Instead, JavaScript is single-threaded: i.e., there is only one thread of execution at a time. Due to the structuring problems this causes, there are now various proposals to, in effect, add “safe” threads to JavaScript. The ideas described in this chapter can be viewed as an alternative that offers similar structuring benefits. When the send method is invoked, JavaScript instead suspends the current computation and returns control to an event loop, which can now invoke other suspended computations. Developers associate a callback with the send. When (and if) a response returns, this callback is added to the queue of suspended computations, thereby enabling it to resume.
This callback needs to embody the rest of the processing of
that request. Thus, for entirely different reasons—
25.1.1 Program Decomposition into Now and Later
read-number("First number")
print(<the result from the first interaction> + read-number("Second number"))
It needs to be a syntactically valid program.
It needs to stay suspended until the request comes in.
It needs a way—such as a parameter—to refer to the value from the first interaction.
fun(v1): print(v1 + read-number("Second number")) end
25.1.2 A Partial Solution
On the Web, there is an additional wrinkle: each Web page with input elements needs to refer to a program stored on the Web, which will receive the data from the form and process it. This program is named in the action field of a form. Thus, imagine that the server generates a fresh label, stores the above function in a table associated with that label, and refers to the label in the action field. When (and if) the client actually submits the form the server extracts the associated function, supplies it with the form’s values, and thus resumes execution.
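A minimal sketch of such a server-side table, in Pyret and with entirely hypothetical names (suspend, resume, suspended), might look like this; a real server would also have to place the generated label in the form’s action field and parse it back out of the next request:

import string-dict as SD

var suspended = [SD.string-dict: ]   # label -> stored callback
var label-counter = 0

fun suspend(callback):
  label-counter := label-counter + 1
  label = "k" + num-to-string(label-counter)
  suspended := suspended.set(label, callback)
  label   # this label names the stored function in the form's action field
end

fun resume(label, form-value):
  # look up the stored function and restart the computation with the form data
  suspended.get-value(label)(form-value)
end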
Is the solution above stateless?
read-number-suspend("First number",
  fun(v1):
    print(v1 + read-number("Second number"))
  end)
fun(v2): print(v1 + v2) end
read-number-suspend("First number",
  fun(v1):
    read-number-suspend("Second number",
      fun(v2):
        print(v1 + v2)
      end)
  end)
Ascribe types to the above computation. Also determine the type of the Web server and of the table holding these procedures.
25.1.3 Achieving Statelessness
We haven’t actually achieved statelessness yet, because we have this large table residing on the server, with no clear means to remove entries from it. It would be better if we could avoid the server state entirely. This means we have to move the relevant state to the client.
There are actually two ways in which the server holds state. One is
that we have reserved the right to create as many entries in the hash
table as we wish. This makes the server storage space proportional to
the number of interactions—
read-number-stateless("First number", prog-1)

fun prog-1(v1):
  read-number-stateless("Second number", prog-2)
end

fun prog-2(v2):
  print(v1 + v2)
end
The way to fix this problem is, instead of creating a closure after one step, to send v1 to the client to be stored there. Where do we store this? The browser offers two mechanisms for doing this: cookies and hidden fields. Which one do we use?
25.1.4 Interaction with State
var cookie = "dummy initial value"

read-number-suspend("First number",
  fun(v1):
    cookie := v1
    read-number-suspend("Second number",
      fun(v2):
        print(cookie + v2)
      end)
  end)
var cookie = "dummy initial value"

read-number-stateless("First number", prog-1)

fun prog-1(v1):
  cookie := v1
  read-number-stateless("Second number", prog-2)
end

fun prog-2(v2):
  print(cookie + v2)
end
Unfortunately, this means every intermediate computation will share the same cookie variable. If we open up two concurrent windows and try to add different first numbers, the latest first number will always reside in cookie, so the other window is going to see unpredictable results.
This, of course, is precisely what happens on the Web. These problems are not hypothetical. For instance, see Section 2 of Modeling Web Interactions and Errors. The browser’s cookies are merely a client-side implementation of the store. Thus, Web sites that store their information in cookies are susceptible to exactly this problem: two concurrent interactions with the site will end up interfering with one another. Therefore, the pervasive use of cookies on Web sites, induced by Web programming traditions, results in actively less usable sites.
In contrast, the Web offers another mechanism for storing information
on the client: the hidden field. Because they are local to each
page, and each page corresponds to a closure, they are precisely
analogous to a closure’s environment! Thus, instead of storing the
value of v1 in a single, global cookie, if we were to store it
in a hidden field in the response page, then two different response
pages would have different values in their hidden field, which would
be sent back to the server on the next request—
25.2 Conversion to Continuation-Passing Style
The style of functions we’ve been writing has a name. Though we’ve
presented ideas in terms of the Web, we’re relying on a much older
idea: the functions are called continuations, and this style of
programs is called continuation-passing style
(CPS).We will take the liberty of using CPS as both a
noun and verb: a particular structure of code and the process that
converts code into it. This is worth studying in its own right,
because it is the basis for studying a variety of other non-trivial
control operations—
Earlier, we converted programs so that no Web input operation was nested inside another. The motivation was simple: when the program terminates, all nested computations are lost. A similar argument applies, in a more local sense, in the case of XMLHttpRequest: any computation depending on the result of a response from a Web server needs to reside in the callback associated with the request to the server.
In fact, we don’t need to transform every expression. We only care about expressions that involve actual Web interaction. For example, if we computed a more complex mathematical expression than just addition, we wouldn’t need to transform it. If, however, we had a function call, we’d either have to be absolutely certain the function didn’t have any Web invocations either inside it, or in the functions it invokes, or the ones they invoke...or else, to be defensive, we should transform them all. Therefore, we have to transform every expression that we can’t be sure performs no Web interactions.
The heart of our transformation is therefore to turn every function, f, into one with an extra argument. This extra argument is the continuation, which represents the rest of the computation. Instead of returning a value, f passes the value it would have returned to its continuation. Thus, the continuation is itself a function of one argument; this argument represents the value that would have been returned by f. A function returns a value to “pass it to the rest of the computation”; CPS makes this explicit, because invoking a continuation (in place of returning a value) precisely passes it to the function representing the rest of the computation.
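As a tiny illustrative sketch (the names are invented here), consider a function converted by hand to this style: the CPS version takes the continuation k as an extra argument and hands it the result instead of returning it.

fun square(x): x * x end

fun square-cps(x, k): k(x * x) end

check:
  square(5) is 25
  # "the rest of the computation" adds 1 to whatever square produces
  square-cps(5, lam(v): v + 1 end) is 26
end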
CPS is a general transformation, which we can apply to any program. Because it’s a program transformation, we can think of it as a special kind of desugaring that transforms programs within the same language: from the full language to a more restricted version that obeys the pattern we’ve been discussing. As a result, we can reuse an evaluator for the full language to also evaluate programs in the CPS subset.
25.2.1 Implementation by Desugaring
Let us therefore implement CPS as a source-to-source transformation. Thought of as a function, it consumes and returns ExprC expressions, but the output expressions will have the peculiar structure we have seen above, and will therefore be a strict subset of all ExprC expressions.
Put differently, the comment about “strict subset” above means that certain ExprC expressions are not legal in the output that CPS generates. Provide examples.
fun cps(e :: ExprC) -> ExprC:
  cases (ExprC) e:
  end
end
Our representation in CPS will be to turn every expression
into a procedure of one argument, the continuation. The converted
expression will eventually either supply a value to the continuation
or will pass the continuation on to some other expression that
will—
| numC(_) => fdC("k", appC(idC("k"), e))
| idC(_) => fdC("k", appC(idC("k"), e))
Extend the language to handle conditionals.
Extend the language to support mutable state as well. Does this have any impact on the CPS process, i.e., does it change the pattern of conversion?
| plusC(l, r) =>
  fdC("k",
    appC(cps(l),
      fdC("l-v",
        appC(cps(r),
          fdC("r-v",
            appC(idC("k"), plusC(idC("l-v"), idC("r-v"))))))))
Finally, we have function definition and application.
It’s tempting to think that, because functions are just values, they too can be passed unchanged to the continuation. Why is this not true?
Before proceeding, alter the underlying language to also permit two-argument function definitions and, correspondingly, applications. Name the definitions fd2C and the applications app2C.
| appC(f, a) =>
  fdC("k",
    appC(cps(f),
      fdC("f-v",
        appC(cps(a),
          fdC("a-v",
            appC(idC("k"), appC(idC("f-v"), idC("a-v"))))))))
Do you see why this is wrong?
| appC(f, a) =>
  fdC("k",
    appC(cps(f),
      fdC("f-v",
        appC(cps(a),
          fdC("a-v",
            app2C(idC("f-v"), idC("a-v"), idC("k")))))))
A function is itself a value, so it should be returned to the pending computation. The application case above, however, shows that we have to transform functions to take an extra argument, namely the continuation at the point of invocation. This leaves us with a quandary: which continuation do we supply to the body?
| fdC(v, b) =>
  fdC("k",
    appC(idC("k"),
      fd2C(v, "dyn-k",
        appC(cps(b), ???))))
That is, in place of ???, which continuation do we supply: k or dyn-k?
Which continuation should we supply?
The former is the continuation at the point of closure creation. The latter is the continuation at the point of closure invocation. In other words, the former is “static” and the latter is “dynamic”. In this case, we need to use the dynamic continuation, otherwise something very strange would happen: the program would return to the point where the closure was created, rather than where it is being used! This would result in seemingly very strange program behavior, so we wish to avoid it. Observe that we are consciously choosing the dynamic continuation just as, where scope was concerned we chose the static environment but where state was concerned we chose the “dynamic” (namely, most recent) store. Thus continuations are more like state than they are like lexical binding, a similarity we will return to later [REF].
| fdC(v, b) =>
  fdC("k",
    appC(idC("k"),
      fd2C(v, "dyn-k",
        appC(cps(b), idC("dyn-k")))))
After you have understood this material, replace "dyn-k" with "k", predict what should change, and check that it does.
fun icps(e):
  id-cps = fdC("v", idC("v"))
  interp(appC(cps(e), id-cps), mt-env)
end
icps(plusC(numC(5), appC(quad, numC(3)))) is numV(17)
icps(multC(appC(c5, numC(3)), numC(4))) is numV(20)
icps(plusC(numC(10), appC(c5, numC(10)))) is numV(15)
25.2.2 Understanding the Output
cps(plusC(numC(1), numC(2)))
fdC("k", appC(fdC("k", appC(idC("k"), numC(1))), fdC("l-v", appC(fdC("k", appC(idC("k"), numC(2))), fdC("r-v", appC(idC("k"), plusC(idC("l-v"), idC("r-v"))))))))
f1 =
  lam(k):
    (lam(shadow k): k(1) end)(
      lam(l-v):
        (lam(shadow k): k(2) end)(
          lam(r-v):
            k(l-v + r-v)
          end)
      end)
  end
check: f1(lam(x): x end) is 3 end
f2 =
  lam(k):
    (lam(k1): k1(1) end)(
      lam(l-v):
        (lam(k2): k2(2) end)(
          lam(r-v):
            k(l-v + r-v)
          end)
      end)
  end

check: f2(lam(x): x end) is 3 end
(lam(k1): k1(1) end)(...)
There is an active line of research in creating better CPS transformations that produce fewer intermediate function terms; we’ve actually used one of the very oldest and least sophisticated. The trade-off is in simplicity of desugaring versus simplicity of output, with the two roughly inversely correlated.
25.2.3 An Interaction Primitive by Transformation
At this point we have identified a problem in program structure; we hypothesized a better API for it; we transformed an example to use such an API; and then we generalized that transformation. But now we have a program structure so complex that it is unclear what use it could possibly be. The point of this transformation was so that every sub-expression would have an associated continuation, which an interaction-friendly primitive can use. Let’s see how to do that.
| read-numC(p :: ExprC)
| read-num-webC(p :: ExprC, k :: ExprC)
We will assume that cps does not need to handle read-num-webC (because the end-user is not expected to write this directly), while interp does not need to handle read-numC (because we want this interpreter to function even in a setting that periodically terminates input, so it cannot block waiting for a response).
| read-numC(p) =>
  fdC("k",
    appC(cps(p),
      fdC("p-v",
        read-num-webC(idC("p-v"), idC("k")))))
Now let us build an implementation of read-num-webC in the interpreter that properly simulates a program that halts.
var web-continuation = "nothing here yet"
| read-num-webC(p, k) =>
  prompt = num-to-string(interp(p, nv).n)
  cont = interp(k, nv)
  print('Web interaction: ' + prompt)
  web-continuation := cont
  raise('Program halted waiting for user input')
Introduce an error in cps and show how halting the program highlights it, while not doing so silently masks it.
icps(plusC(read-numC(numC(1)), read-numC(numC(2))))
Web interaction: 1
Error:

"Program halted waiting for user input"
At this point, web-continuation contains a genuine, run-time closure (a closV value). This represents a continuation: a program value representing the rest of the computation.Due to a bug in the current implementation, you can’t inspect the value of web-continuation directly; but you can access it from a function that closes over it. The user now supplies an input in the imagined Web form; this is provided as the actual argument to the continuation.
fun run-wc(n):
  wc = web-continuation
  interp(appC(wc.f, numC(n)), wc.e)
end
> run-wc(3)
Web interaction: 2
Error:

"Program halted waiting for user input"
> run-wc(4)
numV(7)
Here, then, is the key lesson. By transforming the program into CPS
we were able to write a normal-looking
program—
Modify the program to store each previous continuation with some kind of unique tag. Now that you have access to multiple continuations, simulate the effect of different browser actions such as reloading the page (re-invoking a continuation), going back (using a prior continuation), cloning a page (re-using a continuation), etc. Does your implementation still work?
25.3 Implementation in the Core
Now that we’ve seen how CPS can be implemented through desugaring, we should ask whether it can be put in the core instead.
Recall that we’ve said that CPS applies to all programs. We have one program we are especially interested in: the interpreter. Sure enough, we can apply the CPS transformation to it, making available what are effectively the same continuations.
25.3.1 Converting the Interpreter
fun interp(e :: ExprC, nv :: List<Binding>, k):
  cases (ExprC) e:
  end
end
Note that we have not annotated k, and we’ve dropped the return annotation on interp. Fill them in.
| numC(n) =>
  k(numV(n))
| idC(s) =>
  k(lookup(s, nv))
| plusC(l, r) =>
  interp(l, nv,
    lam(l-v):
      interp(r, nv,
        lam(r-v):
          k(plus-v(l-v, r-v))
        end)
    end)
| fdC(_, _) =>
  k(closV(e, nv))
| fd2C(_, _, _) =>
  k(closV(e, nv))
| appC(f, a) =>
  interp(f, nv,
    lam(clos-v):
      interp(a, nv,
        lam(arg-v):
          interp(clos-v.f.body,
            xtnd-env(bind(clos-v.f.arg, arg-v), clos-v.e),
            k)
        end)
    end)
| app2C(f, a1, a2) =>
  interp(f, nv,
    lam(clos-v):
      interp(a1, nv,
        lam(arg1-v):
          interp(a2, nv,
            lam(arg2-v):
              interp(clos-v.f.body,
                xtnd-env(bind(clos-v.f.arg1, arg1-v),
                  xtnd-env(bind(clos-v.f.arg2, arg2-v),
                    clos-v.e)),
                k)
            end)
        end)
    end)
By converting the interpreter to CPS we have given it access to an extra parameter: k, the continuation of the interpreter. Because the interpreter’s execution mimics the intended behavior of the interpreted program, the continuation of the interpreter reflects the rest of the behavior of the interpreted program: i.e., applying interp to an expression e with continuation k will result in k being given the value of e. We can therefore put k to work by exposing it to programs being interpreted.
25.3.2 An Interaction Primitive in the Core
| read-numC(p) =>
  interp(p, nv,
    lam(p-v):
      prompt = num-to-string(p-v.n)
      print('Web interaction: ' + prompt)
      web-continuation := k
      raise('Program halted waiting for user input')
    end)
fun run-wc(n): web-continuation(numV(n)) end
plusC(read-numC(numC(1)), read-numC(numC(2)))
Web interaction: 1
Error:

"Program halted waiting for user input"
> run-wc(3)
Web interaction: 2
Error:

"Program halted waiting for user input"
> run-wc(4)
numV(7)
When using CPS, the hard work was actually done in the program transformation. The interpreter as a whole was essentially unchanged from before; indeed, the main addition to the interpreter was effectively debugging support in the form of halting its execution, so we could make sure the continuation strategy was correct. Here, the transformation is of the interpreter itself, done one time, and the interpreter works to generate the continuations.
In particular, the continuation now closes over the rest of the behavior, not of the interpreted program but the interpreting one. Because the latter’s job, however, is to precisely mimic that of the former, we cannot observe this difference.
25.4 Generators
Many programming languages now have a notion of generators. A generator is like a procedure, in that one can invoke it in an application. Whereas a regular procedure always begins execution at the beginning, a generator resumes from where it last left off. Of course, that means a generator needs a notion of “exiting before it’s done”. This is known as yielding, namely returning control to whatever called it.
In some languages a generator is an object that is instantiated like any other object, and its execution is resumed by invoking a method (such as next in Python). In others it is just like a procedure, and indeed it is re-entered by applying it like a function.In languages where values in addition to regular procedures can be used in an application, all such values are collectively called applicables.
In some languages the yielding operation—such as Python’s yield—is available only inside the syntactic body of the generator. In others, such as Racket, yield is an applicable value bound in the body, but by virtue of being a value, it can be passed to abstractions, stored in data structures, and so on.
(generator (yield) (from)
  (rec (f (lambda (n)
            (begin
              (yield n)
              (f (+ n 1)))))
    (f from)))
(generator (y) (from)
  (rec (f (lambda (n)
            (begin
              (y n)
              (f (+ n 1)))))
    (f from)))
(generator (y) (from)
  (rec (f (lam (n)
            (seq
              ((yield-helper y) n)
              (f (+ n 1)))))
    (f from)))
Is yield a statement or expression? In many languages it is actually an expression, meaning it has a value: the one supplied when resuming the generator. This makes the generator more flexible because the user of a generator can use the parameter(s) to alter the generator’s behavior, rather than being forced to use state to communicate desired changes.
What happens at the end of the generator’s execution? In many languages, a generator raises an exception to signal its completion.
remember where in its execution it currently is, and
know where in its caller it should return to.
remember where in its execution its caller currently is, and
know where in its body it should return to.
As you might guess, these “where”s correspond to continuations.
Add generators to a CPS interpreter.
How do generators differ from coroutines and threads? Implement coroutines and threads using a similar strategy.
We have seen that Python’s generators do not permit any abstraction over yielding, whereas Racket’s do. Assuming this was intentional, why might Python have made such a design decision?
25.5 Continuations and Stacks
It’s a record of what remains to be done in the computation. So is the continuation.
It’s traditionally thought of as a list of stack frames. That is, each frame has a reference to the frames remaining after it finishes. Similarly, each continuation is a small procedure that refers to—and hence closes over—its own continuation. If we had chosen a different representation for program instructions, combining this with the data structure representation of closures, we would obtain a continuation representation that is essentially the same as the machine stack. Each stack frame also stores procedure parameters. This is implicitly managed by the procedural representation of continuations, whereas this was done explicitly in the data structure representation (using bind).
Each frame also has space for “local variables”. In principle so does the continuation, though by desugaring local binding, we’ve effectively reduced everything to procedure parameters. Conceptually, however, some of these are “true” procedure parameters while others are local bindings turned into procedure parameters by desugaring.
The stack has references to, but does not close over, the heap. Thus changes to the heap are visible across stack frames. In precisely the same way, closures refer to, but do not close over, the store, so changes to the store are visible across closures.
Let’s use k to refer to the stack present before the function application begins to evaluate.
When we begin to evaluate the function position (f), we create a new stack frame, fdC("f-v", ...). This frame has one free identifier: k. Thus its closure needs to record one element of the environment, namely the rest of the stack.
The code portion of the stack frame represents what is left to be done once we obtain a value for the function: evaluate the argument, and perform the application, and return the result to the stack expecting the result of the application: k.
When evaluation of f completes, we begin to evaluate a, which also creates a stack frame, fdC("a-v", ...). This frame has two free identifiers: k and f-v. This tells us:
We no longer need the stack frame for evaluating the function position, but
we now need a temporary that records the value—hopefully a function value—of evaluating the function position.
The code portion of this second frame also represents what is left to be done: invoke the function value with the argument, in the stack expecting the value of the application.
Similarly, examining the CPS conversion of conditionals tells us that we have to create a new stack frame to evaluate the conditional expression. This frame closes over the stack expecting the value of the entire conditional. The frame makes a decision based on the value of the conditional expression, and invokes one of the other expressions. Once we have examined this value, the frame created to evaluate the conditional expression is no longer necessary, so evaluation can proceed in the original continuation.
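To make this walkthrough concrete, here is a minimal CPS-style interpreter sketch in Python (a hypothetical toy, not the book’s interpreter). The two nested closures created for an application play the roles of the two stack frames described above, and the single closure created for a conditional plays the role of its frame.

def interp(expr, env, k):
    tag = expr[0]
    if tag == "num":
        return k(expr[1])
    elif tag == "id":
        return k(env[expr[1]])
    elif tag == "lam":                       # ("lam", param, body)
        return k(("closure", expr[1], expr[2], env))
    elif tag == "app":                       # ("app", fun_expr, arg_expr)
        # First frame: closes over k; waits for the function value f_v.
        def after_fun(f_v):
            # Second frame: closes over k and f_v; the first frame is no longer needed.
            def after_arg(a_v):
                _, param, body, clo_env = f_v
                new_env = dict(clo_env)
                new_env[param] = a_v
                return interp(body, new_env, k)   # resume in the original stack k
            return interp(expr[2], env, after_arg)
        return interp(expr[1], env, after_fun)
    elif tag == "if":                        # ("if", cond, then, else)
        # One frame to evaluate the condition; it closes over k and both branches.
        def after_cond(c_v):
            branch = expr[2] if c_v != 0 else expr[3]
            return interp(branch, env, k)         # the frame is discarded; k is reused
        return interp(expr[1], env, after_cond)
    else:
        raise ValueError("unknown expression: " + tag)

# ((lam x -> if x then x else 99) applied to 0) evaluates to 99
prog = ("app", ("lam", "x", ("if", ("id", "x"), ("id", "x"), ("num", 99))), ("num", 0))
print(interp(prog, {}, lambda v: v))   # 99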
Viewed through this lens, we can more easily provide an operational explanation for generators. Each generator has its own private stack, and when execution attempts to return past its end, our implementation raises an error. On invocation, a generator stores a reference to the stack of the “rest of the program”, and resumes its own stack. On yielding, the system swaps references to stacks. Coroutines, threads, and generators are all conceptually similar: they are all mechanisms to create “many little stacks” instead of having a single, global stack.
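As a small illustration of the “many little stacks” view, here is a Python sketch (the scheduler and task names are invented) in which a round-robin driver swaps between two generators, each suspended on its own private stack.

def worker(name, steps):
    for i in range(steps):
        print(name, "step", i)
        yield                      # suspend this stack; control returns to the driver

def round_robin(tasks):
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)             # resume the task's stack
            tasks.append(task)     # not finished: schedule it again
        except StopIteration:
            pass                   # this task's stack has returned past its end

round_robin([worker("A", 2), worker("B", 3)])
# Prints, alternating: A step 0, B step 0, A step 1, B step 1, B step 2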
25.6 Tail Calls
Observe that the stack patterns above add a frame to the current stack, perform some evaluation, and eventually always return to the current stack. In particular, observe that in an application, we need stack space to evaluate the function position and then the arguments, but once all these are evaluated, we resume computation using the stack we started out with before the application. In other words, function calls do not themselves need to consume stack space: we only need space to compute the arguments.
However, not all languages observe or respect this property. In languages that do, programmers can use recursion to obtain iterative behavior: i.e., a sequence of function calls can consume no more stack space than no function calls at all. This removes the need to create special looping constructs; indeed, loops can simply be expressed as syntactic sugar.
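Here is a small Python sketch of iteration expressed as tail recursion. Note the hedge: Python itself does not promise to eliminate tail calls, so only in a language that does (such as Racket) is the recursive version guaranteed to run in constant stack space; the while-loop version shows the behavior such a language provides automatically.

def sum_to_rec(n, acc=0):
    if n == 0:
        return acc
    return sum_to_rec(n - 1, acc + n)   # the recursive call is in tail position

def sum_to_loop(n):
    acc = 0
    while n > 0:                        # the "loop" the tail call expresses
        acc, n = acc + n, n - 1
    return acc

print(sum_to_rec(500), sum_to_loop(500))   # 125250 125250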
Of course, this property does not apply in general. If a call to f is performed to compute an argument to a call to g, the call to f is still consuming space relative to the context surrounding g. Thus, we should really speak of a relationship between expressions: one expression is in tail position relative to another if its evaluation requires no additional stack space beyond the other. In our CPS desugaring, every expression that uses k as its continuation— such as the application itself once the function and argument positions have been evaluated— is in tail position, while every expression that must create a new stack frame is not.
Some languages have special support for tail recursion: when a procedure calls itself in tail position relative to its body. This is obviously useful, because it enables recursion to efficiently implement loops. However, it hurts “loops” that cannot be squeezed into a single recursive function. For instance, when implementing a scanner or other state machine, it is most convenient to have a set of functions each representing one state, and transitioning to other states by making (tail) function calls. It is onerous (and misses the point) to turn these into a single recursive function. If, however, a language recognizes tail calls as such, it can optimize these cross-function calls just as much as it does intra-function ones.
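Here is a hypothetical scanner-style state machine in Python: one function per state, transitioning via tail calls. Because Python does not implement tail calls properly, the sketch uses a trampoline (each “call” returns a thunk that the driver invokes) to recover the constant-space behavior that a language with proper tail calls would give these cross-function calls for free.

def even_zeros(s, i=0):
    # State: we have seen an even number of zeros so far.
    if i == len(s):
        return True
    if s[i] == "0":
        return lambda: odd_zeros(s, i + 1)    # tail "call" to the other state
    return lambda: even_zeros(s, i + 1)

def odd_zeros(s, i=0):
    # State: we have seen an odd number of zeros so far.
    if i == len(s):
        return False
    if s[i] == "0":
        return lambda: even_zeros(s, i + 1)
    return lambda: odd_zeros(s, i + 1)

def trampoline(result):
    while callable(result):    # keep running thunks until we get a real answer
        result = result()
    return result

print(trampoline(even_zeros("1001")))        # True: two zeros
print(trampoline(even_zeros("0" * 100000)))  # True, in constant Python stack space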
Scheme and Racket, in particular, promise to implement tail calls without allocating additional stack space. Though some people refer to this as “tail call optimization”, this term is misleading: an optimization is optional, whereas whether or not a language promises to properly implement tail calls is a semantic feature. Developers need to know how the language will behave because it affects how they program: they need to know how to structure their loops!
Because of this feature, observe something interesting about the program after CPS transformation: all of its function applications are themselves tail calls! Assuming the program might terminate at any call is tantamount to not using any stack space at all (because the stack would get wiped out).
Any program that consumes some amount of stack, when converted to CPS and run, suddenly consumes no stack space at all. Why?
As a corollary, does conversion to CPS reduce the overall memory footprint of the program?
Java’s native security model employs a mechanism called stack inspection (look it up if you aren’t familiar with it). What is the interaction between CPS and stack inspection? That is, if we were to CPS a program, would this affect its security behavior?
If not, why not?
If so, how, and what would you suggest doing to recover security assuming the CPS conversion was necessary?
26 Glossary
The bandwidth between two network nodes is the quantity of data that can be transferred in a unit of time between the nodes.
A cache is an instance of a ☛ space-time tradeoff: it trades space for time by using the space to avoid recomputing an answer. The act of using a cache is called caching. The word “cache” is often used loosely; I use it only for information that can be perfectly reconstructed even if it were lost: this enables a program that needs to reverse the trade—
i.e., use less space in return for more time— to do so safely, knowing it will lose no information and thus not sacrifice correctness.
Coinduction is a proof principle for mathematical structures that are equipped with methods of observation rather than of construction. Conversely, functions over inductive data take them apart; functions over coinductive data construct them. The classic tutorial on the topic will be useful to mathematically sophisticated readers.
An idempotent operator is one whose repeated application to any value in its domain yields the same result as a single application (note that this implies the range is a subset of the domain). Thus, a function \(f\) is idempotent if, for all \(x\) in its domain, \(f(f(x)) = f(x)\) (and by induction this holds for additional applications of \(f\)).
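A couple of quick checks in Python (the specific functions are just convenient illustrations):

xs = [3, -1, 2]
print(abs(abs(-5)) == abs(-5))            # True: abs is idempotent
print(sorted(sorted(xs)) == sorted(xs))   # True: sorting is idempotent
print((-5 + 1 + 1) == (-5 + 1))           # False: adding 1 is not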
Invariants are assertions about programs that are intended to always be true (“in-vary-ant”—
never varying). For instance, a sorting routine may have as an invariant that the list it returns is sorted.
The latency between two network nodes is the time it takes for packets to get between the nodes.
A metasyntactic variable is one that lives outside the language, and ranges over a fragment of syntax. For instance, if I write “for expressions e1 and e2, the sum e1 + e2”, I do not mean the programmer literally wrote “e1” in the program; rather I am using e1 to refer to whatever the programmer might write on the left of the addition sign. Therefore, e1 is metasyntax.
At the machine level, a packed representation is one that ignores traditional alignment boundaries (in older or smaller machines, bytes; on most contemporary machines, words) to let multiple values fit inside or even spill over the boundary.
For instance, say we wish to store a vector of four values, each of which represents one of four options. A traditional representation would store one value per alignment boundary, thereby consuming four units of memory. A packed representation would recognize that each value requires two bits, and four of them can fit into eight bits, so a single byte can hold all four values. Suppose instead we wished to store four values representing five options each, therefore requiring three bits for each value. A byte- or word-aligned representation would not fundamentally change, but the packed representation would use two bytes to store the twelve bits, even permitting the third value’s three bits to be split across a byte boundary.
Of course, packed representations have a cost. Extracting the values requires more careful and complex operations. Thus, they represent a classic ☛ space-time tradeoff: using more time to shrink space consumption. More subtly, packed representations can confound certain run-time systems that may have expected data to be aligned.
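Here is a small Python sketch of the first example above (the function names are invented): four values, each one of four options and hence two bits wide, packed into a single byte, along with the more careful extraction operations the entry mentions.

def pack4(values):
    # values: four integers in the range 0..3
    packed = 0
    for i, v in enumerate(values):
        packed |= (v & 0b11) << (2 * i)   # each value occupies its own 2-bit slot
    return packed                          # fits in one byte (0..255)

def unpack4(packed):
    return [(packed >> (2 * i)) & 0b11 for i in range(4)]

b = pack4([3, 0, 2, 1])
print(b)            # 99: the four values share one byte
print(unpack4(b))   # [3, 0, 2, 1]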
Parsing is, very broadly speaking, the act of converting content in one kind of structured input into content in another. The structures could be very similar, but usually they are quite different. Often, the input format is simple while the output format is expected to capture rich information about the content of the input. For instance, the input might be a linear sequence of characters on an input stream, and the output might be expected to be a rich, tree-structured value according to some datatype: most program and natural-language parsers are faced with this task.
Reduction is a relationship between a pair of situations—
problems, functions, data structures, etc.— where one is defined in terms of the other. A reduction R is a function from situations of the form P to ones of the form Q if, for every instance of P, R can construct an instance of Q such that it preserves the meaning of P. Note that, strictly speaking, the converse does not need to hold.
Suppose you have an expensive computation that always produces the same answer for a given set of inputs. Once you have computed the answer once, you now have a choice: store the answer so that you can simply look it up when you need it again, or throw it away and re-compute it the next time. The former uses more space, but saves time; the latter uses less space, but consumes more time. This, at its heart, is the space-time tradeoff. Memoization [REF], using a ☛ cache, environments (From Substitution to Environments), etc. are all instances of it.
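A small Python sketch of the tradeoff, using a memoization cache (the names memoize and fib are illustrative): the cache spends space to save time, and because every entry can be recomputed, it can be thrown away at any point without affecting correctness.

def memoize(f):
    cache = {}
    def wrapped(*args):
        if args not in cache:
            cache[args] = f(*args)   # spend space now...
        return cache[args]           # ...to save time on later calls
    wrapped.cache = cache
    return wrapped

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))        # fast: a linear number of calls, thanks to the cache
fib.cache.clear()     # reversing the trade: give back the space, pay time again
print(fib(80))        # still correct, just recomputed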
Type variables are identifiers in the type language that (usually) range over actual types.
A wire format is a notation used to transmit data across, as opposed to within, a closed platform (such as a virtual machine). These are usually expected to be relatively simple because they must be implemented in many languages and on weak processors. They are also expected to be unambiguous to aid simple, fast, and correct parsing. Popular examples include XML [REF], JSON [REF], and s-expressions [REF].