9 (Early) Testing for Programming

    9.1 The Problem

    9.2 The Examplar System

    9.3 Wheats and Chaffs

    9.4 Testing Goals

9.1 The Problem

Research shows that when students begin to work on a problem, they often misunderstand it. In fact, they often (unconsciously) switch to solving a problem they have solved before, especially if it sounds similar, rather than the one they are given. This not only causes frustration and wastes time, it reduces learning and can also hurt grades.

The How to Design Programs design recipe offers a remedy: write examples before starting to write code. While in principle you can also have course staff look over your examples, this can be burdensome in practice. It’d be nice if you could have a TA available for instant consultation at any time, right where you are.

9.2 The Examplar System

We have built an experimental system for Pyret called Examplar [sic] that is a step in this direction. Examplar offers an interface that looks similar to that of code.pyret.org, with a Pyret editor on the left. This is intended to be a place for you to write your examples (using the syntax of test cases). These examples are then run against a suite of programs (see Wheats and Chaffs), and Examplar reports on how they did.

For every assignment for which we have Examplar support, the assignment will contain an Examplar link.

9.3 Wheats and Chaffs

What does it mean to be a “good” set of tests? It seems tempting to say it must be “correct”, but that’s actually easy to achieve: the empty test suite is most certainly correct! But it isn’t of any use.

There’s a crisp way we can define a useful test suite. It stems from thinking of a test suite as a classifier (a term widely used in machine learning): given a program, it tells us whether the program is correct or faulty. A good classifier is one that correctly identifies correct programs as correct and faulty programs as faulty. (These can be quantified in terms of precision and recall.)
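
To make the precision and recall remark concrete, here is a minimal sketch in Python (used here only for illustration; it is not Pyret and not part of Examplar). It assumes that "faulty" is treated as the positive class, and the verdict data is invented for the example.

```python
# Each pair is (program is actually faulty, test suite flags it as faulty).
# The data below is made up purely for illustration.
verdicts = [
    (False, False),  # correct program, accepted by the suite
    (True, True),    # faulty program, flagged by the suite
    (True, True),    # faulty program, flagged by the suite
    (True, False),   # faulty program, missed by the suite
]


def precision_recall(verdicts):
    """Return (precision, recall), treating 'faulty' as the positive class."""
    true_pos = sum(1 for faulty, flagged in verdicts if faulty and flagged)
    false_pos = sum(1 for faulty, flagged in verdicts if not faulty and flagged)
    false_neg = sum(1 for faulty, flagged in verdicts if faulty and not flagged)
    # Guard against division by zero when nothing is flagged or nothing is faulty.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else None
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else None
    return precision, recall


precision, recall = precision_recall(verdicts)
```

With this data the suite never rejects a correct program (precision is 1.0) but misses one of the three faulty programs (recall is 2/3).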

Therefore, your tests will be run against both correct and known incorrect implementations, which we call wheats and chaffs, respectively. (The names are inspired by the English phrase “separating the wheat from the chaff”, and are thanks to John “Spike” Hughes.) An ideal test suite will identify all the wheats as correct and label all the chaffs as incorrect. In practice, you will likely think through the problem incrementally, so you won’t catch all the chaffs up front: it will take a few tries to come up with enough tests to catch them all. However, any test that fails to pass a wheat is simply wrong and hence its chaff-detecting ability is irrelevant.
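
The idea of running examples against wheats and chaffs can be sketched as follows. This is a Python illustration under stated assumptions, not how Examplar is implemented: the median problem, the function names, and the `report` helper are all hypothetical.

```python
def wheat_median(nums):
    """Correct: the middle of the sorted list (mean of the two middles if even)."""
    ordered = sorted(nums)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def chaff_no_sort(nums):
    """Faulty: forgets to sort before picking the middle."""
    mid = len(nums) // 2
    if len(nums) % 2 == 1:
        return nums[mid]
    return (nums[mid - 1] + nums[mid]) / 2


def chaff_mean(nums):
    """Faulty: computes the mean instead of the median."""
    return sum(nums) / len(nums)


WHEATS = [wheat_median]
CHAFFS = [chaff_no_sort, chaff_mean]


def passes(impl, examples):
    """True if the implementation agrees with every (input, expected) example."""
    return all(impl(list(args)) == expected for args, expected in examples)


def report(examples, wheats=WHEATS, chaffs=CHAFFS):
    """Count wheats passed and chaffs caught; a suite is valid only if all wheats pass."""
    wheats_passed = sum(1 for w in wheats if passes(w, examples))
    chaffs_caught = sum(1 for c in chaffs if not passes(c, examples))
    return {
        "valid": wheats_passed == len(wheats),
        "wheats_passed": wheats_passed,
        "chaffs_caught": chaffs_caught,
    }


first_try = [([1, 2, 3], 2)]                               # correct but too weak
second_try = first_try + [([3, 1, 2], 2), ([1, 2, 9], 2)]  # catches both chaffs
wrong_try = [([1, 2, 3, 4], 3)]                            # misreads the problem
```

Here `report(first_try)` is valid but catches no chaffs, `report(second_try)` is valid and catches both, and `report(wrong_try)` is invalid because the wheat itself fails it. The fact that the chaffs also fail `wrong_try` tells us nothing.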

The output from Examplar will provide you information in terms of wheat and chaff counts.

9.4 Testing Goals

Your goal should be to make sure you pass the wheats and catch as many of the chaffs as possible. If you can’t catch every chaff, don’t fret too much. It may represent a behavior that you just haven’t thought of yet. When you feel you’ve learned about as much as you can from Examplar, move on to writing your solution code.

Keep in mind that Examplar is only meant to help with your initial examples (which help ensure you understand the problem), as opposed to your final tests, which try to find errors in your program. At the problem level we ignore implementation details, but these can trip up your implementation. Therefore, your final test suite has to be more extensive than your initial examples, and we will run your program against several more chaffs than were provided in Examplar.
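
As a rough illustration of the difference between initial examples and final tests, consider the following Python sketch. The median problem and the `insert_sorted` helper are hypothetical, chosen only to show how an implementation introduces details that the problem statement never mentions.

```python
def insert_sorted(value, ordered):
    """Helper (an implementation detail): insert value into an already sorted list."""
    for i, current in enumerate(ordered):
        if value <= current:
            return ordered[:i] + [value] + ordered[i:]
    return ordered + [value]


def median(nums):
    """Median computed by building a sorted list one element at a time."""
    ordered = []
    for n in nums:
        ordered = insert_sorted(n, ordered)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Initial examples: these speak only about the problem statement.
assert median([1, 2, 3]) == 2
assert median([3, 1, 2]) == 2
assert median([1, 2, 3, 4]) == 2.5

# Final tests: these also probe details specific to this implementation.
assert insert_sorted(5, []) == [5]                   # inserting into an empty list
assert insert_sorted(0, [1, 2]) == [0, 1, 2]         # inserting at the front
assert insert_sorted(9, [1, 2]) == [1, 2, 9]         # inserting at the end
assert insert_sorted(2, [1, 2, 3]) == [1, 2, 2, 3]   # inserting a duplicate
assert median([7]) == 7                              # single-element boundary
assert median([2, 2, 2, 2]) == 2                     # all values equal
```

The initial examples would make sense for any solution to the problem. The final tests exist only because this particular solution has a helper and boundary cases of its own.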