Beep and Boop have completed their shopping spree and now must return to the sun – but wait! On their way back to the spaceship, something catches their eye: a huge field of wheat. Boop can’t resist a good grain, so they take a little detour and speak to the farmer.
They find out that the farm has been having some trouble with its mills – they cannot properly separate the wheat from the chaff, meaning there is good wheat being thrown out, and bad chaff getting mixed in with the wheat! Indubitably, this is a job for Beep and Boop.
The wheat huller uses a function sort(lst :: List<Number>) -> List<Number>
that takes in a list of numbers and outputs the same list, but arranged in ascending order. For example:
>>> sort([list: 4,3,5,1])
[list: 1,2,3,4]
Work with your partner to come up with a good set of examples on which to test sort
. What are the different list contents that would seem important to test?
Discuss the following questions with your partner:
sort
would need to handle?What makes a good test suite?
One way to think of a test suite is as a classifier. Specifically, when given a program, a test suite classifies it as either correct or incorrect. A good test suite, therefore, correctly identifies correct programs as “correct” and incorrect programs as “incorrect”.
Based on that definition, we can use a program called Examplar to determine how good a suite of tests is. Examplar will run your tests against correct implementations (called “wheats”) and incorrect implementations (called “chaffs”) of sort
. It will let you know how many of the “wheats” successfully passed your tests and how many of the “chaffs” were successfully caught.
Your goal is to write a comprehensive test suite (this doesn’t mean long!) that will cause wheats to pass and chaffs to fail. Chaffs are typically versions of the program that kind of work, but will break for certain cases – your job is to identify these potential weaknesses.
Click here to open Examplar (You will need to sign into your brown.edu account). Once you open it, you should see a regular Pyret definitions window on the left. In the definitions window is a check block with a test template. Fill in the template with a test of your own and press run. You should see something that looks like the following:
On the left is the number of wheats that your test suite accepted. If any wheats aren’t accepted, one of your tests is incorrect.
On the right is the number of chaffs that were rejected. A chaff is rejected if it fails one of the tests in your set (which it should!).
When writing tests for any of your functions, you should strive to cover all of your bases, which includes making sure that your program works on general cases, but also on edge cases. The point of writing chaff tests is to ensure that your function is not missing anything and will work on all possible inputs.
Woohoo! You, Beep, and Boop have successfully identified the bad pieces of the huller’s sorting algorithm. The farmer can now rectify these issues and get back to running his farm.
The farmer fixed the sorting program, but it’s not fast enough! Boop eats a lot of bread, and the mills cannot keep up with Boop’s bread consumption.
You’ve thought about what kinds of inputs sort
would need to handle, so now we’re going to look at some different ways to implement it.
What’s a sorting algorithm?
There are many ways to sort a list. Some are faster than others, whereas others work very well on certain kinds of inputs and very poorly on others.
A sorting algorithm is a series of steps that outline a particular way to sort a list. In this part of lab, we will look at two: selection sort and mergesort.
If the input list is empty: return the empty list
If the input list contains a single element: return a list containing just that element.
Otherwise:
mergesort
(recursively) on both halves*Merge is a helper function that merges two sorted lists into a single sorted list.
This is an illustration of how the different components of the merge sort algorithm fit together recursively:
Working with your partner, try to convince each other why both algorithms are guaranteed to produce a sorted list. Call over a TA if you’re not sure.
Pull up this file, which provides code for both algorithms.
Look at the code for selection sort. We’ll use the number of comparisons (such as <) as our measure of the run time for this algorithm.
Think back to your collection of test inputs for sort. Are any likely to require more comparisons than the others to sort? Do any require fewer than the others?
Write a formal prediction of the worst case running time of selection-sort. You can express this as a broad class of function (“constant”, “linear”, “quadratic”, etc) or as an equation in terms of N, the list size.
The file we gave you loads a new library called efficiency-sort-grapher. Read the instructions at the bottom of the file to experiment with the runtimes of selection sort and mergesort.
Come up with a prediction for how many element comparisons get done in the worst case with selection sort
Now let’s write a prediction for the runtime of mergesort! For this algorithm, use the number of ‘links’ as the measure of runtime.
We have written several functions that allow you to test your predictions on lists of any arbitrary size: link to code. Use it to experiment with the runtimes of selection sort and mergesort!
Which sorting algorithm should the farmer use? In what case(s)?
This option gives you a chance to understand the in-tree function from Wednesday’s class and to write another function (or two) on ancestor trees. If you were comfortable with trees, try the next set of questions on more list sorting instead. (those heading to 18 will eventually need do to the additional sorting questions, but feel free to focus on trees this week instead if you wish).
This file contains the AncTree
datatype and in-tree
function from Wednesday’s lecture.
For each of the following expressions, try to write out the sequence of calls to in-tree
that get made while evaluating the expression:
in-tree("Robert", annaT)
in-tree("Susan", annaT)
Open the tracer-version of Pyret. Copy and paste the code from the file into the definitions area. Press “Trace and Run”. Enter the following expression at the prompt:
in-tree("Robert", annaT)
then press “Trace Last”. Set the dropdown menu to read “Depth First”, then use the arrow keys to the right of the dropdown to see the function calls that get made in order. Does this match what you predicted?
Now run in-tree
on a name that is not in the tree. Again use “Trace Last” to see the sequence of calls that get made. Does it match what you would have expected?
We’d like to be able to compute how many generations are represented in the tree. In the case of annaT
, there are 5 generations (from Anna up to Robert).
Your goal is to write a function called count-gens
that takes an AncTree
as input and produces a number indicating the number of generations in the tree.
Start by writing examples on small trees with no more than two generations. Your examples should have unknown
people in various places. Can you connect the results of one tree to the next? For example, if you started from the tree for Ellen in our running example, how do you use the results on Laura and John’s trees to compute the number of generations upwards from Ellen?
Remember to break into cases on AncTree
, and to call count-gens
on each of the mother and father trees.
As you did for in-tree
, copy your code into the Tracer version and use tracer to explore the order in which the people in the tree get processed.
Talk through the following questions with your partner to see whether you are understanding trees:
Imagine that you are running in-tree
and are at the spot for “Ellen”. The computation starts up the tree for Ellen’s mother. Which other people get checked before Ellen’s father? What about the code determines the order in which people are processed?
When we call in-tree
on a name that is in the father’s side of the bottom-most person (such as Anna), the computation first explores all the way up the mother’s tree. When there are no more mothers, the mother tree is unknown
. in-tree
returns false
on an unknown
tree. Why doesn’t the entire computation just stop and return false at this point? What makes the computation continue after unknown
is reached the first time?
A classmate doesn’t care for all of this recursion, and instead tries to write the following code for in-tree
. Would this approach work? Why or why not? (the dot notation here to get components from values works fine–that aspect is not a problem). What does this tell you about recursion and when/why we use it?
fun in-tree(name :: String, anct :: AncTree) -> Boolean:
(anct.name == name) or
(anct.mother.name == name) or
(anct.father.name == name) or
(anct.mother.mother.name == name) or
(anct.mother.father.name == name)
end
What questions do you still have about trees or recursion? Post them to Piazza to help us identify what still needs clarification.
Can you write a function to count how many blue-eyed people are in the tree? How about a function to return a list of all the names in the tree? Feel free to try these if you want more practice.
This section is intended as a supplement for students who are considering taking CS18 next semester. If you are not considering CS18, feel free to try it out anyway! If you are, this content will probably help with the transition.
Below are descriptions of two additional sorting algorithms: insertion sort and quicksort. Implement them and predict their runtimes.
INSERTION SORT
Insertion sort works by inserting elements into the correct spot one at a time.
insertion-sort
(recursively) on the rest of the list.QUICKSORT
Quicksort works by choosing a single element to be a “pivot” and rearranging the rest of the list based on that pivot.
quicksort
(recursively) on each of the new lists.