Sorting

Nancy Drew and Frank Hardy are sneaking around a wheat field to catch the infamous farm burglar. To get the scoop (and this bread), the detectives go to talk to the wheat farmer.

They find out that the farm has been having some trouble with its mills – they cannot properly separate the wheat from the chaff, meaning there is good wheat being thrown out, and bad chaff getting mixed in with the wheat! Indubitably, this is a job for Nancy and Frank.

Part 1: The Test Suite Life of Nancy and Frank

The wheat huller (the machine that separates the unusable chaff from the useful wheat) uses a function sort(lst :: List<Number>) -> List<Number> that takes in a list of numbers and outputs the same list, but arranged in ascending order. For example:

>>> sort([list: 4,3,5,1])
[list: 1,3,4,5]

Work with your partner to come up with a good set of examples on which to test sort. What are the different list contents that would seem important to test?

Discuss the following questions with your partner:

What are some inputs that sort would need to handle?
What might incorrect implementations get wrong?

What makes a good test suite?

One way to think of a test suite is as a classifier. Specifically, when given a program, a test suite classifies it as either correct or incorrect. A good test suite, therefore, correctly identifies correct programs as “correct” and incorrect programs as “incorrect”.

Based on that definition, we can use a program called Examplar to determine how good a suite of tests is. Examplar will run your tests against correct implementations (called “wheats”) and incorrect implementations (called “chaffs”) of sort. It will let you know how many of the “wheats” successfully passed your tests and how many of the “chaffs” were successfully caught.

Your goal is to write a comprehensive test suite (this doesn’t mean long!) that will cause wheats to pass and chaffs to fail. Chaffs are typically versions of the program that kind of work, but will break for certain cases – your job is to identify these potential weaknesses.

Click here to open Examplar (You will need to sign into your brown.edu account). Once you open it, you should see a regular Pyret definitions window on the left. In the definitions window is a check block with a test template. Fill in the template with a test of your own and press run. You should see something that looks like the following:

On the left side of the interactions window, you will see either VALID or INVALID, which indicates whether your test suite accepted the wheats (correct implementations). If any wheats aren’t accepted (and it says INVALID), one of your tests is incorrect.

On the right is the number of chaffs that were rejected. A chaff is rejected if it fails one of the tests in your set (which it should!).

Write new tests until your test suite successfully catches all of the chaffs.

When writing tests for any of your functions, you should strive to cover all of your bases, which includes making sure that your program works on general cases, but also on edge cases. The point of writing chaff tests is to ensure that your function is not missing anything and will work on all possible inputs.

Woohoo! You, Nancy, and Frank have successfully identified the bad pieces of the huller’s sorting algorithm. The farmer can now rectify these issues and get back to running his farm.

Part 2: List Sorting

The farmer fixed the sorting program, but it’s not fast enough! Frank eats a lot of bread, and the mills cannot keep up with his bread consumption.

You’ve thought about what kinds of inputs sort would need to handle, so now we’re going to look at some different ways to implement it.

What’s a sorting algorithm?
There are many ways to sort a list. Some are faster than others, whereas others work very well on certain kinds of inputs and very poorly on others.

A sorting algorithm is a series of steps that outline a particular way to sort a list. In this part of lab, we will look at two: selection sort and mergesort.

SELECTION SORT

If the input list is empty: return the empty list.
Otherwise:
- Find the smallest element in the list
  eg: [list: 3,6,1,8,9]
- Put it at the beginning of the list
  eg: [list: 1,3,6,8,9]
- Link it to the recursive call on the rest of the list

MERGE SORT

If the input list is empty: return the empty list
If the input list contains a single element: return a list containing just that element.
Otherwise:
- Split the list in half
- Run mergesort (recursively) on both halves
- Merge* the now-sorted halves
*Merge is a helper function that merges two sorted lists into a single sorted list.

This is an illustration of how the different components of the merge sort algorithm fit together recursively:

Working with your partner, try to convince each other why both algorithms are guaranteed to produce a sorted list. Call over a TA if you’re not sure.
Pull up this file, which provides code for both algorithms.
Look at the code for selection sort. We’ll use the number of comparisons (such as <) as our measure of the run time for this algorithm.
1. Think back to your collection of test inputs for sort. Are any likely to require more comparisons than the others to sort? Do any require fewer than the others?
2. Write a formal prediction of the worst case running time of selection-sort. You can express this as a broad class of function (“constant”, “linear”, “quadratic”, etc) or as an equation in terms of N, the list size.
3. The file we gave you loads a new library called efficiency-sort-grapher. Read the instructions at the bottom of the file to experiment with the runtimes of selection sort and mergesort.
Come up with a prediction for how many element comparisons get done in the worst case with selection sort
Now let’s write a prediction for the runtime of mergesort! For this algorithm, use the number of ‘links’ as the measure of runtime.
We have written several functions that allow you to test your predictions on lists of any arbitrary size: link to code. Use it to experiment with the runtimes of selection sort and mergesort!

Which sorting algorithm should the farmer use? In what case(s)?

CHECKPOINT: Call over a TA once you reach this point.

Part 3, Option 1: Working with Ancestor Trees

This option gives you a chance to understand the in-tree function from Wednesday’s class and to write another function (or two) on ancestor trees. If you were comfortable with trees, try the next set of questions on more list sorting instead. (those heading to cs18 will eventually need do to the additional sorting questions, but feel free to focus on trees this week instead if you wish).

1. Tracing in-tree

This file contains an AncTree datatype and in-tree function.

For each of the following expressions, try to write out the sequence of calls to in-tree that get made while evaluating the expression:

in-tree("Robert", annaT)
in-tree("Susan", annaT)

Open the tracer-version of Pyret. Copy and paste the code from the file into the definitions area. Press “Trace and Run”. Enter the following expression at the prompt:

in-tree("Robert", annaT)

then press “Trace Last”. Set the dropdown menu to read “Depth First”, then use the arrow keys to the right of the dropdown to see the function calls that get made in order. Does this match what you predicted?

Now run in-tree on a name that is not in the tree. Again use “Trace Last” to see the sequence of calls that get made. Does it match what you would have expected?

2. Counting Generations

We’d like to be able to compute how many generations are represented in the tree. In the case of annaT, there are 5 generations (from Anna up to Robert).

Your goal is to write a function called count-gens that takes an AncTree as input and produces a number indicating the number of generations in the tree.

Start by writing examples on small trees with no more than two generations. Your examples should have unknown people in various places. Can you connect the results of one tree to the next? For example, if you started from the tree for Ellen in our running example, how do you use the results on Laura and John’s trees to compute the number of generations upwards from Ellen?

Remember to break into cases on AncTree, and to call count-gens on each of the mother and father trees.

As you did for in-tree, copy your code into the Tracer version and use tracer to explore the order in which the people in the tree get processed.

3. Reflection

Talk through the following questions with your partner to see whether you are understanding trees:

Imagine that you are running in-tree and are at the spot for “Ellen”. The computation starts up the tree for Ellen’s mother. Which other people get checked before Ellen’s father? What about the code determines the order in which people are processed?
When we call in-tree on a name that is in the father’s side of the bottom-most person (such as Anna), the computation first explores all the way up the mother’s tree. When there are no more mothers, the mother tree is unknown. in-tree returns false on an unknown tree. Why doesn’t the entire computation just stop and return false at this point? What makes the computation continue after unknown is reached the first time?
A classmate doesn’t care for all of this recursion, and instead tries to write the following code for in-tree. Would this approach work? Why or why not? (the dot notation here to get components from values works fine–that aspect is not a problem). What does this tell you about recursion and when/why we use it?
```
fun in-tree(name :: String, anct :: AncTree) -> Boolean:
  (anct.name == name) or
  (anct.mother.name == name) or
  (anct.father.name == name) or
  (anct.mother.mother.name == name) or
  (anct.mother.father.name == name)
end
```
What questions do you still have about trees or recursion? Post them to Campuswire to help us identify what still needs clarification.

4. In case you want more …

Can you write a function to count how many blue-eyed people are in the tree? How about a function to return a list of all the names in the tree? Feel free to try these if you want more practice.

Part 3, Option 2: Working More with Lists

This is designed moreso for students who wish to take CS18 next semester

The mills are now working nice and quickly, but things are still not fast enough for Nancy! She suspects that this is because the process by which the grains are transported to the local bakers utilizes a slow, inefficient lookup function…

Look up on lists

Consider the function lookup(num :: Number, lst :: List<Number>) -> Number what, given a list of numbers, returns the index of a particular number in the list.

Predict its runtime. How long would it take in the best case? How about the worst case?
How do your answers change if the list is sorted or unsorted?
In the Pyret definitions window, write thelookup function.

CHECKPOINT: Call over a TA once you reach this point.

Look up on trees

Frank suggests using a tree to speed up the lookup function!

Here is a general tree datatype that makes a tree containing numbers rather than people:

data Tree:
    | leaf
    | node(value :: Number, left-child :: Tree, right-child :: Tree)
end

Every tree is either a leaf or a node with two children, where each child is also a tree.

We also talked about binary search trees, which are a special kind of tree where every node obeys the following two rules:

Its left subtree contains only values less than it
Its right subtree contains only values greater than it

Copy the data definition for a tree into the Pyret definitions window. Then, rewrite the following list as a binary search tree tree: [list: 7, 3, 5, 9, 10, 22, 17, 1]

A balanced binary search tree is a binary search tree that has as small a depth as possible, where the “depth” of a tree is the number of jumps between the root to the farthest leaf.

This is an unbalanced tree with a depth of 2:

This is a balanced tree with a depth of 1:

How balanced can you make the tree from before? Sketch a few versions on paper.
Predict lookup's runtime on a binary search tree. Is it different than your prediction for lookup on lists?
Now, write bst-lookup(num :: Number, tree :: Tree) -> Tree.
Because trees don’t have indices the way that lists do, the output of bst-lookup should be the actual node containing the value.
If you have time, write the function bst-insert(num :: Number, tree :: Tree) that takes in a number and a balanced binary search tree and inserts the number into the tree. Remember that your new tree must also be a binary search tree (and therefore must follow the rules outlined above).

Part 4: Bonus (aka Optional): More Sorting Algorithms

This section is intended as a supplement for students who are considering taking CS18 next semester. If you are not considering CS18, feel free to try it out anyway! If you are, this content will probably help with the transition.

Below are descriptions of two additional sorting algorithms: insertion sort and quicksort. Implement them and predict their runtimes.

INSERTION SORT
Insertion sort works by inserting elements into the correct spot one at a time.

If the input list is empty: return the empty list
If the input list contains a single element: return a list containing just that element
Run insertion-sort (recursively) on the rest of the list.
Insert the current first element into the correct place in the now-sorted rest of the list.

QUICKSORT
Quicksort works by choosing a single element to be a “pivot” and rearranging the rest of the list based on that pivot.

If the input list is empty: return the empty list
If the input list contains a single element: return a list containing just that element
Choose the first element of the list to be the “pivot”.
Split all of the elements in the rest of the list into two separate lists: one containing all elements less than the pivot, and one containing all elements greater than the pivot.
Run quicksort (recursively) on each of the new lists.
Join elements into a new list in the following order: first the one with elements less than the pivot, then the pivot, then the one with elements greater than the pivot.

A Happy Ending

Thanks for your help! Nancy and Frank are now full of good bread and are equipped to catch the farm burglar!