Sorting

Beep and Boop have completed their shopping spree and now must return to the sun – but wait! On their way back to the spaceship, something catches their eye: a huge field of wheat. Boop can’t resist a good grain, so they take a little detour and speak to the farmer.

They find out that the farm has been having some trouble with its mills – they cannot properly separate the wheat from the chaff, meaning there is good wheat being thrown out, and bad chaff getting mixed in with the wheat! Indubitably, this is a job for Beep and Boop.

Part 1: The Test Suite Life of Beep and Boop

The wheat huller uses a function sort(lst :: List<Number>) -> List<Number> that takes in a list of numbers and outputs the same list, but arranged in ascending order. For example:

>>> sort([list: 4,3,5,1])
[list: 1,2,3,4]

Work with your partner to come up with a good set of examples on which to test sort. What are the different list contents that would seem important to test?

Discuss the following questions with your partner:

What are some inputs that sort would need to handle?
What might incorrect implementations get wrong?

What makes a good test suite?

One way to think of a test suite is as a classifier. Specifically, when given a program, a test suite classifies it as either correct or incorrect. A good test suite, therefore, correctly identifies correct programs as “correct” and incorrect programs as “incorrect”.

Based on that definition, we can use a program called Examplar to determine how good a suite of tests is. Examplar will run your tests against correct implementations (called “wheats”) and incorrect implementations (called “chaffs”) of sort. It will let you know how many of the “wheats” successfully passed your tests and how many of the “chaffs” were successfully caught.

Your goal is to write a comprehensive test suite (this doesn’t mean long!) that will cause wheats to pass and chaffs to fail. Chaffs are typically versions of the program that kind of work, but will break for certain cases – your job is to identify these potential weaknesses.

Click here to open Examplar (You will need to sign into your brown.edu account). Once you open it, you should see a regular Pyret definitions window on the left. In the definitions window is a check block with a test template. Fill in the template with a test of your own and press run. You should see something that looks like the following:

On the left is the number of wheats that your test suite accepted. If any wheats aren’t accepted, one of your tests is incorrect.

On the right is the number of chaffs that were rejected. A chaff is rejected if it fails one of the tests in your set (which it should!).

Write new tests until your test suite successfully catches all of the chaffs.

When writing tests for any of your functions, you should strive to cover all of your bases, which includes making sure that your program works on general cases, but also on edge cases. The point of writing chaff tests is to ensure that your function is not missing anything and will work on all possible inputs.

Woohoo! You, Beep, and Boop have successfully identified the bad pieces of the huller’s sorting algorithm. The farmer can now rectify these issues and get back to running his farm.

Part 2: List Sorting

The farmer fixed the sorting program, but it’s not fast enough! Boop eats a lot of bread, and the mills cannot keep up with Boop’s bread consumption.

You’ve thought about what kinds of inputs sort would need to handle, so now we’re going to look at some different ways to implement it.

What’s a sorting algorithm?
There are many ways to sort a list. Some are faster than others, whereas others work very well on certain kinds of inputs and very poorly on others.

A sorting algorithm is a series of steps that outline a particular way to sort a list. In this part of lab, we will look at two: selection sort and mergesort.

SELECTION SORT

If the input list is empty: return the empty list.
Otherwise:
- Find the smallest element in the list
  eg: [list: 3,6,1,8,9]
- Put it at the beginning of the list
  eg: [list: 1,3,6,8,9]
- Link it to the recursive call on the rest of the list

MERGE SORT

If the input list is empty: return the empty list
If the input list contains a single element: return a list containing just that element.
Otherwise:
- Split the list in half
- Run mergesort (recursively) on both halves
- Merge* the now-sorted halves
*Merge is a helper function that merges two sorted lists into a single sorted list.

This is an illustration of how the different components of the merge sort algorithm fit together recursively:

Working with your partner, try to convince each other why both algorithms are guaranteed to produce a sorted list. Call over a TA if you’re not sure.
Pull up this file, which provides code for both algorithms.
Look at the code for selection sort. We’ll use the number of comparisons (such as <) as our measure of the run time for this algorithm.
1. Think back to your collection of test inputs for sort. Are any likely to require more comparisons than the others to sort? Do any require fewer than the others?
2. Write a formal prediction of the worst case running time of selection-sort. You can express this as a broad class of function (“constant”, “linear”, “quadratic”, etc) or as an equation in terms of N, the list size.
3. The file we gave you loads a new library called efficiency-sort-grapher. Read the instructions at the bottom of the file to experiment with the runtimes of selection sort and mergesort.
Come up with a prediction for how many element comparisons get done in the worst case with selection sort
Now let’s write a prediction for the runtime of mergesort! For this algorithm, use the number of ‘links’ as the measure of runtime.
We have written several functions that allow you to test your predictions on lists of any arbitrary size: link to code. Use it to experiment with the runtimes of selection sort and mergesort!

Which sorting algorithm should the farmer use? In what case(s)?

CHECKPOINT: Call over a TA once you reach this point.

Part 3, Option 1: Working with Ancestor Trees

This option gives you a chance to understand the in-tree function from Wednesday’s class and to write another function (or two) on ancestor trees. If you were comfortable with trees, try the next set of questions on more list sorting instead. (those heading to 18 will eventually need do to the additional sorting questions, but feel free to focus on trees this week instead if you wish).

1. Tracing in-tree

This file contains the AncTree datatype and in-tree function from Wednesday’s lecture.

For each of the following expressions, try to write out the sequence of calls to in-tree that get made while evaluating the expression:

in-tree("Robert", annaT)
in-tree("Susan", annaT)

Open the tracer-version of Pyret. Copy and paste the code from the file into the definitions area. Press “Trace and Run”. Enter the following expression at the prompt:

in-tree("Robert", annaT)

then press “Trace Last”. Set the dropdown menu to read “Depth First”, then use the arrow keys to the right of the dropdown to see the function calls that get made in order. Does this match what you predicted?

Now run in-tree on a name that is not in the tree. Again use “Trace Last” to see the sequence of calls that get made. Does it match what you would have expected?

2. Counting Generations

We’d like to be able to compute how many generations are represented in the tree. In the case of annaT, there are 5 generations (from Anna up to Robert).

Your goal is to write a function called count-gens that takes an AncTree as input and produces a number indicating the number of generations in the tree.

Start by writing examples on small trees with no more than two generations. Your examples should have unknown people in various places. Can you connect the results of one tree to the next? For example, if you started from the tree for Ellen in our running example, how do you use the results on Laura and John’s trees to compute the number of generations upwards from Ellen?

Remember to break into cases on AncTree, and to call count-gens on each of the mother and father trees.

As you did for in-tree, copy your code into the Tracer version and use tracer to explore the order in which the people in the tree get processed.

3. Reflection

Talk through the following questions with your partner to see whether you are understanding trees:

Imagine that you are running in-tree and are at the spot for “Ellen”. The computation starts up the tree for Ellen’s mother. Which other people get checked before Ellen’s father? What about the code determines the order in which people are processed?
When we call in-tree on a name that is in the father’s side of the bottom-most person (such as Anna), the computation first explores all the way up the mother’s tree. When there are no more mothers, the mother tree is unknown. in-tree returns false on an unknown tree. Why doesn’t the entire computation just stop and return false at this point? What makes the computation continue after unknown is reached the first time?
A classmate doesn’t care for all of this recursion, and instead tries to write the following code for in-tree. Would this approach work? Why or why not? (the dot notation here to get components from values works fine–that aspect is not a problem). What does this tell you about recursion and when/why we use it?
```
fun in-tree(name :: String, anct :: AncTree) -> Boolean:
  (anct.name == name) or
  (anct.mother.name == name) or
  (anct.father.name == name) or
  (anct.mother.mother.name == name) or
  (anct.mother.father.name == name)
end
```
What questions do you still have about trees or recursion? Post them to Piazza to help us identify what still needs clarification.

4. In case you want more …

Can you write a function to count how many blue-eyed people are in the tree? How about a function to return a list of all the names in the tree? Feel free to try these if you want more practice.

Part 4: Bonus: More Sorting Algorithms

This section is intended as a supplement for students who are considering taking CS18 next semester. If you are not considering CS18, feel free to try it out anyway! If you are, this content will probably help with the transition.

Below are descriptions of two additional sorting algorithms: insertion sort and quicksort. Implement them and predict their runtimes.

INSERTION SORT
Insertion sort works by inserting elements into the correct spot one at a time.

If the input list is empty: return the empty list
If the input list contains a single element: return a list containing just that element
Run insertion-sort (recursively) on the rest of the list.
Insert the current first element into the correct place in the now-sorted rest of the list.

QUICKSORT
Quicksort works by choosing a single element to be a “pivot” and rearranging the rest of the list based on that pivot.

If the input list is empty: return the empty list
If the input list contains a single element: return a list containing just that element
Choose the first element of the list to be the “pivot”.
Split all of the elements in the rest of the list into two separate lists: one containing all elements less than the pivot, and one containing all elements greater than the pivot.
Run quicksort (recursively) on each of the new lists.
Join elements into a new list in the following order: first the one with elements less than the pivot, then the pivot, then the one with elements greater than the pivot.