CS18 Bridge Assignment: Binary Search Trees

In class, we saw ancestor trees, in which we linked people to their biological parents. The datatype we had for this was:

data AncTree:
  | person(name :: String, 
      eye :: String, 
      mother :: AncTree,
      father :: AncTree )
  | unknown
end

Tree-shaped data (data in which multiple “next” pieces of data branch off a single datum) arises in various contexts in CS. Think of the layers of menus of options you select through when trying to call customer service: first you indicate the general nature of your call, then you get a more refined set of questions, and so on. Each question is a data point that branches off to other questions, each of which has their own set of questions, and so on.

Ancestor trees and phone-call menus are examples where our data is inherently tree-shaped: the relationships we want to capture branch off between data points. But in CS we also use trees in a more artificial manner: we impose a branching structure on data to allow us to work with that data more efficiently.

This assignment has you work with binary search trees (henceforth abbreviated as BST), a way of organizing a collection of data with order to help us find data items faster than searching in lists.

BSTs (on Numbers)

Imagine that you had a set of numbers such as

8, 5, 7, 1, 3, 12, 6, 10, 2, 4

and you wanted to organize it in a way that showed the ordering relationship among the numbers. You could make a sorted list, but as we have seen, checking whether a specific number is in a list can take time linear in the size of the list.

A binary search tree (BST) is a binary tree (meaning a tree with at most two branches from each node/point) in which every node has the following property:

Every element in the left subtree is smaller than 
the element at the root, every element in the right 
subtree is larger than the element at the root, and 
both the left and right subtrees are BSTs.

(this statement assumes no duplicates, but that is okay since we are modeling sets. If you want duplicates, change less-than to less-than-or-equal in the definition).

Here is one example BST on our sample set:

         10
       /    \
      4      12
    /   \
   2     7
 /  \   /  \
1   3  6    8
      /
     5

Practice (not for turn in): Draw two more BSTs on the same collection of numbers. Start with a different number at the top (called the root) and see what you come up with. Can you make distinctly different shapes?

Practice (not for turn in): Which of the following are BSTs on the numbers from 1 through 5?

      4             3            4          3
     / \           / \          / \       /   \
    5   3         2   5        2   5     1      5 
       / \       /   /        /           \    /
      1   2     1   4        1             2  4
                              \
                               3

A Datatype for BSTs

Generalizing from our ancestor tree, here’s a general datatype for BSTs on numbers:

data BST:
  | leaf
  | node(val :: Any, 
         left :: BST,
         right :: BST)
end 

This has the same shape as the AncTree definition, but it is not specific to relationships on people. Instead, it allows any kind of data (through the type Any). It also uses the term leaf, instead of unknown, for the empty tree.

Note: We can’t write this datatype in Python dataclasses (because of the recursive use of BST). You’ll therefore have to work these problems in Pyret.

Searching in BSTs

Imagine you were given a BST representing a set of numbers, and were asked whether a specific number is in the set. How would you proceed?

Practice (not for turn in): Using the same tree we provided, work through a couple of example searches, some of numbers in the tree and some not in the tree.

Exercise (to turn in): When you understand the algorithm, implement the function. Don’t forget to include test cases.

fun has-elt(num :: number, t :: BST) -> Boolean

Note: even though the definition of BST is more general than just numbers, you can assume we are working with trees of numbers when writing your code.

Creating BSTs

How do we build-up a BST, however? For lists, we know how to use link to add an element. How do we add an element to a BST?

Assume you are given a BST and a number. You add an element to the BST by looking through the tree for the leaf where the number should be, then putting the number in that spot. For example, assume you wanted to insert the number 4 into the following tree:

         8
       /   \
      5     9
    /   \
   3     7

We’d start at the root, and ask whether 4 is less than or greater than 8. Less than, so we should insert 4 on the left (so that we maintain the requirement on a BST). Now we compare 4 to 5. Less than, so we insert on the left. Now compare 4 to 3. Larger than, so we insert on the right. But the right is a leaf, so that is where we put the 4. Here’s the resulting BST:

         8
       /   \
      5     9
    /   \
   3     7
    \
     4

Notice that in this algorithm, new elements are always inserted at the leaves, not in the middle of the tree. This disrupts the tree as little as possible, while still maintaining the BST requirement.

Practice (not for turn in): By hand, follow the above algorithm to create BSTs for each of the following setups on the same numbers:

Exercise (to turn in): Implement add-elt (our description explained the algorithm, which inserts at leaves). Don’t forget to write test cases (and writing them first could really help you here, especially in the leaf case).

fun add-elt(num :: number, to :: BST) -> BST

Also, follow the template code skeleton that we discussed on trees in lecture. It is easy to overthink this problem and get your code in a mess. The code should follow fairly naturally from the examples and template if you use them carefully.

The appeal of BSTs compared to lists is that it seems searching and adding elements should be faster than on lists.

Exercise (to turn in):

This is a topic we will return to in CS18. So if you feel like we may have left some threads dangling here, we did. We’re just catching you up to CS17 for now.