Trees: what are they good for?
HW 3
We discussed Homework 3 a bit. See the lecture capture for details.
Who cares about binary search trees?
We introduced binary search trees via the find operation, which determines whether and where a particular value is in the binary tree. We also defined add and remove operations. So: we can add and remove values from a tree data structure, and can search for values in that data structure. Who cares?
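As a reminder of what those three operations involve, here is a minimal sketch of a binary search tree. The names `Node`, `BST`, `insert`, `find`, `remove`, and `number_of_nodes` are assumptions, chosen to match how `TreeSet` calls the tree later in these notes; the version from lecture may differ, and this one makes no attempt to stay balanced.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


class BST:
    """Hypothetical minimal binary search tree (not the course's own code)."""

    def __init__(self):
        self.root = None
        self.size = 0

    def find(self, value):
        """Return the node holding value, or None if it is absent."""
        node = self.root
        while node is not None and node.value != value:
            node = node.left if value < node.value else node.right
        return node

    def insert(self, value):
        """Add value to the tree; duplicates are ignored."""
        if self.root is None:
            self.root = Node(value)
            self.size = 1
            return
        node = self.root
        while True:
            if value == node.value:
                return  # already present
            side = "left" if value < node.value else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(value))
                self.size += 1
                return
            node = child

    def remove(self, node):
        """Remove a node previously returned by find."""
        self.root = self._remove_value(self.root, node.value)
        self.size -= 1

    def _remove_value(self, current, value):
        if current is None:
            return None
        if value < current.value:
            current.left = self._remove_value(current.left, value)
        elif value > current.value:
            current.right = self._remove_value(current.right, value)
        else:
            # Zero or one child: splice the node out.
            if current.left is None:
                return current.right
            if current.right is None:
                return current.left
            # Two children: copy in the smallest value of the right subtree,
            # then delete that value from the right subtree.
            successor = current.right
            while successor.left is not None:
                successor = successor.left
            current.value = successor.value
            current.right = self._remove_value(current.right, successor.value)
        return current

    def number_of_nodes(self):
        return self.size
```

Each operation walks a single path down from the root, so its cost is proportional to the height of the tree.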
Sets
Imagine I’m tracking all of the books I’ve read. I sometimes re-read books, but I don’t care about tracking that. I just want to know, for a given book, whether I’ve read it or not. How could we implement something like this?
Rather than imagining a particular data structure (e.g., a list of books), let’s raise our level of abstraction slightly. I don’t particularly care how the books I’ve read are stored in the computer’s memory: I just need to be able to:
- Record that I’ve read a book
- Check to see if I’ve read a given book
- Remove a book from my list (if I get a book’s title wrong, maybe)
- Count how many books I’ve read
So it seems like we need some data type that has these operations:
add(item)
contains(item)
remove(item)
count()
As it turns out, computer scientists have a name for this data type: it’s called a Set (and it is based on the mathematical concept of a set). A Set is an example of an abstract data type. Notice that we’ve only specified operations on sets: we haven’t said anything about how sets are actually stored in memory.
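One way to write down "operations only, no storage" in Python is an abstract base class. This is a sketch, not something from lecture: the class name `SetADT` is hypothetical, and the four method names are taken from the list of operations above.

```python
from abc import ABC, abstractmethod


class SetADT(ABC):
    """Hypothetical name: the Set operations, with no storage decisions."""

    @abstractmethod
    def add(self, item):
        """Record item; adding an item twice has no further effect."""

    @abstractmethod
    def contains(self, item):
        """Return True if item has been added and not removed."""

    @abstractmethod
    def remove(self, item):
        """Forget item."""

    @abstractmethod
    def count(self):
        """Return the number of distinct items."""


# The abstract type cannot be used directly: it says what a Set does,
# not how. Python refuses to instantiate it.
try:
    SetADT()
except TypeError as error:
    print("cannot instantiate:", error)
```

A concrete class only becomes usable once it supplies all four methods, which is exactly the job of an implementation.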
How could we implement such a data type?
    class TreeSet:
        """Set stored in a binary search tree."""

        def __init__(self):
            # BST is assumed to be the binary search tree class from earlier lectures
            self.tree = BST()

        def add(self, value):
            self.tree.insert(value)

        def contains(self, value):
            # find returns the matching node, or a falsy value if there is none
            return bool(self.tree.find(value))

        def remove(self, value):
            node = self.tree.find(value)
            if node:
                self.tree.remove(node)

        def count(self):
            return self.tree.number_of_nodes()


    class HashSet:
        """Set stored as the keys of a dictionary."""

        def __init__(self):
            self.data = {}

        def add(self, value):
            self.data[value] = True

        def contains(self, value):
            return value in self.data

        def remove(self, value):
            if value in self.data:
                del self.data[value]

        def count(self):
            return len(self.data)


    class ListSet:
        """Set stored in a list with no duplicates."""

        def __init__(self):
            self.data = []

        def add(self, value):
            if value not in self.data:
                self.data.append(value)

        def contains(self, value):
            return value in self.data

        def remove(self, value):
            # list.remove raises ValueError for a missing value, so check first
            if value in self.data:
                self.data.remove(value)

        def count(self):
            return len(self.data)
All of these are implementations of the same abstract data type.
    def use_set(s):
        s.add(4)
        s.add(2)
        s.add(9)
        s.remove(2)
        print(s.contains(4))
        print(s.contains(2))
        print(s.count())
This function’s behavior will be the same, regardless of which implementation of Set we pass in. We’re taking advantage of encapsulation (we don’t need to know how our chosen set implementation stores data) and polymorphism (we can pass any object implementing the Set methods to use_set and it will work)!
Depending on our use case, we might choose any of these implementations based on running time, memory usage, code simplicity, etc.
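To make the running-time difference concrete, the sketch below counts how many equality comparisons a membership test triggers in a list versus a hash-based container. The `Counted` wrapper and `comparisons_for_lookup` helper are hypothetical names invented for this illustration; a Python list and a built-in set stand in for the list-based and hash-based implementations.

```python
class Counted:
    """Hypothetical wrapper that counts how often == is evaluated."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.comparisons += 1
        return isinstance(other, Counted) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def comparisons_for_lookup(container, target):
    """Return how many == calls `target in container` triggers."""
    Counted.comparisons = 0
    found = target in container
    assert found, "the target is expected to be present"
    return Counted.comparisons


n = 1000
items = [Counted(i) for i in range(n)]
as_list = list(items)
as_hash = set(items)

# Look up a value equal to the last item, using a fresh object so the
# containers have to compare by value rather than by identity.
list_cost = comparisons_for_lookup(as_list, Counted(n - 1))
hash_cost = comparisons_for_lookup(as_hash, Counted(n - 1))
print("list comparisons:", list_cost)
print("hash comparisons:", hash_cost)
```

The list has to scan until it finds a match, so its cost grows with the number of items. The hash-based container jumps almost directly to the right slot, at the price of extra memory for the table. A tree-based set sits in between, with cost proportional to the height of the tree.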
Python’s built-in sets
Python includes hash-based sets as a built-in datatype. They look like this:
    >>> s = {1, 2, 3}
    >>> s.add(4)
    >>> 4 in s
    True
    >>> s.remove(2)
    >>> 2 in s
    False
    >>> s
    {1, 3, 4}
    >>> s.add(3)
    >>> s
    {1, 3, 4}
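Returning to the book-tracking example, here is a sketch of how the four operations map onto the built-in set: `add` is `add`, `contains` is the `in` operator, `remove` is `remove` (or `discard`), and `count` is `len`. The book titles are made up for illustration.

```python
books_read = set()

books_read.add("Dune")               # add(item)
books_read.add("Emma")
books_read.add("Dune")               # a re-read: the set already holds it

has_dune = "Dune" in books_read      # contains(item)
how_many = len(books_read)           # count()
print(has_dune, how_many)            # True 2

books_read.discard("Emmma")          # misspelled title: discard ignores absent items
books_read.remove("Emma")            # remove raises KeyError if the item is absent
print(sorted(books_read))            # ['Dune']
```

The choice between `remove` and `discard` is a choice about whether removing something that was never there should count as an error.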