Homework 3
Homework 3: Tree Hugger
The goals of this assignment are:
- To get practice implementing methods on classes
- To get practice designing and developing programs on trees
- To consider some of the ethical implications of autocorrect
Due date
- HW3 is due on Friday, February 28 at 9pm.
Setup and Handin
Setup
In order to setup your hw3 directory on your department machine, open
a terminal and run the command cs0112_install hw3
. This will create
a directory in your ~/course/cs0112/
directory for this assignment.
Handin
You may submit as many times as you want. Only your latest submission will be graded. This means that if you submit after the deadline, you will be using a late day – so do NOT submit after the deadline unless you plan on using late days.
The README template can be found here. Please follow the design and clarity guide–part of your grade will be for code style and clarity. After completing the homework, you will submit:
README.txt
bst.py
test_bst.py
trie.py
test_trie.py
Because all we need are the files, you do not need to submit the whole project folder. As long as the file you create has the same name as what needs to be submitted, you’re good to go!
If you are using late days, make sure to make a note of that in your README. Remember, you may only use a maximum of 3 late days per assignment. If the assignment is late (and you do NOT have anymore late days) no credit will be given.
Please don’t put your name anywhere in any of the handin files–we grade assigments anonymously!
You can follow this step-by-step guide for submitting assignments through gradescope here.
Helpful Things
Documentation
Staff Assistance
Your friendly TAs and Professor are here to help with this assignment! You can find our schedule for office hours here.
The Assignment
Part 1: Searching for answers
In class, we implemented some methods on a binary search tree. For this part of the assignment, you’ll write a few additional methods that a binary search tree implementation might provide.
The bst.py
file contains our binary search tree implementation from class,
along with a few unimplemented methods:
- The
height
method should return the height of the binary tree: that is, the maximum distance between the root node and a leaf node. The height of an empty binary tree is 0; the height of a tree with just a root node is 1. Hint: Python’smax
function returns the maximum of two numbers. - The
number_of_nodes
method should return the total number of nodes in the tree. - The
between
function should return a sorted list of the elements in the tree whose values are between its two arguments. The list shouldn’t include elements which are equal to either argument (i.e., the range should be EXCLUSIVE on both sides). Don’t build this list by getting a list of all of the elements in the tree and then filtering the list–there’s a more efficient way to find elements in a particular range! Hint: think about what you should do if the root node’s value is 8 and you’re looking for elements between 9 and 20!
You will probably to want to use helper methods for all three of these. Please do so! Your helper methods should have docstrings and type annotations, but do not need separate tests.
You should include tests for all of your new BST methods in test_bst.py
. You
don’t need to test existing methods (although your tests will probably want to
call the add
method).
Part 2: You complete me
In this assignment you’re going to implement an autosuggestion system: given some text a user has typed so far (for instance, “ha”) you’ll provide the possible words they might be trying to type (for instance, “hat”, “hand”, etc.) You’re not trying to do any spelling correction–you’re just predicting the word based on prefixes.
In order to do this, you’re going to use a special kind of tree called a trie.
Tries
A trie (usually pronounced “try”) is a tree where the nodes of the tree represent prefixes of words. Each node has a dictionary from characters to its children; each child is a longer prefix of the same word, formed by adding a character.
Let’s take a look at an example:
This trie includes the words: “hat”, “hand”, “a”, and “at”. To create a trie like this, we’d start with a root node that has no children.
When adding the word “hat”, we would check if the root node’s dictionary of
children contains an entry for h. Since it doesn’t, we create a node for h
and add it to the root node’s dictionary (under the key "h"
. Then, we create a
node for ha and add it to the h node’s dictionary (under the key
"a"
). Finally, we create a child for hat and add it to the ha node’s
dictionary (under the key "t"
). We also need to mark that, unlike h and
ha, hat represents a complete word. Words are marked with red circles in the diagram above.
When we add “hand” after adding “hat”, we once again check if h is as a
child of the root node (i.e., if it has an entry in its dictionary of children
for "h"
). It does, so we don’t need to add a new node. Now we look at the h
node. Does h have an ha child node (i.e., does it have an entry in its child
dictionary for "a"
? It does! Does ha have an han child node? Nope, not
yet. Therefore, we add han as a child of ha, and then hand as a child of
han. We mark the hand node as a word.
Once we add “a” and “hat”, the trie looks like the picture above. Notice that prefixes of words can themselves be words–for instance, “a” is both a word and a prefix of “at”.
Implementing autocomplete
We’ve provided a partial implementation of a trie in trie.py
, including
methods to insert a word and to determine if a word is present. You should read
through and understand the provided code, because you’ll be writing similar code
for the assignment!
A TrieNode
is a node within the Trie
. TrieNodes
have a prefix (for
instance, ""
or "h"
or "ha"
or "hat"
) and a dictionary of children. They
also have a word
field, which will be set to True
if the prefix represents a
word in the trie.
The Trie
class implements the actual trie data structure. Tries are
initialized with a single root node containing the empty string as the prefix
(the empty string is a prefix of every word). The insert
method adds a word to
the trie by adding nodes if necessary and then marking the word’s final node as
a word. The contains
method determines whether a word is present in the trie
by walking down a path in the trie.
You’re going to fill in two methods to the trie in order to implement autocomplete:
words_below
should return a list of all of the words in the trie below or at the provided node. For instance, if it’s called on the root node, it should return all of the words in the trie. If it’s called on the a node in the trie above, it should return"a"
and"at"
.words_below
returns a list of strings.completions
returns all of the words in the trie that start with the given prefix. This method should callwords_below
after walking down a path in the trie corresponding to the prefix.completions
should return the empty list if there are no words in the trie that start with the prefix.
You should test your new trie methods in test_trie.py
. Make sure to test all
of the interesting cases!
We’ve included a program, autosuggest.py
, that presents an interface to your
autocompletion system. Play around with it and see how autocompletion systems
work in practice!
Part 3: Autocomplete Ethics
Consider how search tree implementations can have ethical implications. Please respond with 2-3 thoughtful paragraphs for each of the following questions:
-
Think about another instance machine learning-informed autocomplete is used — in communication platforms, web browsers, text processors, etc.
For example, Google has undergone criticism and legal suits because of its Autocomplete feature — the company was convicted of defaming a French man by linking his name with sexual offenses, revealed the names of victims granted anonymity courts, spread unauthenticated rumors about the wife of a former German presidentss private life, and provided hateful autocomplete options for phrases relating to certain ethnic/religious/gender groups. Google Autocomplete was also originally called “Google Suggest.” Google renamed this technology in 2010, and now emphasizes that Autocomplete provides “predictions,” not “suggestions.”
Discuss an ethics-related issue that may arise from an autocomplete feature. Some questions to think about: the parties/demographics/industries affected, the severity of this effect, who (if anyone) should be held accountable, who (if anyone) should be responsible for remediating the problem, ways to address the issue, etc. You can analyze Google Autocomplete or a different instance of autocomplete.
-
You’re designing a product that has an autocomplete feature — how would your implementation of this feature mitigate the ethics issue you detailed in the previous question? Think about the implementation of the underlying algorithm, the data sets used to inform it, the visual components/design, manual measures to moderate the feature, etc.