For this assignment, you will first figure out your answers in the Python shell. When you feel like you've arrived at the right answers, type them up in a single Python file and share it with the handin email address. For some guidance on how to do this, see the Python homework instructions.
In this homework, you will finish the program from class and finally find out the vocabulary size of Moby Dick! A word of warning: your program will most likely not work on the first try. Here is some advice so you are not left clueless about what went wrong:
The first task makes sure that the starter code works for you and provides you examples of iterating through lists in a different way.
Download HW2-5.py, which is a slightly modified version of what we wrote together in class. Also download MobyDick.txt and save it to your desktop.

In the readFile() function, there is a line with a hard-coded path to the desktop. Change it to the correct path for your own desktop. Now look around at the rest of the code.

… 'facebook comes after twitter' using the second technique, but you cannot do that with the first.

vocab = vocabulary('MobyDick.txt')
If you run into trouble on the last step, make sure that MobyDick.txt is saved to the Desktop. Run the program by hitting F5. Then inspect the value of the variable vocab by typing vocab in the interactive interpreter. You should see a list of strings (words). If you then check the length of vocab with the built-in function len(vocab), it should say 853. For now, we are only computing the vocabulary for the first 10,000 characters of Moby Dick.

If the program does not work as expected, please email cs0931tas@lists.cs.brown.edu with what you did and any errors you got. Remember to make an honest effort to solve your problem(s) before contacting the TAs!
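The first task contrasts two ways of iterating through a list. HW2-5.py is not reproduced here, so the sketch below is hypothetical. It assumes the first technique loops over the elements themselves and the second loops over positions with range(). The helper neighborSentences and the list sites are made up for illustration. Only the position-based loop can look at the previous element, which is what a sentence like 'facebook comes after twitter' needs.

```python
def neighborSentences(words):
    """Hypothetical helper: describe each word relative to the one before it."""
    sentences = []
    # Technique 2: loop over positions. Starting at 1 guarantees that
    # index - 1 is a valid position, so words[index - 1] always exists.
    for index in range(1, len(words)):
        sentences.append(words[index] + ' comes after ' + words[index - 1])
    return sentences

sites = ['twitter', 'facebook', 'google']  # made-up example list

# Technique 1: loop over the elements themselves. Each iteration sees
# one word, but has no way to refer to its neighbors.
for site in sites:
    print(site)

# Technique 2 in action.
for sentence in neighborSentences(sites):
    print(sentence)
# facebook comes after twitter
# google comes after facebook
```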
As we discussed in class, the way we have been removing duplicates from a list is slow. We came up with a faster way, assuming we can sort a list quickly enough. Let's write a function called uniqueWordsFast that takes a list of words and returns the list of unique words using the new method. The algorithm is described below, and the first lines of code are provided for you. They sort the word list and initialize our result vocab list with the first word in the sorted list (unless the input list is empty, in which case you return an empty list, because there are no unique words).
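The handout does not show the slow version from class. The function below is therefore only a hypothetical stand-in, and its name is made up. It illustrates one common reason such code is slow: `word not in result` scans the result list one element at a time for every input word, so the work grows roughly with the square of the list length. The last lines show the observation the faster method relies on: after sorting, equal words end up next to each other.

```python
def uniqueWordsSlowSketch(words):
    """Hypothetical slow de-duplication; not necessarily the version from class."""
    result = []
    for word in words:
        # 'in' compares word against every element of result, one by one.
        if word not in result:
            result.append(word)
    return result

words = ['the', 'whale', 'the', 'sea', 'whale']
print(uniqueWordsSlowSketch(words))  # ['the', 'whale', 'sea']

# Sorting groups equal words together, so duplicates become neighbors.
print(sorted(words))  # ['sea', 'the', 'the', 'whale', 'whale']
```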
Loop through the sorted list, using 1 instead of 0 as the first argument to range(). The first word is already in the result, and each word will be compared with the one before it. Call the looping variable index. (The “looping variable” is the variable right after for.) On each iteration of the loop, index takes a different value (1, 2, 3, ...).

What expression evaluates to the element of the list at position index? (Hint: Remember how we use square brackets with lists.) Put it in a variable called current. What expression evaluates to the element at the previous position? Put it in a variable called previous.

If current is different from previous, we want to append current to our result (it's the first time we see it). If they're the same, we want to ignore current, because we've seen it before. You can use the appendToList function we have provided in the code, but make sure you understand how it works.

When the loop is finished, return the result list.
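Put together, these steps describe roughly the function below. It is a sketch, not the official solution, and a few details are assumptions. It sorts with sorted(), although the provided starter lines may sort differently (for example with words.sort()). It also uses the built-in list.append in place of the handout's appendToList helper, whose definition is not shown here.

```python
def uniqueWordsFast(words):
    """Return the unique words of `words`, in sorted order (sketch)."""
    # Setup described in the handout: sort, then handle the empty case.
    sortedWords = sorted(words)
    if len(sortedWords) == 0:
        return []
    # Start the result with the first word of the sorted list.
    vocab = [sortedWords[0]]
    # Start at 1 so that sortedWords[index - 1] is always a valid position.
    for index in range(1, len(sortedWords)):
        current = sortedWords[index]
        previous = sortedWords[index - 1]
        # In a sorted list, a word differs from its predecessor only
        # the first time it appears.
        if current != previous:
            vocab.append(current)
    return vocab

print(uniqueWordsFast(['whale', 'sea', 'whale', 'ahab']))  # ['ahab', 'sea', 'whale']
```

Each word is compared only with its neighbor, so once the list is sorted, the loop makes a single pass.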
Write a test function called testUniqueWordsFast(). Provide at least three test cases. The point of a test function is to provide tricky test cases that might fool the function you're testing, so come up with interesting cases. What should the output be if all the input words are the same? What if they're all different? Remember, uniqueWordsFast() sorts the words you give it before filtering them. Use the test function to verify that uniqueWordsFast() works properly.
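Below is one possible shape for the test function. The format of the cases, the print-on-failure reporting and the True/False return value are assumptions, not requirements from the handout. The compact uniqueWordsFast at the top is a stand-in so that this block runs on its own; you would test your own version instead. Because uniqueWordsFast() sorts before filtering, every expected output is written in sorted order.

```python
def uniqueWordsFast(words):
    """Compact stand-in so this block runs on its own; use your own version."""
    sortedWords = sorted(words)
    vocab = sortedWords[:1]  # the first word, or [] if the input is empty
    for index in range(1, len(sortedWords)):
        if sortedWords[index] != sortedWords[index - 1]:
            vocab.append(sortedWords[index])
    return vocab

def testUniqueWordsFast():
    """Hypothetical test function: each case pairs an input with its expected output."""
    cases = [
        ([], []),                                              # empty input
        (['ahoy', 'ahoy', 'ahoy'], ['ahoy']),                  # all the same
        (['sea', 'ahab', 'whale'], ['ahab', 'sea', 'whale']),  # all different, just sorted
        (['whale', 'sea', 'whale', 'sea'], ['sea', 'whale']),  # duplicates not adjacent before sorting
        (['Whale', 'whale'], ['Whale', 'whale']),              # capitalization makes them different strings
    ]
    allPassed = True
    for words, expected in cases:
        actual = uniqueWordsFast(words)
        if actual != expected:
            print('FAILED on', words, ': expected', expected, 'but got', actual)
            allPassed = False
    return allPassed

print(testUniqueWordsFast())  # True
```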
Now, let's deploy our new method of removing duplicates. Change the vocabulary function so that it uses uniqueWordsFast() instead of uniqueWordsSlow(). … uniqueWordsFast().
In readFile(), instead of returning fileText[:10000], return fileText. We are no longer afraid of processing large lists! … vocab[1000:2000].
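Both the original limit of 10,000 characters and the check with vocab[1000:2000] rely on slicing. The snippet below uses made-up stand-in data, not the real Moby Dick text, to show what such slices return.

```python
# Stand-in data: a made-up string and word list, not the real Moby Dick text.
fileText = 'Call me Ishmael. ' * 1000   # 17 characters repeated: 17,000 characters
firstPart = fileText[:10000]            # only the first 10,000 characters
print(len(fileText), len(firstPart))    # 17000 10000

vocab = ['word' + str(i) for i in range(2500)]
middle = vocab[1000:2000]               # positions 1000 through 1999
print(len(middle), middle[0], middle[-1])  # 1000 word1000 word1999
```

If a list has fewer than 2,000 items, vocab[1000:2000] does not raise an error. The slice simply stops at the end of the list.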
Congratulations! You have just completed a software upgrade.
Rename your program FirstLast_HW2-5.py and share it with cs0931handinfall2015@gmail.com.
Note: Before you turn in your Python files, make sure they run without any errors (save your Python file, then select Run > Run Module or hit F5 on your keyboard)! If nothing appears in the Shell, don't worry, as long as no red error messages appear. If your files don't run, that is, if red error text starts appearing in the Shell, points will be taken off!