For this assignment, you will first figure out your answers in the Python shell. When you feel like you've arrived at the right answers, type them up in a single Python file and share it with the handin email address. For some guidance on how to do this, see the Python homework instructions.
As you inspect different portions of the vocabulary list, you may see that many improvements can still be made! There are numbers, punctuation, and mixed cases ("whale" and "Whale" should really be the same word). Now, let's fix that.
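To see the problem concretely, here is what happens when a short string is split on whitespace and its duplicates are removed (here with a set) before any cleanup. The sample sentence is made up for illustration and is not taken from the assignment's files.

```python
# Made-up sample sentence, for illustration only
text = "Whale! The whale, 1851: whale."

# Split on whitespace, then remove duplicates with a set
words = text.split()
uniqueWords = sorted(set(words))
print(uniqueWords)
# ['1851:', 'The', 'Whale!', 'whale,', 'whale.']
```

The same word shows up three times ("Whale!", "whale,", "whale."), and a number sneaks in as a "word".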
Essentially, we want to clean up the big string we get out of the file before we split it into words. Two possible things to do are changing all letters to lowercase and replacing all numbers and punctuation with whitespace (so that "eat,pray,love" can be split as if it were "eat pray love").
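The two cleanup steps just described can be sketched as a single function. This is a minimal sketch and not the stencil's code. The name cleanUpSketch is hypothetical, and using the standard library constants string.digits and string.punctuation is an assumption about how you might decide which characters to replace.

```python
import string

def cleanUpSketch(text):
    """Hypothetical stand-in for cleanUp(): lowercase the text, then replace
    every digit and every ASCII punctuation mark with a space."""
    text = text.lower()
    result = ""
    for char in text:
        if char in string.digits or char in string.punctuation:
            result += " "      # swap digits/punctuation for whitespace
        else:
            result += char     # keep letters and existing whitespace as-is
    return result

print(cleanUpSketch("eat,pray,love").split())
# ['eat', 'pray', 'love']
print(cleanUpSketch("Whale! The whale, 1851: whale.").split())
# ['whale', 'the', 'whale', 'whale']
```

After this cleanup, removing duplicates leaves just "the" and "whale". Note that string.punctuation only covers ASCII punctuation, so characters such as curly quotes would not be replaced. An apostrophe is replaced too, so "don't" becomes "don t". If you list punctuation by hand instead, remember that a quote or a backslash inside a string needs an escape character, as in '\'' and '\\'.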
Consider the function cleanUp(), which takes a string as an argument and returns a cleaned-up string. This function breaks the problem of cleaning up the string into a few smaller problems, and we'll need to define more functions for this to work! First, it uses the built-in string method lower() to turn all letters in the string to lowercase (you can convince yourself it works by running a simple example in the interactive environment). The functions removeNumbers() and removePunctuation() should do what their names suggest. Ask yourself: what type of values do they take as arguments? What type of values do they return?

removeNumbers() is already filled in for you as an example. Read it and make sure that you understand why it works. There are a couple of things to notice, including the += short-hand. Try it out in the interactive environment to see how it works if you're curious.

Fill in removePunctuation() so that it takes a string and returns another string in which each punctuation mark is replaced with whitespace. Keep in mind that you might need to use escape characters for certain characters (for example, a quote mark or a backslash inside a string).

Then fill in testRemovePunctuation() to make sure your function works correctly. Write at least three test cases, and remember that the idea is to come up with tricky cases that might fool your function. Put punctuation in tricky places: the beginning, middle, and end of a string. Also test cases with multiple punctuation marks in a row, and cases with no punctuation marks at all.

Next, return to your vocabulary() function (read → split → remove duplicates) and use cleanUp() to clean up your big string before you split it.

Now, you will create a function that computes the average length of the words in a text, counting only words that are at least a given length (for simplicity's sake, return the round() of the actual average, which will most likely be a decimal). For example, this function should be able to find the average length of a word in Moby Dick among the words that are at least n characters long. There is some stencil code already written for you in a function called averageWordLength(), which calls the function that does the real heavy lifting, averageWordLengthInList(). You need to fill in this latter function and test it. Some (hopefully) helpful guidelines before you begin:
- wordList holds a list of words from your text file and is provided as an argument. Note that in averageWordLength(), we've computed this list by reading a file, cleaning up the text, and then finding the unique words. However, averageWordLengthInList() should not depend on the list being cleaned up this way. It should work on any list of strings.
- minLength is just an integer, and it is also provided as an argument. Every word whose length you include in your average should have a length greater than or equal to minLength.
- sumOfLengths should start out as zero and will hold the running sum of the word lengths as you consider each word in your list. Of course, when considering a single word from the list, you should only increase sumOfLengths if the word is at least minLength characters long.
- Once you've looked at all the words in the list, numWords should equal the number of words that were at least minLength characters long. I can think of two ways to do this: either start at zero and count up, or start at len(wordList) and count down. Either is fine, or you can come up with your own way to keep track if you'd like.
- Once sumOfLengths and numWords are created, you will need to iterate through wordList. Which of the two iteration methods described earlier should you use? Do you need to handle specific indices differently, or is each word in wordList handled the same way?

Fill in testAverageWordLengthInList() with at least three test cases. Use different minimum lengths. What would be an unusual minimum to test? What would be an unusual list to test? Don't worry about testing lists that contain things other than strings, or minimum lengths that are anything other than integers.

Rename your program FirstLast_HW2-6.py and share it with cs0931handinspring2016@gmail.com.
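For reference, the guidelines for averageWordLengthInList() above can be combined into a sketch like the one below. It is one possible approach, not the official solution. Two choices here are assumptions rather than requirements of the assignment: the function returns 0 when no word is long enough, instead of dividing by zero, and the test values are made up.

```python
def averageWordLengthInList(wordList, minLength):
    """One possible sketch: the rounded average length of the words in
    wordList that are at least minLength characters long."""
    sumOfLengths = 0
    numWords = 0
    for word in wordList:          # each word is handled the same way
        if len(word) >= minLength:
            sumOfLengths += len(word)
            numWords += 1
    if numWords == 0:
        # Assumption: report 0 when no word is long enough,
        # rather than dividing by zero.
        return 0
    # float() keeps the division exact under Python 2 as well as Python 3
    return round(sumOfLengths / float(numWords))

def testAverageWordLengthInList():
    # Hypothetical test cases; each line should print True
    print(averageWordLengthInList(["a", "bb", "ccc", "dddd"], 2) == 3)  # (2+3+4)/3
    print(averageWordLengthInList(["whale", "sea", "ship"], 0) == 4)   # (5+3+4)/3
    print(averageWordLengthInList(["whale", "sea"], 10) == 0)          # no word qualifies
    print(averageWordLengthInList([], 1) == 0)                         # empty list

testAverageWordLengthInList()
```

Iterating directly over the words (rather than over indices) fits here because every word is treated the same way. Also note that in Python 3, round() rounds an exact half to the nearest even number, so round(4.5) is 4. Keep this in mind when choosing expected values for your own tests.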
Note: Before you turn in your Python files, make sure they run properly (save your Python file, then select Run > Run Module or press F5 on your keyboard)! If nothing appears in the Shell, don't worry, as long as no red error messages appear. If your files don't run (that is, if red error messages start appearing in the Shell), points will be taken off!