Homework 2-6

Due October 27, 2015, 9:00 am

For this assignment, you will first figure out your answers in the Python shell. When you feel like you've arrived at the right answers, type them up in a single Python file and share it with the handin email address. For some guidance on how to do this, see the Python homework instructions.

Continuity Notice

For this homework, use the file you turned in for HW2-5. At the end of the homework, rename the file/save another copy and turn it in. You will not be graded on the work in HW2-5 twice.

Reminders

If a problem is marked as “(Independent)”, you may only discuss the problem with course staff. Otherwise, you are free to discuss the concepts that will help you solve the problems with classmates as well as course staff. However, you are never allowed to simply copy answers.
When you write your functions, remember to comment your code to say what each function does and what its arguments mean. Remember that comments start with a #.

Task 1

As you inspect different portions of the vocabulary list, you may see that many improvements can still be made! There are numbers, punctuation, and mixed cases (whale and Whale should really be the same word). Now, let's fix that.

Essentially, we want to clean up the big string we get out of the file before we split it into words. Two possible things to do are changing all letters to lowercase, and replacing all numbers and punctuation with whitespace (so that eat,pray,love can be split as if it were eat pray love).

  1. Now write the function removePunctuation() that takes a string and returns another string in which each punctuation mark has been replaced with whitespace. Keep in mind that you might need to use escape characters for certain characters.
  2. Write the function testRemovePunctuation() to make sure it works correctly. Write at least three test cases, and remember that the idea is to come up with tricky cases that your function might get fooled by. Make sure that you put punctuation in tricky places: the beginning, middle, or end. Make sure you test cases where there are multiple punctuation marks in a row, or where there are no punctuation marks at all.
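As a rough illustration of the replace-with-whitespace idea and of escape characters, here is a minimal sketch. The names replaceChars and SAMPLE_PUNCTUATION are hypothetical and are not part of the assignment's stencil, and the character list is deliberately short; your own removePunctuation() will likely need to cover more characters.

```python
# Hypothetical helper for illustration only; not the required removePunctuation().
def replaceChars(text, charsToReplace):
    # text: the string to clean
    # charsToReplace: a string holding every character that should become a space
    for ch in charsToReplace:
        text = text.replace(ch, " ")
    return text

# A deliberately short sample list. The double quote and the backslash
# must be written with escape characters (\" and \\) inside the literal.
SAMPLE_PUNCTUATION = ",.!?;:\"'\\"

print(replaceChars("eat,pray,love", SAMPLE_PUNCTUATION))  # eat pray love
```

Note that two punctuation marks in a row turn into two spaces. That is fine here, because split() with no arguments treats any run of whitespace as a single separator.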

Task 2

  1. Now, we need to insert this cleanup step into our assembly line. Our assembly line is in the function vocabulary() (read → split → remove duplicates). Use the function cleanUp() to clean up your big string before you split it.
  2. Run your program and inspect part of your vocab list. You may discover that you missed a lot of possible punctuation. Go back and improve your function as needed.
  3. Hooray! Now the vocab list looks pretty good.
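To picture where the cleanup step sits in the assembly line, here is a small sketch that works on a string instead of a file. Everything in it is an assumption made for illustration: cleanUpSketch() and vocabularyFromString() are hypothetical names, and your own vocabulary() from HW2-5 may read, split, and remove duplicates differently.

```python
# Hypothetical stand-in for the cleanup step; handles only a few characters.
def cleanUpSketch(text):
    # text: the big string to clean before splitting
    text = text.lower()
    for ch in ",.!?;:0123456789":
        text = text.replace(ch, " ")
    return text

# Hypothetical assembly line: clean up -> split -> remove duplicates.
def vocabularyFromString(text):
    # text: the raw string (in the homework this would come from a file)
    words = cleanUpSketch(text).split()
    uniqueWords = []
    for word in words:
        if word not in uniqueWords:
            uniqueWords.append(word)
    return uniqueWords

print(vocabularyFromString("Whale, whale! 42 whales."))  # ['whale', 'whales']
```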

Task 3 (Extra Credit)

Now, you will create a function that computes the average length of the words in a given string that are at least a given length (for simplicity's sake, return the round() of the actual average, which will most likely be a decimal). For example, this function should be able to find the average length of a word in Moby Dick among the words that are at least n characters long. There is some stencil code already written for you in a function called averageWordLength(), which calls the function that does the real heavy lifting, averageWordLengthInList(). You need to fill in this latter function and test it. Some (hopefully) helpful guidelines before you begin:

  1. You will need at least four variables:
    • wordList holds a list of words from your text file, and is provided as an argument. Note that in averageWordLength(), we've computed this list by reading a file, cleaning up the text, and then finding the unique words. However, the functionality of averageWordLengthInList() should not depend on the list being cleaned up this way; it should work on any list of strings.
    • minLength is just an integer, and it is also provided as an argument. Every word whose length you include in your average should have a length greater than or equal to minLength.
    • sumOfLengths should start out as zero and will hold the current sum of the word lengths as you consider all the words in your list. Of course, when considering a single word out of the list, you should only increase sumOfLengths if the word is at least minLength characters long.
    • numWords should be equal, once you've looked at all the words in the list, to the number of words that were at least as long as minLength. I can think of two ways to do this: either by starting at zero and counting up, or by starting at len(wordList) and counting down. Either is fine, or you can come up with your own way to keep track if you'd like.
  2. After the variables sumOfLengths and numWords are created, you will need to iterate through wordList. Which method of iterating should you use of the two methods described earlier? Do you need to handle specific indices differently, or is each word in wordList handled the same way?
  3. Be sure that your function is returning an integer, which may require rounding the actual decimal average length.
  4. Make sure to fill in testAverageWordLengthInList() with at least three test cases. Use different minimum lengths. What would be an unusual minimum to test? What would be an unusual list to test? Don't worry about testing lists that contain things other than strings, or minimum lengths that are anything other than integers.
  5. Tips for testing. Remember that the average word length returned by the function should always be greater than or equal to the minimum length required, unless all of the words in the list are shorter than the minimum required length. In that case, return an average length of 0.
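The guidelines above can be sketched roughly as follows. This is only one possible shape, written under a different name (averageWordLengthInListSketch is hypothetical) so that it does not collide with the stencil; it uses the count-up approach for numWords and wraps round() in int() so the result is an integer regardless of Python version.

```python
# Hypothetical sketch of the accumulator pattern described in the guidelines.
def averageWordLengthInListSketch(wordList, minLength):
    # wordList: a list of strings
    # minLength: an integer; only words at least this long are counted
    sumOfLengths = 0
    numWords = 0
    for word in wordList:
        if len(word) >= minLength:
            sumOfLengths = sumOfLengths + len(word)
            numWords = numWords + 1
    # No word was long enough, so there is nothing to average.
    if numWords == 0:
        return 0
    # float() forces decimal division; int(round(...)) gives back an integer.
    return int(round(sumOfLengths / float(numWords)))

print(averageWordLengthInListSketch(["whale", "sea", "a"], 3))  # 4
print(averageWordLengthInListSketch(["hi"], 5))                 # 0
```

In the first call only "whale" and "sea" count, so the average is (5 + 3) / 2 = 4. In the second call no word is long enough, so the function returns 0.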

Handin

Rename your program FirstLast_HW2-6.py and share it with cs0931handinfall2015@gmail.com .

Note: Before you turn in your Python files, make sure they run properly (save your Python file, then select Run > Run Module or hit F5 on your keyboard)! If nothing appears in the Shell, don't worry, as long as no red error messages appear. If they don't run, i.e. if red stuff starts appearing in the shell, points will be taken off!