Homework 2-7

Due October 29, 2015, 9:00 am

For this assignment, you will first figure out your answers in the Python shell. When you feel like you've arrived at the right answers, type them up in a single Python file and share it with the handin email address. For some guidance on how to do this, see the Python homework instructions.

Continuity Notice

Homework 2-8 builds on your work in 2-7, so make sure you do not delete your work once you are done!

Reminders

If a problem is marked as “(Independent)”, you may only discuss the problem with course staff. Otherwise, you are free to discuss the concepts that will help you solve the problems with classmates as well as course staff. However, you are never allowed to simply copy answers.

Advice

In this homework, you will write a program that produces a concordance for any given text in one second. You have already learned all the necessary pieces; all you have to do is put them together. This does not mean that this homework is easy; on the contrary, it may be the hardest one you have encountered so far. So before you start, bear the following advice in mind:

Task 1

Even though Python was meant to be a lightweight language, it has grown significantly over the years, and it is practically impossible to learn every bit of it in lectures. Fortunately, Python developers have written comprehensive documentation of the language and put it online. It's time for us to learn to use this resource. The more you program, the more you will need it.

Go to the Python documentation page for dictionaries. It describes how to use dictionaries, as well as the operators and functions that are defined on them. Skim the documentation and see if you can relate it to what we've learned in class (it's okay if you do not understand most of it). Then, find the descriptions of these three functions:

clear()
keys()
pop(key)
Study them and complete HW2-7.py according to the instructions given as comments. You do not necessarily have to use the functions listed above, but they'll probably be helpful. If needed, review the first part of the homework or the slides from class for how to get the value associated with a given key, how to add a key and its associated value to a dictionary, and how to change the value associated with a key.
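To see what the three functions do before using them in HW2-7.py, here is a small shell-style experiment. The dictionary `ages` and its contents are made up purely for illustration; the calls themselves are standard dictionary methods. The comments describe Python 3.7 or later, where dictionaries keep insertion order (in Python 2, `keys()` returns a plain list and the order may differ).

```python
# A made-up dictionary used only for this illustration.
ages = {'Peter': 43, 'Lois': 42, 'Stewie': 1}

# keys() gives all the keys; wrapping it in list() produces an ordinary list.
names = list(ages.keys())
print(names)          # ['Peter', 'Lois', 'Stewie']

# pop(key) removes that key from the dictionary and returns its value.
stewie_age = ages.pop('Stewie')
print(stewie_age)     # 1
print(ages)           # {'Peter': 43, 'Lois': 42}

# clear() removes every key-value pair, leaving an empty dictionary.
ages.clear()
print(ages)           # {}
```

Note that `pop` and `clear` change the dictionary in place, so use them on a copy if you still need the original data.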

  1. Fill in the missing code for the function printMovieRevenues(). Refer to the provided function printMovieNames(), which prints the keys of the dictionary but not the values. Think about how you could modify this function to print only the values.
  2. Fill in the missing code for the function hugeMovies(). Refer to the provided function startsWithTHE(), which looks through the keys in the dictionary and adds the key-value pair to a new dictionary if it passes a test. Think about what test you would use to find a high-revenue movie.
  3. Fill in the missing code for the function biggestMovie(). Like the previous function, you probably want to iterate over all keys in the dictionary and check their paired values. Hint: Can you store the key of the biggest movie so far in a variable as you iterate through the dictionary? What test will you need to do in order to update that variable?
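The last two items rely on two common loop patterns: building a new dictionary from the pairs that pass a test, and remembering the "best key so far" while iterating. The sketch below shows both patterns on made-up data. The names `filterAbove`, `keyOfLargest`, and `revenues`, the revenue numbers, and the threshold are all hypothetical; they are not the starter code's functions, and the starter file's dictionary and cutoff may differ.

```python
# Hypothetical data: movie titles mapped to (made-up) revenues.
revenues = {'Movie A': 120, 'Movie B': 950, 'Movie C': 610}

def filterAbove(movies, threshold):
    # Pattern 1: copy each key-value pair that passes a test into a new dict.
    result = {}
    for title in movies.keys():
        if movies[title] > threshold:
            result[title] = movies[title]
    return result

def keyOfLargest(movies):
    # Pattern 2: remember the key of the largest value seen so far.
    best = None                      # no movie seen yet
    for title in movies.keys():
        # Update best on the first movie, or whenever this one beats it.
        if best is None or movies[title] > movies[best]:
            best = title
    return best                      # None if the dictionary was empty

print(filterAbove(revenues, 500))    # {'Movie B': 950, 'Movie C': 610}
print(keyOfLargest(revenues))        # Movie B
```

Storing the key (rather than the value) in `best` is what lets you report the movie's name at the end, since you can always look the value back up with `movies[best]`.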

Task 2

Consider the task of organizing a phonebook. My phonebook is in the following format:

[['Peter', '345-8766'],
 ['Lois', '459-2346'],
 ['Stewie', '345-2354'],
 ['Peter', '854-1198']]

That is, a list of lists. Each enclosed list contains two elements, the first being a name and the second a telephone number. Note that I may have several phone numbers for the same name. I want to organize my phonebook so that it is a dictionary instead — one that maps names to lists of phone numbers. For this example, the organized phonebook should look like this:

{'Peter': ['345-8766', '854-1198'],
 'Stewie': ['345-2354'],
 'Lois': ['459-2346']}
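One way to do this reorganization is to walk through the list once and either start a new list for a name or append to the list that name already has. This is a minimal sketch; the function name `organizePhonebook` is hypothetical.

```python
def organizePhonebook(entries):
    # Maps each name to a list of all its phone numbers, in original order.
    book = {}
    for name, number in entries:         # each entry is [name, number]
        if name in book:
            # Name seen before: add this number to its existing list.
            book[name].append(number)
        else:
            # First time we see this name: start a new list.
            book[name] = [number]
    return book

phonebook = [['Peter', '345-8766'],
             ['Lois', '459-2346'],
             ['Stewie', '345-2354'],
             ['Peter', '854-1198']]
print(organizePhonebook(phonebook))
```

The keys may print in a different order than in the listing above; that is fine, because two dictionaries with the same key-value pairs compare equal regardless of order.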

Interlude: Building a Concordance

Before going on to the next task, let's talk about the functions we might want for creating a concordance. Note how we divide up a big task into small ones and solve them separately. You will have to do this for your project!

First, we define the goal. A concordance is a listing of each unique word in a book, followed by a short snippet of text surrounding each occurrence of that word in the book. Associating pairs of things (a word and its context) is what a dictionary does, so we should aim to construct a dictionary. The keys of such a dictionary would be strings (words), and the values would be lists of strings (one string for the text surrounding each occurrence of the word).

To save space and to make it simpler, each value will instead be a list of integers, representing the positions of the word in the overall text. Our goal is to construct this dictionary, and then to interpret it to produce a traditional concordance.

Now we can think about how to do each of these steps. Each step should have a function that performs it: one to build the concordance dictionary, and another to print the formatted concordance. Let's think about the input and output for each of these functions.

  1. buildConcordance() is the function that builds the concordance.
    • Its input is a single argument: a text (represented as a long string).
    • Its output is a dictionary that maps words (strings) to a list of positions in the text (integers).
  2. printConcordance() is the function that queries the concordance dictionary to display a traditional text concordance.
    • Its input is three arguments: a word (string), a concordance (dictionary), and a text (another string).
    • It has no output, but instead prints all occurrences of the word in the text, with surrounding context, using the dictionary.
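To make the division of labor concrete, here is one possible sketch of the printing side. It assumes (this is not stated above) that each position is the character offset where the word starts in the text, and it uses a hypothetical constant `CONTEXT` for how many characters to show on each side. The helper `concordanceLines` is also hypothetical; it returns the lines instead of printing them, which makes the logic easier to test.

```python
CONTEXT = 20  # hypothetical: characters of context shown on each side

def concordanceLines(word, concordance, text):
    # Builds one "...context..." string per recorded position of word.
    # Assumes positions are character offsets into text.
    lines = []
    for pos in concordance.get(word, []):   # no entry -> nothing to show
        start = max(0, pos - CONTEXT)       # don't slice before the start
        end = pos + len(word) + CONTEXT     # slicing past the end is safe
        snippet = text[start:end].replace('\n', ' ')  # keep it on one line
        lines.append('...' + snippet + '...')
    return lines

def printConcordance(word, concordance, text):
    # Prints every occurrence of word with its surrounding context.
    for line in concordanceLines(word, concordance, text):
        print(line)

sample = 'Call me Ishmael. Some years ago, never mind how long'
printConcordance('me', {'me': [5]}, sample)
# prints: ...Call me Ishmael. Some years...
```

Because the dictionary only stores integers, the text itself has to be passed in again so the context can be sliced out of it; that is why `printConcordance()` needs three arguments.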

Take a moment to reflect on these descriptions and think about what the header comment for each function might look like, including the input/output description using the int * string -> int kind of notation we saw in class.
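As a reminder of the notation, here is a header comment for an unrelated, hypothetical function `countWord` that takes two strings and returns an integer. The exact comment layout used in class may differ; this only illustrates the `string * string -> int` style.

```python
# countWord: string * string -> int
# Input: a word and a text.
# Output: the number of times the word appears in the text as a whole,
#         whitespace-separated token (case-sensitive).
def countWord(word, text):
    count = 0
    for token in text.split():    # split() breaks the text on any whitespace
        if token == word:
            count = count + 1
    return count

print(countWord('the', 'the cat saw the dog'))   # 2
```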

Task 3

Let's tackle the function buildConcordance() first. In the starter code for this homework, buildConcordance() just iterates through some text and prints out matches with their positions. It uses a special set of tools called "regular expressions" to simplify cleaning up a string, breaking it into words, and keeping track of the positions of those words; in essence, it does the same thing as your final version of the vocabulary() function from HW2-6. You will learn about this more efficient and convenient way to parse text later in the course. For now, look over the code to get an idea of what it does, then run it and see what it prints.
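If you want to experiment with the same idea in the shell, the standard `re.finditer` function yields one match object per word, and each match knows both the matched text and where it starts. The pattern `r"\w+"` and the lowercasing below are assumptions for illustration; the starter code may use a different regular expression.

```python
import re

text = 'Call me Ishmael. Some years ago, never mind how long'

# r"\w+" matches runs of letters, digits, and underscores (an assumed pattern).
# Lowercasing ASCII text does not change its length, so positions still line
# up with the original text.
pairs = []
for match in re.finditer(r"\w+", text.lower()):
    # match.start() is the index of the word's first character;
    # match.group() is the matched word itself.
    pairs.append((match.start(), match.group()))

print(pairs[:4])   # [(0, 'call'), (5, 'me'), (8, 'ishmael'), (17, 'some')]
```

Notice that the punctuation ("." and ",") is skipped automatically, which is the kind of cleanup you did by hand in vocabulary().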

In each iteration, you get a word and its position. You want to organize them into a dictionary that maps a word to a list of positions. Hmmm... does that remind you of the warmup in Task 2?

  1. Modify the header comment to include the input/output description you thought of earlier. (Every function you write from now on should have one of these!)
  2. Modify the function buildConcordance() so that it returns a dictionary that maps each word to a list of positions. The starter code has a variable test_text defined that contains some of Moby Dick. You can use this to test your function.
  3. Fill in testBuildConcordance() with at least three test cases to make sure your function works properly, even on tricky cases.
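Putting these steps together, a minimal sketch might look like the following. It assumes the same `r"\w+"` pattern and lowercasing as a regular-expression experiment would, and that positions are character offsets; the starter code's choices may differ, so treat the exact expected positions in the tests as tied to these assumptions. The grouping step is the same "append or start a new list" pattern as the phonebook warmup.

```python
import re

def buildConcordance(text):
    # string -> dict
    # Maps each lowercased word in text to a list of the character
    # positions where it starts, in order of appearance.
    concordance = {}
    for match in re.finditer(r"\w+", text.lower()):
        word = match.group()
        if word in concordance:
            concordance[word].append(match.start())   # seen before
        else:
            concordance[word] = [match.start()]       # first occurrence
    return concordance

def testBuildConcordance():
    # Empty text: there are no words at all.
    assert buildConcordance('') == {}
    # Same word with different capitalization and trailing punctuation.
    assert buildConcordance('The whale. the WHALE!') == {'the': [0, 11], 'whale': [4, 15]}
    # Extra spaces and blank lines must not create empty "words".
    assert buildConcordance('  sea\n\nsea ') == {'sea': [2, 7]}
    print('All buildConcordance tests passed.')

testBuildConcordance()
```

Each test targets a case that is easy to get wrong: an empty input, words that differ only in case or punctuation, and irregular whitespace.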

Handin

Rename your program FirstLast_HW2-7.py and share it with cs0931handinfall2015@gmail.com.

Note: Before you turn in your Python files, make sure they run properly (save your Python file, then select Run > Run Module or press F5 on your keyboard)! If nothing appears in the Shell, don't worry, as long as no red error messages appear. If your file doesn't run, i.e., if red error messages appear in the Shell, points will be taken off!