Homework 2-7 : CSCI 0931

For this assignment, you will first figure out your answers in the Python shell. When you feel like you've arrived at the right answers, type them up in a single Python file and share it with the handin email address. For some guidance on how to do this, see the Python homework instructions.

Continuity Notice

Homework 2-8 builds off of your work in 2-7, so make sure you do not delete your work as soon as you are done!

Reminders

If a problem is marked as “(Independent)”, you may only discuss the problem with course staff. Otherwise, you are free to discuss the concepts that will help you solve the problems with classmates as well as course staff. However, you are never allowed to simply copy answers.

Advice

In this homework, you will write a program that produces a concordance for any given text in about one second. You have learned all the necessary pieces, and all you have to do is put them together. This does not mean that this homework is easy; on the contrary, it may be the hardest one you have encountered so far. So before you start, bear the following advice in mind:

When your program is wrong and does not work, you will almost certainly make it more wrong by trying to change random parts of it, hoping to stumble on the correct answer (putting brackets around stuff, changing the indentation, etc). Only make changes when you have good reasons. For example:
- “Oh, I see what I did wrong here: I tried to add a number to a list. I should put brackets around that number so that I append the single-number list onto the end of the bigger list.”
- Or “This statement should be executed every time the iteration runs. Let me indent it so it's inside the loop.”
Try to read the error message. The error message usually comes with a line number to indicate where the error occurs, together with what error it is. For example:
NameError: name 'inventory' is not defined
means that you've used a variable that has not been assigned a value.
KeyError: 'Alabama'
means that you are trying to look up 'Alabama' in a dictionary that does not contain it as a key. Sometimes it is obvious how to fix errors if you understand the error message. If it is not clear, try to run the program on paper (using the model we gave you in class) and you may find you've called an object-specific function on the wrong type of object or asked for the value of a key which is not in the dictionary.
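To see what these two messages look like in practice, here is a small sketch that triggers each error on purpose. The dictionary capitals is a made-up example, and try/except is used only so that both messages can be printed in one run; you are not expected to use it in your homework.

```python
# Hypothetical dictionary used only to illustrate the two error messages.
capitals = {"Rhode Island": "Providence"}

# Looking up a key that is not in the dictionary raises KeyError.
try:
    capitals["Alabama"]
except KeyError as error:
    key_error_message = str(error)
    print("KeyError:", key_error_message)

# Using a name that was never assigned a value raises NameError.
try:
    inventory
except NameError as error:
    name_error_message = str(error)
    print("NameError:", name_error_message)
```

Without the try/except, the first error would stop the program and the Shell would show the message in red along with a line number.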

Task 1

Even though Python was meant to be a lightweight language, it has grown significantly over the years. It is practically impossible to learn every bit of it in lectures. On the other hand, Python developers have written comprehensive documentation of the language and put it online. It's time for us to learn to use this resource. The more you program, the more you will need it.

Go to the Python documentation page for dictionaries. It describes the usage of dictionaries, as well as of operators and functions that are defined on them. Skim the documentation and see if you can relate it to what we've learned in class (it's okay if you do not understand most of it). Then, find the descriptions of these three functions:

Fill in the missing code for the function printMovieRevenues(). Refer to the provided function printMovieNames(), which prints the keys of the dictionary but not the values. Think about how you could modify this function to print only the values.
Fill in the missing code for the function hugeMovies(). Refer to the provided function startsWithTHE(), which looks through the keys in the dictionary and adds the key-value pair to a new dictionary if it passes a test. Think about what test you would use to find a high-revenue movie.
Fill in the missing code for the function biggestMovie(). Like the previous function, you probably want to iterate over all keys in the dictionary and check their paired values. Hint: Can you store the key of the biggest movie so far in a variable as you iterate through the dictionary? What test will you need to do in order to update that variable?

Task 2

Consider the task of organizing a phonebook. My phonebook is in the following format:

That is, a list of lists. Each enclosed list contains two elements, the first being a name and the second a telephone number. Note that I may have several phone numbers for the same name. I want to organize my phonebook so that it is a dictionary instead — one that maps names to lists of phone numbers. For this example, the organized phonebook should look like this:

Fill in the body for the function organizeLists(). It takes a list of this format and returns a dictionary. Although you aren't required to use this function in the rest of this assignment, you will do some very similar tasks that will borrow concepts from this function. Make sure your function works on at least the example we've provided above.

Hint: You may need to check if a dictionary contains a certain key. We talked about doing this using the expression key in the_dict, where key and the_dict are variables representing some object that might be a key in the dictionary and the dictionary itself, respectively. Evaluating this expression will give True if key is a key in dictionary the_dict, or False otherwise.
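As a quick illustration of this expression, here is a small sketch you can try in the Shell. The dictionary contents are made up and have nothing to do with the phonebook.

```python
# Hypothetical dictionary, used only to show how `in` behaves.
the_dict = {"apple": 3, "pear": 5}

key = "apple"
found_apple = key in the_dict   # "apple" is a key
print(found_apple)

key = "plum"
found_plum = key in the_dict    # "plum" is not a key
print(found_plum)

# `in` looks at the keys only, never at the values.
found_three = 3 in the_dict
print(found_three)
```

This prints True, False, False. Because the expression evaluates to True or False, it can be used directly as the test in an if statement.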

Interlude: Building a Concordance

Before going on to the next task, let's talk about the functions we might want for creating a concordance. Note how we divide up a big task into small ones and solve them separately. You will have to do this for your project!

First, we define the goal. A concordance is a listing of each unique word in a book, followed by a short snippet of text surrounding each occurrence of this word in the book. Associating pairs of things (a word and its context) is what a dictionary does, so we should aim to construct a dictionary. The keys of such a dictionary would be strings (words), and the values would be lists of strings (one string for the text surrounding each occurrence of the word).

To save space and to make it simpler, each value will instead be a list of integers, representing the positions of the word in the overall text. Our goal is to construct this dictionary, and then to interpret it to produce a traditional concordance.
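To make the shape of this dictionary concrete, here is a hand-written sketch for a very short made-up text. It assumes that a "position" means the index in the string where the word starts; the starter code may count positions differently, so check what it prints. The dictionary below is typed in by hand, not computed.

```python
# Hypothetical example text; positions are assumed to be character indices.
sample_text = "the cat saw the dog"

# Hand-written concordance dictionary for sample_text (not computed).
sample_concordance = {
    "the": [0, 12],
    "cat": [4],
    "saw": [8],
    "dog": [16],
}

# Check that each recorded position really is where the word starts.
for word in sample_concordance:
    for position in sample_concordance[word]:
        matches = sample_text[position:position + len(word)] == word
        print(word, position, matches)
```

Every line printed should end in True. Note that "the" maps to a list with two positions because it occurs twice in the text.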

Now we can think about how to do each of these steps. Each step should have a function that performs it: one to build the concordance dictionary, and another to print the formatted concordance. Let's think about the input and output for each of these functions.

buildConcordance() is the function that builds the concordance.
- Its input is a single argument: a text (represented as a long string).
- Its output is a dictionary that maps words (strings) to a list of positions in the text (integers).
printConcordance() is the function that queries the concordance dictionary to display a traditional text concordance.
- Its input is three arguments: a word (string), a concordance (dictionary), and a text (another string).
- It has no output, but instead prints all occurrences of the word in the text, with surrounding context, using the dictionary.

Take a moment to reflect on these descriptions and think about what the header comment for each function might look like, including the input/output description using the int * string -> int kind of notation we saw in class.
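As a reminder of what such a header comment can look like, here is a sketch for an unrelated, made-up function called countLetter(). The exact layout of the comment is an assumption; follow the format we used in class if it differs.

```python
# countLetter: string * string -> int
# Input: a letter (string) and a word (string).
# Output: the number of times the letter appears in the word (int).
def countLetter(letter, word):
    count = 0
    for character in word:
        if character == letter:
            count = count + 1
    return count

print(countLetter("s", "Mississippi"))
```

This prints 4. The first line of the comment names the function, lists the input types separated by *, and gives the output type after the arrow.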

Task 3

Let's tackle the function buildConcordance() first. In the starter code for this homework, the function buildConcordance() just iterates through some text, and prints out matches with positions. It uses a special set of tools called "regular expressions" to simplify the task of cleaning up a string, breaking it into words, and keeping track of the positions of those words, but in essence it's doing the same thing as your final version of the vocabulary() function from HW2-6. You will learn about this alternative, more efficient/convenient way to parse text later in the course, but for now, test it and see what it is doing. Look over the code to get an idea of what it does, and run it.

In each iteration, you get a word and its position. You want to organize them into a dictionary that maps a word to a list of positions. Hmmm... does that remind you of the warmup in Task 2?

Modify the header comment to include the input/output description you thought of earlier. (Every function you write from now on should have one of these!)
Modify the function buildConcordance() so that it returns a dictionary that maps each word to a list of positions. The starter code has a variable test_text defined that contains some of Moby Dick. You can use this to test your function.
Fill in testBuildConcordance() with at least three test cases to make sure your function works properly, even on tricky cases.

Handin

Rename your program FirstLast_HW2-7.py and share it with cs0931handinfall2015@gmail.com.

Note: Before you turn in your Python files, make sure they run properly (save your Python file, then select Run > Run Module or hit F5 on your keyboard)! If nothing appears in the Shell, don't worry, as long as no red error messages appear. If they don't run, i.e. if red stuff starts appearing in the shell, points will be taken off!