Homework 2-4

Due March 21, 2013, 2:25 pm

Reminders

For the following problems you may discuss the concepts that will help solve these problems with classmates and course staff. You may not simply copy down the answers of your classmates as that is a violation of the collaboration policy. The one exception to this rule are those problems marked as “(Independent)”. You may discuss independent problems with course staff only.

Advice

In this homework, you will write a program that produces a concordance for any given texts in one second. You have learned all the necessary pieces and all you have to do is to put them together. This does not mean that this homework is easy; on the contrary, it may be the hardest one you have encountered so far. So before you start, bear the following advice in mind:

Task 1

Even though Python was meant to be a light-weight language, it has grown significantly over the years. It is practically impossible to learn every bit of it in lectures. On the other hand, Python developers have made comprehensive documentations of the language and put them online. It's time for us to learn to use this resource. The more you program, the more you will need it.

Go to the Python documentation page for dictionaries. It describes the usage of dictionaries, as well as of operators and functions that are defined on them. Skim the documentation and see if you can relate it to what we've learned in class (it's ok if you do not understand most of it).

Using the search function of your web browser or just your eyes, find the descriptions of these three functions:

clear()
keys()
pop(key)
Study them and complete HW2-4.py according to the instructions given as comments. You do not necessarily have to use the functions listed above, but they'll probably be helpful.

Task 2

Consider the task of organizing a phonebook. My phonebook is in the following format:

[['Peter', '345-8766'],
 ['Lois', '459-2346'],
 ['Stewie', '345-2354'],
 ['Peter', '854-1198']]

That is, a list of lists. Each enclosed list contains two elements, the first being a name and the second a telephone number. Note that I may have several phone numbers for the same name. I want to organize my phonebook so that it is a dictionary instead — one that maps names to lists of phone numbers. For this example, the organized phonebook should look like this:

{'Peter': ['345-8766', '854-1198'],
 'Stewie': ['345-2354'],
 'Lois': ['459-2346']}

Write a function that does this job (takes a list of this format and returns a dictionary). You do not have to hand it in, but keep it as a reference as you build your concordance. Although you won't use this function in the rest of this assignment, you will do some very similar tasks that will borrow concepts from this function. So it's important to spend the time to get this function correct and to understand how it works. Test it on contrived examples (make up some phonebooks).

Important: Review the first part of the homework or the slides from class for how to get the value associated with a given key, how to add a key and its associated value to a dictionary, and how to change the value associated with a key. You will also need to check if a dictionary contains a certain key by using this expression:

key in the_dict
which evaluates to True if key is a key in dictionary the_dict, or False otherwise. Try it with your own examples first.

Interlude: Building a Concordance

Before going on to the next task, let's talk about the functions we might want for creating a concordance. Note how we divide up a big task into small ones and solve them separately. You will have to do this for your project!

First we define the goal. A concordance is a listing of each unique word in a book, followed by a short snippet of text surrounding each occurance of this word in the book. Associating pairs of things (a word and its context) is what a dictionary does, so we should aim to construct a dictionary. The keys of such a dictionary would be strings (words), and the values would be list of strings (one string for the text surrounding each occurance of the word). To save space and to make it simpler, each value could instead be a list of integers, representing the positions of the word in the overall text. Our goal is to construct this dictionary, and then to interpret it to produce a traditional concordance.

Now we can think about how to do each of these steps. Each step should have a function that performs it: one to build the concordance dictionary, and another to print the formatted concordance. Let's think about the input and output for each of these functions.

  1. buildConcordance() is the function that builds the concordance.
    • Its input is a single argument: a text (represented as a long string).
    • Its output is a dictionary that maps words (strings) to a list of positions in the text (integers).
  2. printConcordance() is the function that queries the concordance dictionary to display a traditional text concordance.
    • Its input is three arguments: a word (string), a concordance (dictionary), and a text (another string).
    • It has no output, but instead prints all occurrences of the word in the text, with surrounding context, using the dictionary.

Task 3

Let's tackle the function buildConcordance() first. In the starter code for this homework, the function buildConcordance() just iterates through some text, and prints out matches with positions. It uses a special set of tools called "regular expressions" to simplify the task of cleaning up a string, breaking it into words, and keeping track of the positions of those words, but in essence it's doing the same thing as your final version of the vocabulary() function from HW2-3. Look over the code to get an idea of what it does, and run it.

In each iteration, you get a word and its postion. You want to organize them into a dictionary that maps a word to a list of positions. Hmmm... does that remind you of the warmup question? Modify the function buildConcordance() so that it returns a dictionary that maps each word to a list of positions.

Fill in testBuildConcordance() to make sure your function works properly, even on tricky cases.

Task 4

printConcordance() is relatively simple. Given a word, a concordance dictionary returned by buildConcordance(), and the corresponding text, you want to print all occurrences of that word in the text with context. Here are the steps:

  1. Query the concordance (dictionary) for all the positions at which the word occurs.
  2. For each position:
    1. Form a string that spans the position. (e.g., if your word starts at position 584, then text[564:604] ought to give you reasonable context surrounding that occurrence of the word. How many characters you want to print before and after the word is your choice.)
    2. Print that string.
  3. Return nothing.

Task 5

Fill in the test function testPrintConcordance(). Provide at least five test cases, and try tricky cases. Since printConcordance() doesn't actually return anything, there's no way to programmatically test whether each call does the right thing or not. Instead, just make each call and, after each one, print what you would expect to see printed. Remember to try tricky cases; that's the point of a test function.

In particular, try out this call in the interactive interpreter:

printConcordance('ishmael', conc, test_text)

Chances are that this function call will do something unexpected. What is going on? In a comment in the test function, explain what went wrong. Then fix the problem that you find by modifying your code in printConcordance(). Hint: you have to do some extra work when calculating the start and end indices of your strings. Make sure that no index is smaller than 0 or larger than the size of your text.

Task 6

Extra Credit: Run your program again, but this time, read the Moby Dick text file and print the concordance of the entire text. In the interactive environment, try to invoke the function printConcordance() with proper arguments. Remember that all the variables you defined in your program outside of any function definitions are still “alive” in the interactive environment that pops up after you run the program.

You may notice that the text printed by your concordance does not look quite good, mainly because of the line breaks. Let's do some cosmetic work. In the last task, you formed some strings and printed them out. Before you print them, replace all the line breaks (\n) with white spaces. You can choose to write a for-loop, or call a function that you yourself defined, or call str.replace().

Task 7

Play with your program! You should by now feel really proud of yourself. This is a pretty complicated program that you've managed to get working, and it's actually kind of useful!

Handin

Rename your file to YourName_HW2-4.py and email it to cs0931handin@gmail.com.