Project 2

Various Deadlines


This is an independent assignment. You may discuss this assignment with course staff only.

Project Description

(Independent) Pose a computational question based on textual data. You may use your own data source, or choose from the data sources we discussed in class:

  1. Project Gutenberg:
  2. Dictionary:
  3. American Presidency Project Debates:

For your project you must present a testable hypothesis, carry out the required analyses, report your findings in a clear and understandable way, and discuss your results.

To present your results, you may report descriptive summary statistics (such as count, mean, median, standard deviation, etc.). You will also be able to import your results into Excel to analyze basic trends (via color formatting and plotting). However, you will not be graded on any Excel work beyond presenting your results in a clear manner.
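As a rough illustration of the summary statistics mentioned above, the sketch below computes the count, mean, median, and standard deviation of a list of numbers and formats them as CSV text that Excel can open. It assumes Python 3 (for the `statistics` module). The function names and the sample word lengths are made up for this example and are not part of the assignment.

```python
import csv
import io
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers.

    Input: values, a list of numbers with at least two entries.
    Output: a dict with keys 'count', 'mean', 'median', and 'stdev'.
    """
    return {
        'count': len(values),
        'mean': statistics.mean(values),
        'median': statistics.median(values),
        'stdev': statistics.stdev(values),  # sample standard deviation
    }


def write_summary(summary, out):
    """Write one statistic per row, in CSV format, to an open file object."""
    writer = csv.writer(out)
    writer.writerow(['statistic', 'value'])
    for name in ['count', 'mean', 'median', 'stdev']:
        writer.writerow([name, summary[name]])


# Hypothetical data: word lengths from a short passage.
word_lengths = [3, 5, 4, 7, 6]
summary = summarize(word_lengths)

# Write to an in-memory buffer here so the example has no side effects.
buffer = io.StringIO()
write_summary(summary, buffer)
print(buffer.getvalue())
```

To save the table for Excel instead of printing it, pass a file opened with `open('summary.csv', 'w', newline='')` as the second argument (the file name is only an example).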

Project Proposal (Due Thursday, April 4 at 2:25pm)

Project Description (YourName_Proj2_Proposal.txt)

Write a concise (one-page) description of the project you would like to execute. This description should include the following parts:

  1. Background: put your project idea in context.
  2. Claim: the specific hypothesis you plan to test (which is a statement, not a question).
  3. Data: a one-sentence description of your data source.
  4. Programming Elements: a few sentences describing the problems you will need to write Python functions for.
  5. Potential Roadblocks: a list of potential obstacles.
  6. Project Modifications: address the following two scenarios:
    • Backup Plan: suppose your project is much harder than you anticipate. What parts of the project would you change to still get somewhat interesting results?
    • Increasing Degree of Difficulty: suppose your project is much easier than you anticipate. What ways would you extend the project?

Skeleton Code (

Write a Python file that contains an outline of the code you anticipate writing. This file should run without errors, even though the functions are not yet filled in! It should include the following:

  1. Comments at the top of the file describing what the program does.
  2. Functions that you will write (of course, you might change this later).
  3. Function descriptions (in triple quotes) that describe (1) what the function does, (2) what the inputs are, and (3) what the outputs are.
  4. Some lines of code and comments that help describe what the functions will contain.

Don't get too wrapped up in the details here — the goal of the skeleton code is to provide you with an outline of what you have to program.
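For a sense of scale, a skeleton along the lines described above might look like the sketch below. The project idea (counting word frequencies), the file name, and every function name are hypothetical; your own outline will depend on your hypothesis. Each function returns a placeholder value so that the file runs before any real code is written.

```python
# word_trends.py (hypothetical file name)
# Skeleton for a program that counts how often each word appears in a
# text file and reports the most common words.


def read_words(filename):
    """Read a text file and split it into lowercase words.

    Input: filename, a string naming a plain-text file.
    Output: a list of strings, one per word.
    """
    # Open the file, read its contents, lowercase them, and split on
    # whitespace. Punctuation handling will be added later.
    return []


def count_words(words):
    """Count how many times each word occurs.

    Input: words, a list of strings.
    Output: a dictionary mapping each word to its count.
    """
    counts = {}
    # Loop over words and add one to counts[word] for each occurrence.
    return counts


def main():
    """Run the full analysis and print the results."""
    words = read_words('sample.txt')  # placeholder file name
    counts = count_words(words)
    # Sort counts and print the ten most common words.
    print(counts)


if __name__ == '__main__':
    main()
```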


Hand in both files to

Project (Due Thursday, April 11 at 2:25pm)

Carry out the project you proposed. It's OK if the project changes — that's why it was a proposal.

Python Program

After filling in your skeleton code, you are almost done. However, to make this code usable for others, you must do a few more things.

  1. Provide instructions on how to run your program (in comments).
  2. Provide at least one test function and/or test file that verifies that your code does what it should do. Include instructions in the comments.
    • If you have a tricky function that has a regular expression, write a test file and show that the function returns the proper result.
    • If you have a function that counts occurrences of words, write a test file and show that the function returns the proper result.
  3. Handle data and input errors and notify the user.
    • One way to notify the user is to print a string (such as "Error! Input should be an integer, not a float") and return nothing.
    • Remember the type() function. The following expressions all evaluate to True:
      type('a') is str
      type([1,2,3]) is list
      type(2.5) is float
    • Suppose you are using Twitter data, and you know that each line should be split into 13 elements. To skip any lines that do not have 13 elements, you could write:
      if len(myList) != 13:
        print "Skipping line with != 13 elements", myList
        continue  # skip to the next line (this check sits inside your loop over lines)
      # continue with code...
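The requirements above (a test function, type checks, and skipping malformed lines) can be combined in one short sketch. The function names are hypothetical, and the tab-separated line format is an assumption made for the example; substitute whatever separator your data actually uses. Each print call takes a single string, so the sketch should run under either Python 2 or Python 3.

```python
def count_word(words, target):
    """Count how many times target appears in words.

    Input: words, a list of strings; target, a string.
    Output: an integer count, or None if an input has the wrong type.
    """
    if type(words) is not list:
        print("Error! words should be a list, not " + type(words).__name__)
        return None
    if type(target) is not str:
        print("Error! target should be a string, not " + type(target).__name__)
        return None
    return words.count(target)


def parse_lines(lines, expected_fields=13):
    """Split tab-separated lines and keep only the well-formed ones.

    Input: lines, a list of strings; expected_fields, an integer.
    Output: a list of lists, one per line with exactly expected_fields items.
    """
    rows = []
    for line in lines:
        fields = line.rstrip('\n').split('\t')  # assumed separator: tab
        if len(fields) != expected_fields:
            print("Skipping line with %d elements" % len(fields))
            continue  # move on to the next line
        rows.append(fields)
    return rows


def test_count_word():
    """Check count_word on inputs whose answers are known in advance."""
    words = ['the', 'cat', 'saw', 'the', 'dog']
    assert count_word(words, 'the') == 2
    assert count_word(words, 'bird') == 0
    assert count_word('not a list', 'the') is None
    print("All count_word tests passed")


test_count_word()
```

Keeping the test in its own function means a reader can rerun it after every change; the instructions in your comments only need to say how to call it.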


You will create a website that presents your analysis and results. The site may be public, or you may restrict it to only people with a Brown email address if you like. The site should contain the following things:

  1. Project description and hypothesis.
  2. Concise explanation of your methods.
  3. Your results, presented in a clear and informative manner.
  4. Discussion of the trends you see in your analysis. You should point out expected and unexpected results.
  5. Reflection on the project. What went well? What didn't?
  6. Python and data/test files available for download.

Refer to the Project 2 Rubric for more details on the code and website requirements.


Create a zip file named It should contain a folder that includes the following:

  1. All files you used in your project, including Python files, Excel files, data files, and test files.
  2. A text file named 'README' that contains (1) the url of your web page and (2) a list of all files contained in the zip folder with a short description of what they are.

Hand in the zip file to