Final Project

Pair Project

For this project you may choose to partner with another student in the class, but it is not required. Partnering has many benefits: you can do more, you can learn more, and when you're stuck you can ask someone familiar with your project code. We highly encourage you to find a partner to work with. Feel free to utilize the Piazza "Search for Teammates" feature.

When working on this project with a partner, you should consider pair programming. Pair programming is the practice of two people writing code together at a single computer. This is an exceedingly helpful form of collaboration on a pair project. Please read this extremely useful guide for information on how to pair program successfully. Importantly, neither partner should be typing the majority of the time, and the person not typing (the navigator) needs to follow along and feed ideas to the person typing (the driver).

Sharing code with a partner can be tricky because passing files back and forth through email is cumbersome. Instead, consider creating a shared Dropbox folder for your files. With Dropbox, though, overwriting your partner's saved progress is a serious concern. Before you edit a file, confirm with your partner that they are not currently working on it.

This is an independent project. You may discuss ideas related to your question and dataset with your fellow students, but you may discuss how to solve your problem only with the course staff or your partner.

Project Description

(Independent) Pose a computational question based on numerical data, textual data, or a combination of both. Refer to Projects 1 and 2 for examples of data sources. For your project you must present a testable hypothesis, carry out the required analyses, report your findings in a clear and understandable way, and discuss your results. To visualize your results, you may use any format(s) you want (tables, graphs, maps, etc). You must make relevant files available on the website unless you are not allowed to make your data public. You are responsible for making sure you are using and uploading data properly, and respecting any copyright or license terms.

Project Proposal (Due Friday, April 21 at 11:59pm)

Project Description

Write a concise (one to two pages) description of the project you would like to execute. You will be graded based on the project rubric, so check the rubric before handing in your proposal. In general, your proposal should include the following parts:

  1. Claim: the specific hypothesis you plan to test (which is a statement, not a question) and some background context.
  2. Data: a short description of your data source.
  3. Programming Elements: number the steps of your analysis and list what programs and tools you will use for each (e.g. Spreadsheets/Excel or Python). You should also describe your expected milestones. You have about three weeks to complete this project. By the first week, you will have this proposal written and some start on the programming/data formatting. What parts do you expect to complete in week 2? What parts do you expect to complete in week 3?
  4. Potential Roadblocks: a list of potential obstacles.
  5. Interactivity: a description of how you will make your project interactive.
  6. Visual Presentation: a description of how you will visually present your results for your website (e.g., table, chart, screenshot, etc.).

Skeleton Code

It is a good idea for you to include skeleton code in your proposal, but it is not required. Use what you've learned from the past two projects to organize your code early on. This will set you up for success as you finish your code and analysis.

  • Skeleton code for Google Spreadsheets/Excel is a loosely-sketched-out workbook with descriptive spreadsheet names, column headings, and one or two text boxes on each sheet concisely describing the data that will appear in that sheet and how (in the abstract) it will be computed from data in other sheets. The leftmost sheet should contain a sketch of the public "take-homes" of your analysis or interactive elements, as well as a description of the other sheets. You should order the sheets so that the data in a given sheet will generally be computed only from sheets to its right. The rightmost sheets will contain raw data (though they don't have to be filled in in your skeleton version).
  • Skeleton code for Python should follow the style guidelines described for Project 2, both in the original handout and in the feedback email that I sent. This means that every function should contain a documentation string describing its behavior, inputs, and outputs. Functions should be ordered in the file from highest-level at the top to lowest-level at the bottom — you should order them so that each function will generally only call functions that are positioned below it. The bodies of most functions in your skeleton code, especially the higher-level ones, should include calls to lower-level functions to demonstrate how the different functions use each other, as well as explanatory comments to describe things to be implemented or steps you will need to figure out.
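As an illustration of the Python guidelines above, here is a minimal sketch of what skeleton code might look like. The project idea (comparing how often a word appears in two texts), the file names, and every function name are made up for this example; they are not requirements, and your own skeleton will look different.

```python
"""Skeleton for a hypothetical word-frequency project.

Compares how often a target word appears in two texts.
All names and file names here are placeholders.
"""


def main():
    """Run the whole analysis and print the result. No inputs or outputs."""
    rate_a = word_rate(load_words("text_a.txt"), "whale")
    rate_b = word_rate(load_words("text_b.txt"), "whale")
    report(rate_a, rate_b)


def report(rate_a, rate_b):
    """Print a comparison of two rates (floats). Returns nothing."""
    # TODO: decide how to present the result (table? chart?)
    print("Text A:", rate_a, "Text B:", rate_b)


def word_rate(words, target):
    """Return the fraction of items in words (list of str) equal to target.

    Returns 0.0 when words is empty.
    """
    if not words:
        return 0.0
    return count_word(words, target) / len(words)


def count_word(words, target):
    """Return how many items in words (list of str) equal target (str)."""
    # TODO: decide whether matching should ignore punctuation
    return sum(1 for word in words if word == target)


def load_words(filename):
    """Return the lowercase words in the file named filename as a list of str."""
    # TODO: figure out how to strip headers and footers from the raw file
    with open(filename, encoding="utf-8") as f:
        return f.read().lower().split()
```

Note that each function only calls functions positioned below it, and that the TODO comments mark the steps still to be worked out. A finished program would also call main() at the bottom of the file.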


Hand in the proposal (named YourName_FinalProject_Proposal.txt or LastName_Lastname_Project_Proposal.txt for pairs) and optional skeleton code similarly titled to with the subject "Final Project Proposal".

Project (Due Tuesday, May 9th at 11:59pm)

Carry out the project you proposed. It's OK if the project changes — that's why it was a proposal.


Refer to previous project descriptions and the project rubric for details about grading. Keep in mind that:

  1. You must have a small test dataset or test spreadsheet showing that your calculations are correct.
  2. You must comment all code (both in Spreadsheets/Excel and Python).
  3. For spreadsheets, the first sheet must describe all other sheets. Input cells should be highlighted and instructions for use should be clear.
  4. For Python files, comments at the top of the program should explain what the program does. Comments throughout the code should make your program easy to follow.
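For Python projects, one possible way to meet the test dataset requirement is sketched below. The average function stands in for whatever calculation your project performs, and both function names are hypothetical. The idea is to run your calculation on tiny datasets whose correct answers you can work out by hand.

```python
# This program checks a calculation against small, hand-verified datasets.
# The average() function is a stand-in for a project's real analysis.


def test_average():
    """Check average() against answers worked out by hand. Returns True."""
    # Each dataset is small enough to verify on paper.
    assert average([2, 4, 6]) == 4
    assert average([5]) == 5
    assert average([-1, 1]) == 0
    return True


def average(values):
    """Return the arithmetic mean of values, a non-empty list of numbers."""
    return sum(values) / len(values)
```

Calling test_average() raises an AssertionError if any hand-computed answer does not match, and returns True otherwise.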


You will also create a website that presents your analysis and results. The site should be publicly available because we will compile a page of final projects. You may optionally restrict access to people with a Brown email address. The site should contain the following things:

  1. Project description and hypothesis.
  2. Concise explanation of your methods.
  3. Your results, presented in a clear and informative manner.
  4. Discussion of the trends you see in your analysis. You should point out expected and unexpected results.
  5. Reflection on the project. What went well? What didn't?
  6. Python/Spreadsheet/Data files available for download.

Refer to the Final Project Rubric for more details on the code and website requirements.


Create a zip file named or for a pair project. It should contain a folder that includes the following:

  1. All files you used in your project, including Python files, Spreadsheet/Excel files, data files, and test files.
  2. (Optional) A text document named supplement.txt with additional information (tests, analysis, etc) that did not make it on the website.
  3. A text file named README that contains (1) the URL of your web page and (2) a list of all files contained in the zip folder with a short description of what they are.
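If your folder contains many files, a short script can draft the file list for the README. The following is only a sketch: the function names and the output layout are assumptions, not a required format, and you still need to write each description yourself.

```python
import os


def readme_text(url, folder):
    """Return draft README text: the site URL, then one line per file.

    url is a str; folder is the path (str) of the project folder.
    """
    lines = ["Website: " + url, "", "Files:"]
    for path in list_files(folder):
        # The description is a placeholder to be filled in by hand.
        lines.append(path + " - TODO: describe this file")
    return "\n".join(lines) + "\n"


def list_files(folder):
    """Return the sorted relative paths (list of str) of all files in folder."""
    paths = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full_path = os.path.join(root, name)
            paths.append(os.path.relpath(full_path, folder))
    return sorted(paths)
```

You could print the result of readme_text with your own URL and folder path, paste it into your README, and then replace each TODO with a real description.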

Share the with

Presentation in class May 2nd and 4th

You will present a discussion of your project lasting at most 6 minutes. For pair projects, you will have at most 10 minutes to present your results. Your presentation should include your claim, your methods, your data, your results, and the limitations of your analysis. It should also include visuals to help engage the audience, especially when presenting your findings. We will also include a few minutes for questions.

The results you present do not need to be your final, rigorously tested results, but they should be solid enough to support a conclusion about your hypothesis.

Practicing your presentation is essential to delivering it smoothly. The time limit will be strictly enforced so that we can get to everyone's presentations in time. For pair projects, both partners must present for equal amounts of time. You can choose to go back and forth, or switch speakers halfway through.

Following your presentation, the course staff will provide suggestions and recommendations for improvement, which you should carefully consider before you hand in your final project. If for some reason you don't incorporate them, explain in your supplemental text why it didn't make sense to do so.

Presentation order will be chosen randomly at the start of class at 10:30am on May 2nd. Any individual or team not prepared for the presentation on May 2nd will lose 20% of their presentation score.

Extra Credit

For projects that go above and beyond the requirements of this project there will be up to 10% extra credit. This will be given at the discretion of the course staff, but the following list provides a few examples of extra credit opportunities. Each item listed here would be suitable for 5% extra credit:

  • Analysis of a sizeable dataset (>10,000,000 words/data points)
  • Use of a library or methods beyond those discussed in class or provided by a TA/instructor
  • Well-designed and impactful visualizations using elements or graphical types beyond those demonstrated in class
  • Rigorously tested program with more than 10 test cases that vary in complexity and carefully consider the different ways in which the program might fail