Final project guide

CS112 culminates in a final project. This project is to be completed over the last weeks of the semester in groups of 1-3 students. The goal is to demonstrate your understanding of the concepts we’ve learned over the semester and to have some fun exploring a computer science project of your choice!

Timeline

  • Project proposals due Friday, Nov 13 at 9pm
    • You can work with anyone you’d like, including your previous project partners.
  • Project document due Monday, Dec 7 at 9pm
    • You’ll also hand in your code at this time.
  • Project presentations on Monday, Dec 7
    • These are mandatory for groups of 2-3 and optional for students doing the project individually.

Project proposal

The project proposal should be short: around two paragraphs. Describe specifically what you will try to do. If there are things you’d like to do but are not sure you’ll be able to finish, mention them. Also tell us who you’ll be working with; each group should hand in only one project proposal.

The course staff will look over your project proposals to make sure the course projects are acceptable and doable. We may ask to meet with you if we have questions about a potential project. You should also feel free to reach out either before or after submitting the proposal!

Project document

The project document should be around 2-5 pages long. It should present a complete summary of your project. We’ll ask you to hand in your code as well, but it should be possible to understand what you’ve done based solely on the document. It should include at least the following sections:

Goal
This section should describe, in your own words, the goal of your project. What are you trying to accomplish? This section can include things you wanted to do but weren’t able to complete.
Implementation
This section should describe the actual implementation of your project. What design decisions did you make? What data structures did you use? What functions, classes, and methods did you develop?
Results
This section should describe the results of your hard work. Include any measurements you did, example program inputs and outputs, or screenshots. This is also the place to tell us about testing: how did you ensure that your code is correct?
Challenges
Describe any challenges you faced. What was difficult? Did you have any false starts in implementing your project? If you had goals that you weren’t able to accomplish, tell us why!
Future work
If you had as much time as you wanted to keep working on this project, what would you try to do? Be creative!

Project presentation

We will do project presentations over Zoom. Doing a presentation is mandatory for groups of 2-3 and optional for those doing the project individually. The presentations will be informal (i.e., don’t make slides); you will just walk through your project with Doug (and possibly one or more TAs). All group members should be prepared to answer questions about any aspect of the project. Presentations will be scheduled for 15 minutes each. We will schedule as many presentations as possible for our scheduled final exam slot (9am-12pm on Dec 7), but some presentations may end up at a different time on that day.

Project suggestions

We have come up with a number of suggested projects. You can implement one of these ideas as written, implement a modified version, or work on some other project of your choice. Regardless, we recommend looking over the suggestions to get a sense of how to scope your project.

We’ve included a suggested group size range for each project. We expect more from larger groups. If you work in a group of 3 on a project designed for 1-2 people, you should extend it in some way; conversely, if you work alone on a project designed for a group of 2-3, you should feel free to scale it back somewhat.

Each of our suggestions extends or modifies one of the three other course projects.

Project 1 extensions

A novel dataset (1 person)

Use your classifier from Project 1 with a new dataset of your choosing (for example, one from another class, or one collected via a web API). The bulk of this project will likely consist of reformatting the data so that you can use it with your classifier.
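As a rough illustration of the reformatting step, here is a minimal sketch that groups the rows of a CSV file by label. The function name, the column names, and the output shape (a dictionary from label to a list of texts) are all assumptions made for this example; your Project 1 classifier may expect a different format, so adapt accordingly.

```python
import csv
import io
from collections import defaultdict

def group_texts_by_label(csv_text, label_column, text_column):
    """Group the text of each CSV row under its (lowercased) label.

    Hypothetical helper: the column names and the output shape are
    assumptions, not the format Project 1 requires.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row[label_column].strip().lower()
        text = row[text_column].strip()
        # Skip rows that are missing either the label or the text.
        if label and text:
            grouped[label].append(text)
    return dict(grouped)
```

Normalizing labels (here, by stripping whitespace and lowercasing) is often worth doing early, since real datasets tend to spell the same category in several ways.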

Testing project 1 (2 people)

In Project 1 you built a genre classifier, but didn’t evaluate its effectiveness. For classifiers like the one you built in Project 1, a standard way to evaluate accuracy is to divide your data into a “training set” and a “test set.” Instead of computing your tf-idf model from all of your data, you’d compute it from a subset of your data (say, 80% of the songs); this is called the training set. You’d then measure the model’s accuracy on the remaining 20% of songs; this is the test set. You’d need to calculate how often the classifier identifies the correct genre, and compare your results against a reasonable baseline (say, always guessing the most common genre).
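The evaluation described above might be sketched as follows. This is only an outline under stated assumptions: songs are represented as (lyrics, genre) pairs, and `predict` stands in for whatever function your classifier exposes. None of these names come from Project 1.

```python
import random
from collections import Counter

def train_test_split(songs, test_fraction=0.2, seed=0):
    """Shuffle a copy of songs and split it into (training set, test set)."""
    shuffled = list(songs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, test_set):
    """Fraction of (lyrics, genre) pairs where predict(lyrics) == genre."""
    correct = sum(1 for lyrics, genre in test_set if predict(lyrics) == genre)
    return correct / len(test_set)

def majority_baseline(training_set):
    """Return a predictor that always guesses the most common training genre."""
    counts = Counter(genre for _, genre in training_set)
    most_common_genre = counts.most_common(1)[0][0]
    return lambda lyrics: most_common_genre
```

You would build your tf-idf model from the training set only, then compare `accuracy(your_predict, test_set)` against `accuracy(majority_baseline(training_set), test_set)`. Shuffling before splitting matters if your data file is sorted by genre.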

Improving project 1 (2 people)

Implement a couple of improvements to the Project 1 classifier. Some potential examples:

  • The classifier in Project 1 is built on individual words. You could instead train the classifier on bigrams (pairs of adjacent words) so that it can use phrases that are common within a genre to classify songs.
  • The classifier in Project 1 treats (for instance) “dog” and “dogs” as different words. Getting the string “dog” from the string “dogs” is called stemming, and is a basic operation in natural language processing systems. Implement a basic stemmer and use it in the Project 1 classifier.
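To make the two ideas above concrete, here is a small sketch of each. Both functions are illustrative only: the names are made up for this guide, and the stemmer is deliberately crude (it just strips a few common English suffixes), so a real project would want more careful rules.

```python
def bigrams(words):
    """Return the list of adjacent word pairs in a list of words."""
    return list(zip(words, words[1:]))

def naive_stem(word):
    """Strip one of a few common English suffixes from word.

    Deliberately simplistic: it produces odd stems for many words
    (for example, "running" becomes "runn").
    """
    for suffix in ("ing", "es", "s"):
        # Require at least three letters to remain after stripping.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word
```

For example, `bigrams(["i", "love", "my", "dog"])` yields the pairs `("i", "love")`, `("love", "my")`, and `("my", "dog")`, and `naive_stem("dogs")` returns `"dog"`.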

A group of three could implement an improvement or two and then use the techniques described in the previous section (“Testing project 1”) to see whether the changes actually make the classifier better.

Project 2 extensions

Local temperatures (1 person)

In the Project 2 climate simulation, each country’s temperature was a simple function of the total BaD in the atmosphere. For this project, you could add a second type of pollution that affects polluting countries more than other countries. You’d need to track each country’s emissions separately and calculate each country’s temperature based on both the global BaD total and the local and global totals of this new molecule. You should implement a new policy or two that interacts sensibly with this new molecule.
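One possible shape for the new temperature calculation is sketched below. Everything here is an assumption made for illustration: the linear formula, the function and parameter names, and the numeric factors are not taken from Project 2, and you should substitute whatever your simulation actually uses.

```python
def country_temperature(base_temp, global_bad, global_new, local_new,
                        bad_factor=0.01, global_new_factor=0.002,
                        local_new_factor=0.02):
    """Hypothetical temperature model with a global and a local pollutant.

    global_bad: total BaD in the atmosphere
    global_new: total of the new pollutant across all countries
    local_new:  this country's own emissions of the new pollutant
    The local factor is larger than the global one, so the new pollutant
    affects the polluting country more than other countries.
    """
    return (base_temp
            + bad_factor * global_bad
            + global_new_factor * global_new
            + local_new_factor * local_new)

# Track each country's emissions of the new pollutant separately.
emissions = {"A": 100.0, "B": 0.0}
global_new = sum(emissions.values())
temps = {name: country_temperature(15.0, 500.0, global_new, local)
         for name, local in emissions.items()}
```

In this toy setup both countries warm because of the global totals, but country A ends up warmer than country B because of its own emissions.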

A better map (2-3 people)

Project 2 simulates the climate on a 1-dimensional planet. For this project, you could extend the map to two dimensions. You’d need to adjust the display to show countries on a more complex map, and you’d need to adjust the simulator to handle neighbors in different directions.
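Handling neighbors in different directions could start from something like the sketch below. It assumes the map is a rectangular grid addressed by (row, column) and that only the four directly adjacent cells count as neighbors; both are design choices for this example, not requirements of Project 2.

```python
def neighbors(row, col, n_rows, n_cols):
    """Return the in-bounds cells directly north, south, west, and east."""
    candidates = [(row - 1, col), (row + 1, col),
                  (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates
            if 0 <= r < n_rows and 0 <= c < n_cols]
```

On a 3-by-3 grid, a corner cell has two neighbors and the center cell has four. You could also decide to include diagonal cells, or to let the map wrap around at the edges.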

Project 3 extensions

Project 3, which hasn’t been released yet, involves web scraping (getting the contents of a set of web pages into a program for analysis) and data analysis.

Scraping a different site (1-3 people)

In Project 3, you extract data from Craigslist. Choose another site with interesting data that’s otherwise difficult to access and write a program to extract those data. Depending on the complexity of the site, this project could be appropriate for different group sizes; a group of three students could consider combining data scraped from multiple websites.
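To give a flavor of the extraction step, here is a minimal sketch that pulls every link out of a page's HTML using only Python's standard library. It is not the approach Project 3 will use (that project hasn't been released), the class and function names are made up for this example, and it works on an HTML string you have already downloaded.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from the <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are inside, if any
        self._text = []     # pieces of text seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_links(html):
    """Return a list of (href, text) pairs for the links in html."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

Real pages are messier than this, so expect to spend time inspecting the site's HTML to find the elements that hold the data you care about.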

Something completely different

Do you have a dataset that can be represented as a tree, maybe from your other coursework or independent research? Could you simulate some natural phenomenon in code? Do you have a hobby that could benefit from a particular computer program? We highly encourage final projects related to your own interests. If it involves writing a program to solve a problem, it’s probably a great project!