Course format

In each class we will discuss 1-2 research papers. Students are expected to read the assigned papers and write a short review before each class (more on paper reviews below). For each paper, one student will give a short presentation, which will serve as the starting point for our discussion.

In parallel, students will work on a semester-long project on an open research problem related to the topics covered in the course. Projects can have a systems focus, an application focus, or both. Projects related to the students' own research interests are strongly encouraged, provided they also fit the theme of the class.

Required reading

There are no textbooks for this course. We will read the papers in the schedule, which are available electronically. Although we won't cover it directly in lectures, a good starting point for working with Hadoop is Tom White's book, Hadoop: The Definitive Guide, 2nd Edition.


Grading

  • 10% Participation: class attendance and in-class discussions
  • 25% Paper reviews and discussion lead
    • 15% Paper reviews
    • 10% Discussion lead
  • 15% Programming assignments
  • 50% Project
    • 10% Initial proposal
    • 10% Midterm progress report
    • 10% Presentation
    • 20% Final report


Paper reviews

Paper reviews must be posted to the class discussion group no later than 11:59pm on the day before the corresponding class. Please post reviews as plain text: no HTML and no attachments (doc, pdf, etc.). You may skip up to 4 reviews, no questions asked.

Use this template when writing your reviews. Remember that you are not simply summarizing the paper but assessing it. Here are tips for becoming a more efficient reader.

Paper discussions

Each student will lead the discussion on a few of the papers during the semester. The exact allocation will depend on the number of students and student interest.

You must consult with me before the class in which you will lead a discussion. You don't have to post a review for the class in which you present, but you should post a summary of the discussion afterwards. This can be your notes or slides, plus any additional insights that came up in class.

For guidelines on how to prepare your discussion, see Prof. Randy Katz's notes [pdf]. Slides are not required, as long as you have notes to guide you.


Programming Assignments

We will have 3 or 4 programming assignments to get your hands dirty. More details to follow.


Project

The other major component of the class is a research project on a subject of the student's choosing. Ideally, projects should be carried out in groups of 2, but exceptions (1 or 3 students) are possible; please consult with me first.

Projects should aim high: with some additional work after the semester, the best projects should be ready to submit to a conference or workshop. (I'd be happy to help with that.)

Project proposals are the first milestone and should be short and to the point. This document has a good description of the questions your proposal should answer. The proposal should be 1 to 4 pages long, in PDF format, and sent by email to the instructor.

Before that, on Friday 9/30, you should post a draft of your proposal to the newsgroup. The goal is to help students find potential partners for forming groups.


Important dates

  1. 9/30, Fri, 11:59pm. Project proposal draft due, posted to the newsgroup.
  2. 10/4, Tue. Proposal discussion: in-class discussion of project proposals.
  3. 10/7, Fri, 11:59pm. Project proposal due: 1 to 4 pages, PDF.
  4. 11/10, Thu. Progress report: in-class presentations of project progress.
  5. 12/13, Tue (in CIT 368). Final presentations: open presentation session with refreshments. Counts towards the presentation grade.
  6. 12/17, Sat, 11:59pm. Final report: written project reports due. See below.

Final Presentation

We will hold a presentation session; if the class prefers, we can have a poster session instead. The current plan is for an open presentation session on Tuesday, December 13th, from 1pm to 2:30pm, in room 368 of the CIT Building. Each group will have at most 8 minutes to present, so please keep transitions quick.

Some hints:

  • No more than 8 slides!
    • Brief intro to the problem
    • What you did and the questions you asked
    • Experiments
    • Conclusion: what you learned
  • Practice at least once!!!

Final Project Report

The final report is due on Saturday, December 17th, at 11:59pm. It MUST be a PDF file, sent to me by email. It should be 8 to 12 single-spaced pages, in a font no larger than 11pt. I suggest you use LaTeX, but that's up to you.

Contents should be similar to a research paper:

  • Introduction and motivation
  • What you did
  • Experiments/Evaluation
  • Related work and how you extend or are different
  • Conclusion: what you learned
  • Proper references (BibTeX makes your life easy)
  • Graphs should be legible! (hint: avoid small axis fonts, and use vector formats (.eps, .pdf, .svg) rather than bitmap formats (.png, .jpg) whenever possible)

I have created a sample LaTeX file that you can use to write your report if you want. This is what it should look like once compiled: very clean and professional-looking :)

If you have any questions before submitting, don't hesitate to ask!

Course Policies


Capstone

Depending on your project, this course can count as a Capstone course for undergraduates. Talk to the instructor before assuming that, though!

Your work

Apart from the final project, all work in this course is to be done individually, and the usual code of conduct applies. Remember, this is a graduate course, and I assume you are here because you want to learn how to do research. We will have zero tolerance for academic misconduct.

Late Policy

Everyone may miss up to 4 paper reviews and has a total of 3 late days to use across the assignments.

Incomplete Policy

Incompletes are granted only under exceptional circumstances (e.g., severe illness, death in the family, kidnapping, etc.; too heavy a course load is not sufficient reason for an incomplete). Getting a dean to certify your reason for requesting an incomplete helps, but is not sufficient.