UNDER CONSTRUCTION! Suggestions welcome!
Sample past projects
These are final reports from past projects in this class that are also relevant to the current edition. They are good examples of scope.
- A Flexible Distributed Runtime System
- Parallel Heuristics for TSP on MapReduce
- MRJS: A JavaScript MapReduce Framework for Web Browsers
Datasets
- Twitter Social Graph
- CMU ClueWeb09: A crawl of 1B pages from CMU. This is a link to the graph only, as the entire dataset is 5TB compressed and has to be purchased as a set of disks.
- Million Song Database
- Amazon's Public Datasets: A set of really interesting public datasets in Amazon's AWS.
Tutorials
Frameworks
Bibliography
Here are some online bibliographies related to MapReduce:
- Alex, from Columbia, has a bibliography of MapReduce-related papers.
- Atbrox has a list of academic papers "about how the mapreduce parallel model and hadoop implementation is used to solve algorithmic problems".
- Abhishek Tiwari has a list of papers using MapReduce and Hadoop for computational biology.
- Map Reduce Applications Group at Mendeley, started by Jeff Hammerbacher, from Cloudera.
- MAPREDUCE 2011 - Second International Workshop on MapReduce and its Applications
- MAPREDUCE 2010 - First International Workshop on MapReduce and its Applications
- NSF CLuE PI Meeting, October 2009. Several interesting presentations on work done within the scope of the NSF Cluster Exploratory Program.
Related Courses
- Data-Intensive Information Processing Applications (Spring 2010) Jimmy Lin, University of Maryland
- Cloud computing: Systems, Networking, and Frameworks, Ion Stoica, Berkeley, Fall 2011
- Massively Parallel Data Analysis with MapReduce, Gustavo Alonso, Donald Kossmann, Timothy Roscoe, Nesime Tatbul, ETH Zürich