The schedule of classes is still being finalized; watch this space!
The schedule is also available as an iCal file that you can subscribe to.
The bottom of the page has a nicely compiled list of the readings in the schedule!
Schedule
Date | Lecture | Notes | Readings |
---|---|---|---|
Thu 09/08 | Intro [pdf] [pptx] | Skim Lin and Dyer Ch. 1; Barroso and Hölzle Ch. 1, 3. This is related to what we read in class. | LinBook, BarrosoBook |
Tue 09/13 | (S) MapReduce | No reviews due, but read the paper carefully. | MapReduce04 |
Thu 09/15 | (S) Google File System | Review for GFS only. Hammurabi presents. | GFS03 [Hammurabi], GFSEvolution |
Tue 09/20 | (A) MapReduce Algorithmic Strategies I | Read only Sections 3.2 and 3.3 of Lin's book. Review required for Kiefer. | LinBook 3.2, 3.3; Kiefer10Pairwise |
Graph Processing | | | |
Thu 09/22 | (A) Graph Processing with MapReduce | Review required. Content is similar to Chapter 5 of Lin's book. | Lin10Graph |
Tue 09/27 | (A) Graph Algorithms in MapReduce | Guest lecture: Matteo Riondato. No reviews for Cohen09Graph, optional for Pegasus. | Kang09Pegasus, Cohen09Graph |
Thu 09/29 | (S) Alternatives for Graph Processing | Review required. | Malewicz10Pregel [Marcelo] |
Fri 09/30 | Draft Project Proposals Due, 11:59pm | | |
Tue 10/04 | In-class Project Proposal Discussion | | |
Thu 10/06 | Data Warehousing on MapReduce | Guest lecture: Ali Dasdan, Turn, Inc. Review required only for Cheetah. | Chen10Cheetah [Review] |
Fri 10/07 | Project Proposals Due, 11:59pm | | |
Machine Learning | | | |
Tue 10/11 | Machine Learning I | Review required for Chu06MLMulticore. | Chu06MLMulticore, Ghoting11SystemML [Optional] |
Thu 10/13 | Machine Learning II - Spark | Review required. | Zaharia11Spark [Alex] |
Tue 10/18 | (S) GraphLab | Review required. | Low11GraphLabTR [Jonathan] |
Thu 10/20 | (S) General Framework - Dryad | Review required. | Isard07Dryad |
Tue 10/25 | No class, Rodrigo away | | |
Thu 10/27 | Guest lecture - Andrew Ferguson | | |
Computational Biology | | | |
Tue 11/01 | Comp Bio I | Review for Cloudburst. | Taylor10Overview, Schatz09Cloudburst [Max] |
Thu 11/03 | Comp Bio II | Simpler review for both, check email. | Menon11Genome [Feng], Jackson10DeNovoMPI [Peter] |
Tue 11/08 | Balanced Systems | (Normal) review required. | TritonSort |
Thu 11/10 | In-class Project Progress Report | | |
Tue 11/15 | MapReduce on GPUs | Review required for Mars. Skim MultiGPU, focusing on what it improves over Mars. | Mars [Aditya], MultiGPU [Optional] |
Thu 11/17 | Piccolo | Review required. | Piccolo [Andrew] |
Tue 11/22 | Stragglers and Hotspots | Rodrigo away. Marcelo will present an overview of two techniques to cope with imbalance. | Mantri, Scarlett |
Fri 11/25 | Thanksgiving | | |
Tue 11/29 | High-level Languages | Both reviews required. | Hive [Qun], Pig [Marcus] |
Thu 12/01 | Sharing | Review required. | Mesos [Li] |
Tue 12/06 | Workflow | Last class! | FlumeJava [James] |
Thu 12/08 | Reading Period | | |
Tue 12/13 | Project Presentations | 368 CIT, time TBD | |
Sat 12/17 | No Class | Final Project Reports Due, 11:59pm | |
Reading List
-
Thu 09/08 -- Lecture 1: Intro
-
Jimmy Lin and Chris Dyer
Data-Intensive Text Processing with MapReduce
In Synthesis Lectures on Human Language Technologies, 2010, pages 1--177
-
Luiz André Barroso and Urs Hölzle
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
-
Tue 09/13 -- Lecture 2: (S) MapReduce
-
Dean, Jeffrey and Ghemawat, Sanjay
MapReduce: simplified data processing on large clusters
In OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, 2004, pages 137--150
-
Thu 09/15 -- Lecture 3: (S) Google File System
-
Sanjay Ghemawat and Howard Gobioff and Shun-Tak Leung
The Google file system
In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, 2003, pages 29--43
-
McKusick, Marshall Kirk and Quinlan, Sean
GFS: Evolution on Fast-forward
In Queue, August 2009, pages 10:10--10:20
-
Tue 09/20 -- Lecture 4: (A) MapReduce Algorithmic Strategies I
-
Kiefer, Tim and Volk, Peter Benjamin and Lehner, Wolfgang
Pairwise Element Computation with MapReduce
In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010, pages 826--833
-
Thu 09/22 -- Lecture 5: (A) Graph Processing with MapReduce
-
Jimmy Lin and Michael Schatz
Design Patterns for Efficient Graph Algorithms in MapReduce
In Proceedings of the 2010 Workshop on Mining and Learning with Graphs (MLG-2010), July 2010
-
Tue 09/27 -- Lecture 6: (A) Graph Algorithms in MapReduce
-
U Kang and Charalampos E. Tsourakakis and Christos Faloutsos
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations.
In IEEE International Conference On Data Mining, 2009
-
Cohen, Jonathan
Graph Twiddling in a MapReduce World
In Computing in Science and Engineering, July 2009, pages 29--41
-
Thu 09/29 -- Lecture 7: (S) Alternatives for Graph Processing
-
Malewicz, Grzegorz and Austern, Matthew H. and Bik, Aart J.C and Dehnert, James C. and Horn, Ilan and Leiser, Naty and Czajkowski, Grzegorz
Pregel: a system for large-scale graph processing
In SIGMOD '10: Proceedings of the 2010 international conference on Management of data, 2010, pages 135--146
-
Thu 10/06 -- Lecture 9: Data Warehousing on MapReduce
-
Chen, Songting
Cheetah: a high performance, custom data warehouse on top of MapReduce
In Proc. VLDB Endow., September 2010, pages 1459--1468
-
Tue 10/11 -- Lecture 10: Machine Learning I
-
Chu, Cheng T. and Kim, Sang K. and Lin, Yi A. and Yu, Yuanyuan and Bradski, Gary R. and Ng, Andrew Y. and Olukotun, Kunle
Map-Reduce for Machine Learning on Multicore
In Proceedings of NIPS'06, 2006, pages 281--288
-
Ghoting, A. and Krishnamurthy, R. and Pednault, E. and Reinwald, B. and Sindhwani, V. and Tatikonda, S. and Yuanyuan Tian and Vaithyanathan, S.
SystemML: Declarative machine learning on MapReduce
In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, April 2011, pages 231--242
-
Thu 10/13 -- Lecture 11: Machine Learning II - Spark
-
Zaharia, Matei and Chowdhury, Mosharaf and Das, Tathagata and Dave, Ankur and Ma, Justin and McCauley, Murphy and Franklin, Michael and Shenker, Scott and Stoica, Ion
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
-
Tue 10/18 -- Lecture 12: (S) GraphLab
-
Yucheng Low and Joseph Gonzalez and Aapo Kyrola and Danny Bickson and Carlos Guestrin
GraphLab: A Distributed Framework for Machine Learning in the Cloud
-
Thu 10/20 -- Lecture 13: (S) General Framework - Dryad
-
Michael Isard and Mihai Budiu and Yuan Yu and Andrew Birrell and Dennis Fetterly
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
In Proceedings of the European Conference on Computer Systems (EuroSys), March 2007
-
Tue 11/01 -- Lecture 14: Comp Bio I
-
Ronald C Taylor
An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics
In BMC Bioinformatics, December 2010
-
Michael C. Schatz
CloudBurst: highly sensitive read mapping with MapReduce
In Bioinformatics, 2009, pages 1363--1369
-
Thu 11/03 -- Lecture 15: Comp Bio II
-
Menon, Rohith K. and Bhat, Goutham P. and Schatz, Michael C.
Rapid parallel genome indexing with MapReduce
In Proceedings of the second international workshop on MapReduce and its applications, 2011, pages 51--58
-
Jackson, B.G. and Regennitter, M. and Yang, X. and Schnable, P.S. and Aluru, S.
Parallel de novo assembly of large genomes from high-throughput short reads
In Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, April 2010, pages 1--10
-
Tue 11/08 -- Lecture 16: Balanced Systems
-
Alexander Rasmussen and George Porter and Michael Conley and Harsha Madhyastha and Radhika Niranjan Mysore and Alexander Pucher and Amin Vahdat
TritonSort: A Balanced Large-Scale Sorting System
In Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2011), 2011
-
Tue 11/15 -- Lecture 18: MapReduce on GPUs
-
He, Bingsheng and Fang, Wenbin and Luo, Qiong and Govindaraju, Naga K. and Wang, Tuyong
Mars: a MapReduce framework on graphics processors
In PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques, 2008, pages 260--269
-
Jeff A. Stuart and John D. Owens
Multi-GPU MapReduce on GPU Clusters
In Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium - IPDPS, 2011
-
Thu 11/17 -- Lecture 19: Piccolo
-
Russell Power and Jinyang Li
Piccolo: Building Fast, Distributed Programs with Partitioned Tables
In 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010, October 2010, pages 293--306
-
Tue 11/22 -- Lecture 20: Stragglers and Hotspots
-
Ganesh Ananthanarayanan and Srikanth Kandula and Albert G. Greenberg and Ion Stoica and Yi Lu and Bikas Saha and Edward Harris
Reining in the Outliers in Map-Reduce Clusters using Mantri
In Proc. 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010, 2010, pages 265--278
-
Ganesh Ananthanarayanan and Sameer Agarwal and Srikanth Kandula and Albert G. Greenberg and Ion Stoica and Duke Harlan and Ed Harris
Scarlett: coping with skewed content popularity in MapReduce clusters
In EuroSys'11, 2011, pages 287--300
-
Tue 11/29 -- Lecture 21: High-level Languages
-
Thusoo, A. and Sarma, J.S. and Jain, N. and Zheng Shao and Chakka, P. and Ning Zhang and Antony, S. and Hao Liu and Murthy, R.
Hive - a petabyte scale data warehouse using Hadoop
In Data Engineering (ICDE), 2010 IEEE 26th International Conference on, March 2010, pages 996--1005
-
Olston, Christopher and Reed, Benjamin and Srivastava, Utkarsh and Kumar, Ravi and Tomkins, Andrew
PigLatin: a not-so-foreign language for data processing
In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 2008, pages 1099--1110
-
Thu 12/01 -- Lecture 22: Sharing
-
B. Hindman and A. Konwinski and M. Zaharia and A. Ghodsi and A.D. Joseph and R. Katz and S. Shenker and I. Stoica
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
In NSDI 2011, March 2011
-
Tue 12/06 -- Lecture 23: Workflow
-
Chambers, Craig and Raniwala, Ashish and Perry, Frances and Adams, Stephen and Henry, Robert R. and Bradshaw, Robert and Weizenbaum, Nathan
FlumeJava: easy, efficient data-parallel pipelines
In Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation, 2010, pages 363--375