The schedule of classes is still being finalized; watch this space!
The bottom of the page has a nicely compiled list of the readings in the schedule.
Schedule
Date | Lecture | Notes | Readings |
---|---|---|---|
Thu 01/25 | Introduction. | Sign up for Piazza and for the paper review site. | keshav07howtoread, hanson00 |
Tue 01/30 | Warm-up | Read both, but review only required for Pywren. | armbrust10cloud, jonas17pywren |
Thu 02/01 | More uses for serverless | | fouladi17excamera |
Thu 02/08 | The Datacenter as a Computer | No review needed. Read chapters 1 and 2 up to 2.4. | barroso09warehouse |
Tue 02/13 | Storage | Review required only for GFS03, skim GFSEvolution | GFS03, GFSEvolution |
Thu 02/15 | Storage | Flat Datacenter Storage | nightingale12fds |
Tue 02/20 | | Long weekend | |
Thu 02/22 | Datacenter Networking | Related papers - FatTree and PortLand (not required) | Greenberg09VL2 |
Tue 02/27 | | Evolution of Google's datacenter network. Also see Facebook's datacenter fabric. | Singh15Jupiter |
Thu 03/01 | SDN | No review necessary | Casado14SDNAbstractions, Feamster13road |
Tue 03/06 | Execution Models - MapReduce and Friends | Review only for MapReduce, but also read the Spark paper | MapReduce04, zaharia16spark |
Thu 03/08 | Execution Models - Parameter Server | This paper was presented at OSDI'14 at a session with GraphX and Project Adam, also very interesting. | li14parameter |
Tue 03/13 | Snow day | | |
Thu 03/15 | Execution Models - Timely Dataflow | For a nice introduction to the paper, take a look here. A companion to this paper that describes the computational model is differential dataflow. | murray13naiad |
Tue 03/20 | Execution Models - Tensor Flow | | abadi16tensorflow |
Thu 03/22 | Making sense of performance - Monotasks | | ousterhout17monotasks |
Tue 03/27 | Spring Break | | |
Thu 03/29 | Spring Break | | |
Tue 04/03 | Resource Disaggregation | | gao16disag |
Thu 04/05 | Virtualization | | manco17lightvm |
Tue 04/10 | Tracing Distributed Systems | Jonathan Mace will lead the discussion on tracing | mace15pivot |
Thu 04/12 | Orchestration | You can also read Borg, Omega, Kubernetes (Google's take, no review) | hindman15mesos |
Tue 04/17 | Design Patterns | Three readings, no reviews. The Tail at Scale and Design patterns for container-based distributed systems and the Concepts section of the Istio docs. Answer the question on Piazza. | |
Thu 04/19 | Security | | ristenpart09offmycloud |
Tue 05/01 | The past's future. Last class! | Read the Rise of RaaS, and answer the 2 questions in Piazza. | |
Reading List
-
Thu 01/25 -- Lecture 1: Introduction.
-
Keshav, S.
How to read a paper
In SIGCOMM Comput. Commun. Rev., 2007, pages 83--84
-
Michael J. Hanson and Dylan J. McNamee
Efficient Reading of Papers in Science and Technology
-
Tue 01/30 -- Lecture 2: Warm-up
-
Armbrust, Michael and Fox, Armando and Griffith, Rean and Joseph, Anthony D. and Katz, Randy and Konwinski, Andy and Lee, Gunho and Patterson, David and Rabkin, Ariel and Stoica, Ion and Zaharia, Matei
A View of Cloud Computing
In Commun. ACM, Apr 2010, pages 50--58
-
Jonas, Eric and Pu, Qifan and Venkataraman, Shivaram and Stoica, Ion and Recht, Benjamin
Occupy the Cloud: Distributed Computing for the 99%
In Proceedings of the 2017 Symposium on Cloud Computing, 2017, pages 445--451
-
Thu 02/01 -- Lecture 3: More uses for serverless
-
Sadjad Fouladi and Riad S. Wahby and Brennan Shacklett and Karthikeyan Vasuki Balasubramaniam and William Zeng and Rahul Bhalerao and Anirudh Sivaraman and George Porter and Keith Winstein
Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads
In Proceedings of the 14th ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), March 2017
-
Thu 02/08 -- Lecture 4: The Datacenter as a Computer
-
Luiz André Barroso and Urs Hölzle
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
-
Tue 02/13 -- Lecture 5: Storage
-
Sanjay Ghemawat and Howard Gobioff and Shun-Tak Leung
The Google file system
In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, 2003, pages 29--43
-
McKusick, Marshall Kirk and Quinlan, Sean
GFS: Evolution on Fast-forward
In Queue, August 2009, pages 10:10--10:20
-
Thu 02/15 -- Lecture 6: Storage
-
Edmund B. Nightingale and Jeremy Elson and Jinliang Fan and Owen Hofmann and Jon Howell and Yutaka Suzue
Flat Datacenter Storage
In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), 2012, pages 1--15
-
Thu 02/22 -- Lecture 7: Datacenter Networking
-
Greenberg, Albert and Hamilton, James R. and Jain, Navendu and Kandula, Srikanth and Kim, Changhoon and Lahiri, Parantap and Maltz, David A. and Patel, Parveen and Sengupta, Sudipta
VL2: a scalable and flexible data center network
In SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication, 2009, pages 51--62
-
Tue 02/27 -- Lecture 8:
-
Arjun Singh and Joon Ong and Amit Agarwal and Glen Anderson and Ashby Armistead and Roy Bannon and Seb Boving and Gaurav Desai and Bob Felderman and Paulie Germano and Anand Kanagala and Jeff Provost and Jason Simmons and Eiichi Tanda and Jim Wanderer and Urs Hölzle and Stephen Stuart and Amin Vahdat
Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network
In Sigcomm '15, 2015
-
Thu 03/01 -- Lecture 9: SDN
-
Casado, Martin and Foster, Nate and Guha, Arjun
Abstractions for Software-defined Networks
In Commun. ACM, Sep 2014, pages 86--95
-
Feamster, Nick and Rexford, Jennifer and Zegura, Ellen
The Road to SDN
In Queue, Dec 2013, pages 20:20--20:40
-
Tue 03/06 -- Lecture 10: Execution Models - MapReduce and Friends
-
Dean, Jeffrey and Ghemawat, Sanjay
MapReduce: simplified data processing on large clusters
In OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, 2004, pages 137--150
-
Zaharia, Matei and Xin, Reynold S. and Wendell, Patrick and Das, Tathagata and Armbrust, Michael and Dave, Ankur and Meng, Xiangrui and Rosen, Josh and Venkataraman, Shivaram and Franklin, Michael J. and Ghodsi, Ali and Gonzalez, Joseph and Shenker, Scott and Stoica, Ion
Apache Spark: A Unified Engine for Big Data Processing
In Commun. ACM, Oct 2016, pages 56--65
-
Thu 03/08 -- Lecture 11: Execution Models - Parameter Server
-
Mu Li and David G. Andersen and Jun Woo Park and Alexander J. Smola and Amr Ahmed and Vanja Josifovski and James Long and Eugene J. Shekita and Bor-Yiing Su
Scaling Distributed Machine Learning with the Parameter Server
In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, pages 583--598
-
Thu 03/15 -- Lecture 12: Execution Models - Timely Dataflow
-
Murray, Derek G. and McSherry, Frank and Isaacs, Rebecca and Isard, Michael and Barham, Paul and Abadi, Martín
Naiad: A Timely Dataflow System
In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, 2013, pages 439--455
-
Tue 03/20 -- Lecture 13: Execution Models - Tensor Flow
-
Martín Abadi and Paul Barham and Jianmin Chen and Zhifeng Chen and Andy Davis and Jeffrey Dean and Matthieu Devin and Sanjay Ghemawat and Geoffrey Irving and Michael Isard and Manjunath Kudlur and Josh Levenberg and Rajat Monga and Sherry Moore and Derek G. Murray and Benoit Steiner and Paul Tucker and Vijay Vasudevan and Pete Warden and Martin Wicke and Yuan Yu and Xiaoqiang Zheng
TensorFlow: A System for Large-Scale Machine Learning
In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pages 265--283
-
Thu 03/22 -- Lecture 14: Making sense of performance - Monotasks
-
Ousterhout, Kay and Canel, Christopher and Ratnasamy, Sylvia and Shenker, Scott
Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks
In Proceedings of the 26th Symposium on Operating Systems Principles, 2017, pages 184--200
-
Tue 04/03 -- Lecture 15: Resource Disaggregation
-
Peter X. Gao and Akshay Narayan and Sagar Karandikar and Joao Carreira and Sangjin Han and Rachit Agarwal and Sylvia Ratnasamy and Scott Shenker
Network Requirements for Resource Disaggregation
In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pages 249--264
-
Thu 04/05 -- Lecture 16: Virtualization
-
Manco, Filipe and Lupu, Costin and Schmidt, Florian and Mendes, Jose and Kuenzer, Simon and Sati, Sumit and Yasukata, Kenichi and Raiciu, Costin and Huici, Felipe
My VM is Lighter (and Safer) Than Your Container
In Proceedings of the 26th Symposium on Operating Systems Principles, 2017, pages 218--233
-
Tue 04/10 -- Lecture 17: Tracing Distributed Systems
-
Jonathan Mace and Ryan Roelke and Rodrigo Fonseca
Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems
In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), October 2015
-
Thu 04/12 -- Lecture 18: Orchestration
-
B. Hindman and A. Konwinski and M. Zaharia and A. Ghodsi and A.D. Joseph and R. Katz and S. Shenker and I. Stoica
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
In NSDI 2011, March 2011
-
Thu 04/19 -- Lecture 20: Security
-
Ristenpart, Thomas and Tromer, Eran and Shacham, Hovav and Savage, Stefan
Hey, You, Get off of My Cloud: Exploring Information Leakage in Third-party Compute Clouds
In Proceedings of the 16th ACM Conference on Computer and Communications Security, 2009, pages 199--212