The schedule of classes is still being finalized; watch this space!

The bottom of the page has a compiled list of the readings in the schedule.

Schedule

| Date | Lecture | Notes | Readings |
|---|---|---|---|
| Thu 01/25 | Introduction. | Sign up for Piazza and for the paper review site. | keshav07howtoread, hanson00 |
| Tue 01/30 | Warm-up | Read both, but a review is required only for PyWren. | armbrust10cloud, jonas17pywren |
| Thu 02/01 | More uses for serverless | | fouladi17excamera |
| Thu 02/08 | The Datacenter as a Computer | No review needed. Read chapters 1 and 2 up to 2.4. | barroso09warehouse |
| Tue 02/13 | Storage | Review required only for GFS03; skim GFSEvolution. | GFS03, GFSEvolution |
| Thu 02/15 | Storage | Flat Datacenter Storage | nightingale12fds |
| Tue 02/20 | | Long weekend | |
| Thu 02/22 | Datacenter Networking | Related papers: FatTree and PortLand (not required). | Greenberg09VL2 |
| Tue 02/27 | | Evolution of Google's datacenter network. Also see Facebook's datacenter fabric. | Singh15Jupiter |
| Thu 03/01 | SDN | No review necessary. | Casado14SDNAbstractions, Feamster13road |
| Tue 03/06 | Execution Models - MapReduce and Friends | Review only for MapReduce, but also read the Spark paper. | MapReduce04, zaharia16spark |
| Thu 03/08 | Execution Models - Parameter Server | This paper was presented at OSDI'14 in a session with GraphX and Project Adam, which are also very interesting. | li14parameter |
| Tue 03/13 | Snow day | | |
| Thu 03/15 | Execution Models - Timely Dataflow | For a nice introduction to the paper, take a look here. A companion to this paper that describes the computational model is differential dataflow. | murray13naiad |
| Tue 03/20 | Execution Models - TensorFlow | | abadi16tensorflow |
| Thu 03/22 | Making sense of performance - Monotasks | | ousterhout17monotasks |
| Tue 03/27 | Spring Break | | |
| Thu 03/29 | Spring Break | | |
| Tue 04/03 | Resource Disaggregation | | gao16disag |
| Thu 04/05 | Virtualization | | manco17lightvm |
| Tue 04/10 | Tracing Distributed Systems | Jonathan Mace will lead the discussion on tracing. | mace15pivot |
| Thu 04/12 | Orchestration | You can also read Borg, Omega, Kubernetes (Google's take, no review). | hindman15mesos |
| Tue 04/17 | Design Patterns | Three readings, no reviews: The Tail at Scale, Design patterns for container-based distributed systems, and the Concepts section of the Istio docs. Answer the question on Piazza. | |
| Thu 04/19 | Security | | ristenpart09offmycloud |
| Tue 05/01 | The past's future. Last class! | Read the Rise of RaaS, and answer the 2 questions in Piazza. | |

Reading List

  • Thu 01/25 -- Lecture 1: Introduction.
  • Tue 01/30 -- Lecture 2: Warm-up
    • Armbrust, Michael and Fox, Armando and Griffith, Rean and Joseph, Anthony D. and Katz, Randy and Konwinski, Andy and Lee, Gunho and Patterson, David and Rabkin, Ariel and Stoica, Ion and Zaharia, Matei
      A View of Cloud Computing
      In Commun. ACM, Apr 2010, pages 50--58
    • Jonas, Eric and Pu, Qifan and Venkataraman, Shivaram and Stoica, Ion and Recht, Benjamin
      Occupy the Cloud: Distributed Computing for the 99%
      In Proceedings of the 2017 Symposium on Cloud Computing, 2017, pages 445--451
  • Thu 02/01 -- Lecture 3: More uses for serverless
  • Thu 02/08 -- Lecture 4: The Datacenter as a Computer
  • Tue 02/13 -- Lecture 5: Storage
    • Sanjay Ghemawat and Howard Gobioff and Shun-Tak Leung
      The Google file system
      In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, 2003, pages 29--43
    • McKusick, Marshall Kirk and Quinlan, Sean
      GFS: Evolution on Fast-forward
      In Queue, August 2009, pages 10:10--10:20
  • Thu 02/15 -- Lecture 6: Storage
    • Edmund B. Nightingale and Jeremy Elson and Jinliang Fan and Owen Hofmann and Jon Howell and Yutaka Suzue
      Flat Datacenter Storage
      In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), 2012, pages 1--15
  • Thu 02/22 -- Lecture 7: Datacenter Networking
    • Greenberg, Albert and Hamilton, James R. and Jain, Navendu and Kandula, Srikanth and Kim, Changhoon and Lahiri, Parantap and Maltz, David A. and Patel, Parveen and Sengupta, Sudipta
      VL2: a scalable and flexible data center network
      In SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication, 2009, pages 51--62
  • Tue 02/27 -- Lecture 8:
  • Thu 03/01 -- Lecture 9: SDN
  • Tue 03/06 -- Lecture 10: Execution Models - MapReduce and Friends
    • Dean, Jeffrey and Ghemawat, Sanjay
      MapReduce: simplified data processing on large clusters
      In OSDI'04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, 2004, pages 137--150
    • Zaharia, Matei and Xin, Reynold S. and Wendell, Patrick and Das, Tathagata and Armbrust, Michael and Dave, Ankur and Meng, Xiangrui and Rosen, Josh and Venkataraman, Shivaram and Franklin, Michael J. and Ghodsi, Ali and Gonzalez, Joseph and Shenker, Scott and Stoica, Ion
      Apache Spark: A Unified Engine for Big Data Processing
      In Commun. ACM, Oct 2016, pages 56--65
  • Thu 03/08 -- Lecture 11: Execution Models - Parameter Server
    • Mu Li and David G. Andersen and Jun Woo Park and Alexander J. Smola and Amr Ahmed and Vanja Josifovski and James Long and Eugene J. Shekita and Bor-Yiing Su
      Scaling Distributed Machine Learning with the Parameter Server
      In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, pages 583--598
  • Thu 03/15 -- Lecture 12: Execution Models - Timely Dataflow
    • Murray, Derek G. and McSherry, Frank and Isaacs, Rebecca and Isard, Michael and Barham, Paul and Abadi, Martín
      Naiad: A Timely Dataflow System
      In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, 2013, pages 439--455
  • Tue 03/20 -- Lecture 13: Execution Models - TensorFlow
    • Martín Abadi and Paul Barham and Jianmin Chen and Zhifeng Chen and Andy Davis and Jeffrey Dean and Matthieu Devin and Sanjay Ghemawat and Geoffrey Irving and Michael Isard and Manjunath Kudlur and Josh Levenberg and Rajat Monga and Sherry Moore and Derek G. Murray and Benoit Steiner and Paul Tucker and Vijay Vasudevan and Pete Warden and Martin Wicke and Yuan Yu and Xiaoqiang Zheng
      TensorFlow: A System for Large-Scale Machine Learning
      In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pages 265--283
  • Thu 03/22 -- Lecture 14: Making sense of performance - Monotasks
  • Tue 04/03 -- Lecture 15: Resource Disaggregation
    • Peter X. Gao and Akshay Narayan and Sagar Karandikar and Joao Carreira and Sangjin Han and Rachit Agarwal and Sylvia Ratnasamy and Scott Shenker
      Network Requirements for Resource Disaggregation
      In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pages 249--264
  • Thu 04/05 -- Lecture 16: Virtualization
    • Manco, Filipe and Lupu, Costin and Schmidt, Florian and Mendes, Jose and Kuenzer, Simon and Sati, Sumit and Yasukata, Kenichi and Raiciu, Costin and Huici, Felipe
      My VM is Lighter (and Safer) Than Your Container
      In Proceedings of the 26th Symposium on Operating Systems Principles, 2017, pages 218--233
  • Tue 04/10 -- Lecture 17: Tracing Distributed Systems
  • Thu 04/12 -- Lecture 18: Orchestration
  • Thu 04/19 -- Lecture 20: Security