See the complete references plus supplemental reading for each class
in the reading list below.
Note that this schedule is tentative and subject to change (but not much!)
Class | Date | Topics | Readings | Notes |
---|---|---|---|---|
Intro | ||||
1 | 9/10 Th | Overview, Logistics, Goals | Homework 0, due 9/15 | |
2 | 9/15 T | Lessons from scaling | LessonsFromGiant, SearchForAPlanet | |
I. Wide Area Content Distribution | ||||
3 | 9/17 Th | Commercial CDNs | CHashWWW or CHashSTOC (Joe), LargeCDNs (Andrew) | Two reviews only: choose one of the CHash papers. |
4 | 9/22 T | Grassroots CDNs | Coral (Spiros), BitTorrent (Dan) | Optional: BitTyrant |
II. Large Scale Datacenter Systems | ||||
5 | 9/24 Th | Consistency vs Availability | EventuallyConsistent (James), BASE (Steve) | Review due only for BASE |
6 | 9/29 T | Dynamo (Marcelo) | ||
7 | 10/1 Th | GFS (Yash), Chubby (Kevin) | ||
10/2 F | Project draft proposals due 11:59pm. See newsgroup msg for details. | |||
8 | 10/6 T | In-class discussion of project proposals. | Hamming, You and Your Research; Sutherland, Technology and Courage | No reviews required |
10/7 W | Project proposals due 11:59pm | |||
9 | 10/8 Th | MapReduce (Juexin), MRPerformance (Andrew) | Optional: MLonMapReduce | |
SOSP | 10/13 T | No class, work on projects! | ||
10 | 10/15 Th | Dryad (Steve), DryadLINQ (Dongbo) | ||
11 | 10/20 T | Structured Storage | BigTable (Xiyang), PNUTS Another copy (Qiao) | |
III. Datacenter Networking | ||||
12 | 10/22 Th | Scaling Ethernet | EtherProxy (Hsu-Sheng), SEATTLE (Dan) | |
13 | 10/27 T | Datacenter Network architecture | Portland (Qiao), VL2 (James) | |
14 | 10/29 Th | TCP Incast | FineGrainedTCP (Hsu-Sheng), Griffith09Incast (Juexin) | |
IV. Monitoring, Tracing, Troubleshooting | ||||
15 | 11/3 T | Misbehavior (Spiros), NetMedic (Kevin) | ||
16 | 11/5 Th | Project5, BorderPatrol | Guest lecture Prof. John Jannotti | |
17 | 11/10 T | X-Trace, Pip | Rodrigo presents | |
18 | 11/12 Th | Project progress reports. | How (and How Not) to write a good systems paper, Armando Fox' Paper Writing Hints | Short (5-10min) presentations. No summaries. |
V. The Browser as the new platform | ||||
19 | 11/17 T | Chrome (Dongbo), Gazelle (Marcelo) | ||
20 | 11/19 Th | Flapjax (Joe), Ajaxscope (Xiyang) | ||
VI. Energy as a limiting resource | ||||
21 | 11/24 T | EnergyProportional (Sunil), CuttingTheBill (Yash) | ||
11/26 Th | Thanksgiving | |||
22 | 12/1 T | FAWN (Andrew) | ||
23 | 12/3 Th | Cinder (Sunil), Quanto (Marcelo) | Closing remarks | |
12/8 T | Reading Period - No class | |||
24 | 12/15 T | Presentation Session (see Syllabus) | ||
12/15 T | Project Report Due |
Readings
Reading list in bibtex format. Further reading links are optional and add material that is relevant to what we discussed in class.
- 9/15 - Lessons from Scaling
- [LessonsFromGiant] Lessons from Giant-Scale Services Brewer, Eric A. IEEE Internet Computing 2001-07
- [SearchForAPlanet] Web Search for a Planet: The Google Cluster Architecture Barroso, Luiz, Dean, Jeffrey, and Hoelzle, Urs IEEE MICRO 2003-03
- Further reading: eBay Scaling Odyssey, a presentation by Franco Travostino, eBay, at LADIS 08.
- 9/17 - Commercial CDNs
- [CHashWWW] Web caching with consistent hashing Karger, David, Sherman, Alex, Berkheimer, Andy, Bogstad, Bill, Dhanidina, Rizwan, Iwamoto, Ken, Kim, Brian, Matkins, Luke, and Yerushalmi, Yoav Comput. Netw. 1999
- [CHash97STOC] Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web Karger, David, Lehman, Eric, Leighton, Tom, Panigrahy, Rina, Levine, Matthew, and Lewin, Daniel Proceedings of the 29th ACM Symposium on Theory of Computing (STOC) 1997-05 El Paso, Texas
- [LargeCDNs] Measuring and evaluating large-scale CDNs Huang, Cheng, Wang, Angela, Li, Jin, and Ross, Keith W. IMC '08: Proceedings of the 8th ACM SIGCOMM conference on Internet measurement 2008 Vouliagmeni, Greece
- Further reading: Akamai's reactions to LargeCDNs Limewire's reactions to LargeCDNs
- 9/22 - Grassroots CDNs
- [Coral] Democratizing Content Publication with Coral Freedman, Michael J., Freudenthal, Eric, and Mazières, David In Proc. 1st USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI '04) 2004-03 San Francisco, CA
- [BitTorrent] Incentives build robustness in BitTorrent Cohen, B. Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS) 2003 Berkeley, CA
- Further reading: [BitTyrant] Do Incentives Build Robustness in BitTorrent? Piatek, Michael; Isdal, Tomas; Anderson, Thomas; Krishnamurthy, Arvind; and Venkataramani, Arun NSDI'07: Proceedings of the 4th USENIX/ACM Symposium on Networked Systems Design and Implementation 2007
- Also: Michael Freedman, the first author of the Coral paper, has a series of very interesting posts about Coral's design and experiences on his research group's blog at Princeton. He discusses, among other things, how the interface was right, how some of the design was overkill, and the security problems that Coral's DNS tricks may cause.
- 9/24 - Consistency vs Availability
- [EventuallyConsistent] Eventually consistent Vogels, Werner Commun. ACM 2009
- [BASE] Cluster-based scalable network services Fox, Armando, Gribble, Steven D., Chawathe, Yatin, Brewer, Eric A., and Gauthier, Paul SIGOPS Oper. Syst. Rev. 1997
- Further reading:
- CAP Theorem proof, by Seth Gilbert and Nancy Lynch, 2002. This paper has a proof of the theorem.
- PODC Keynote by Brewer where he introduces the CAP Conjecture.
- The Transaction Concept, Virtues And Limitations, J. Gray, Proceedings of 7th VLDB, Cannes, France, 1981, pp. 144-154. This is a good introduction to ACID concepts, although it doesn't spell ACID (no mention of Isolation).
- Shouting in the Datacenter video we talked about!
- 9/29
- [Dynamo] Dynamo: amazon's highly available key-value store DeCandia, Giuseppe, Hastorun, Deniz, Jampani, Madan, Kakulapati, Gunavardhan, Lakshman, Avinash, Pilchin, Alex, Sivasubramanian, Swaminathan, Vosshall, Peter, and Vogels, Werner SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles 2007 Stevenson, Washington, USA
- Related: Kai is an open source implementation of Dynamo in Erlang (I think this is the one Spiros mentioned)
- 10/1
- [GFS] The Google file system Ghemawat, Sanjay, Gobioff, Howard, and Leung, Shun-Tak SIGOPS Oper. Syst. Rev. v 37 n 5, 2003
- [Chubby] The Chubby lock service for loosely-coupled distributed systems Burrows, Mike OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation 2006 Seattle, Washington
- 10/8
- [MapReduce] MapReduce: simplified data processing on large clusters Dean, Jeffrey and Ghemawat, Sanjay Commun. ACM 2008
- [MRPerformance] Improving MapReduce Performance in Heterogeneous Environments Zaharia, Matei, Konwinski, Andrew, Joseph, Anthony, Katz, Randy, and Stoica, Ion Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI 2008) 2008
- Further reading: [MLonMapReduce] Map-Reduce for Machine Learning on Multicore Chu, Cheng T.; Kim, Sang K.; Lin, Yi A.; Yu, Yuanyuan; Bradski, Gary R.; Ng, Andrew Y.; and Olukotun, Kunle Proceedings of NIPS'06
- 10/15
- [Dryad] Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks Isard, Michael, Budiu, Mihai, Yu, Yuan, Birrell, Andrew, and Fetterly, Dennis Proceedings of the European Conference on Computer Systems (EuroSys) 2007-03 Lisbon, Portugal
- [DryadLINQ] DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language Yu, Yuan, Isard, Michael, Fetterly, Dennis, Budiu, Mihai, Erlingsson, Úlfar, Gunda, Pradeep Kumar, and Currey, Jon Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI 2008) 2008
- 10/20 - Structured Storage
- [BigTable] Bigtable: A Distributed Storage System for Structured Data Chang, Fay, Dean, Jeffrey, Ghemawat, Sanjay, Hsieh, Wilson C., Wallach, Deborah A., Burrows, Mike, Chandra, Tushar, Fikes, Andrew, and Gruber, Robert E. ACM Trans. Comput. Syst. 2008
- [PNUTS] PNUTS: Yahoo!'s hosted data serving platform Cooper, Brian F., Ramakrishnan, Raghu, Srivastava, Utkarsh, Silberstein, Adam, Bohannon, Philip, Jacobsen, Hans A., Puz, Nick, Weaver, Daniel, and Yerneni, Ramana Proc. VLDB Endow. 2008
- 10/22 - Scaling Ethernet
- [EtherProxy] EtherProxy: Scaling The Ethernet By Suppressing Broadcast Traffic Elmeleegy, Khaled and Cox, Alan L. Proceedings of IEEE INFOCOM 2009 2009-04 Rio de Janeiro, Brazil
- [SEATTLE] Floodless in seattle: a scalable ethernet architecture for large enterprises Kim, Changhoon, Caesar, Matthew, and Rexford, Jennifer SIGCOMM '08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication 2008 Seattle, WA, USA
- 10/27 - Datacenter Network Architecture
- [Portland] PortLand: a scalable fault-tolerant layer 2 data center network fabric Niranjan Mysore, Radhika, Pamboris, Andreas, Farrington, Nathan, Huang, Nelson, Miri, Pardis, Radhakrishnan, Sivasankar, Subramanya, Vikram, and Vahdat, Amin SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication 2009 Barcelona, Spain
- [VL2] VL2: a scalable and flexible data center network Greenberg, Albert, Hamilton, James R., Jain, Navendu, Kandula, Srikanth, Kim, Changhoon, Lahiri, Parantap, Maltz, David A., Patel, Parveen, and Sengupta, Sudipta SIGCOMM '09: Proceedings of the ACM SIGCOMM 2009 conference on Data communication 2009 Barcelona, Spain
- 10/29 - TCP Incast
- [FineGrainedTCP] Safe and effective fine-grained TCP retransmissions for datacenter communication Vasudevan, Vijay, Phanishayee, Amar, Shah, Hiral, Krevat, Elie, Andersen, David G., Ganger, Gregory R., Gibson, Garth A., and Mueller, Brian SIGCOMM Comput. Commun. Rev. 2009
- [Griffith09Incast] Understanding TCP incast throughput collapse in datacenter networks Chen, Yanpei, Griffith, Rean, Liu, Junda, Katz, Randy H., and Joseph, Anthony D. WREN '09: Proceedings of the 1st ACM workshop on Research on enterprise networking 2009 Barcelona, Spain
- 11/3
- [Misbehavior] Emergent (mis)behavior vs. complex software systems Mogul, Jeffrey C. EuroSys '06: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006 2006 Leuven, Belgium
- [NetMedic] Detailed diagnosis in enterprise networks Kandula, Srikanth, Mahajan, Ratul, Verkaik, Patrick, Agarwal, Sharad, Padhye, Jitendra, and Bahl, Paramvir SIGCOMM Comput. Commun. Rev. 2009
- 11/5
- [Project5] Performance debugging for distributed systems of black boxes Aguilera, Marcos K., Mogul, Jeffrey C., Wiener, Janet L., Reynolds, Patrick, and Muthitacharoen, Athicha Proc. SOSP '03 2003 Bolton Landing, NY, USA
- [BorderPatrol] BorderPatrol: isolating events for black-box tracing Koskinen, Eric and Jannotti, John Eurosys '08: Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008 2008 Glasgow, Scotland UK
- 11/10
- [X-Trace] X-Trace: A Pervasive Network Tracing Framework. Fonseca, Rodrigo, Porter, George, Katz, Randy H., Shenker, Scott, and Stoica, Ion NSDI'07: Proceedings of the 4th USENIX/ACM Symposium on Networked Systems Design and Implementation 2007
- [Pip] Pip: detecting the unexpected in distributed systems Reynolds, Patrick, Killian, Charles, Wiener, Janet L., Mogul, Jeffrey C., Shah, Mehul A., and Vahdat, Amin NSDI'06: Proceedings of the 3rd conference on 3rd Symposium on Networked Systems Design & Implementation 2006 San Jose, CA
- 11/12
- Advice on writing - No reviews due.
- 11/17
- [Chrome] Isolating web programs in modern browser architectures Reis, Charles and Gribble, Steven D. EuroSys '09: Proceedings of the 4th ACM European conference on Computer systems 2009 Nuremberg, Germany
- [Gazelle] The Multi-Principal OS Construction of the Gazelle Web Browser Wang, Helen J., Grier, Chris, Moshchuk, Alexander, King, Samuel T., Choudhury, Piali, and Venter, Herman Proceedings of the 18th USENIX Security Symposium 2009-08
- 11/19
- [Flapjax] Flapjax: A Programming Language for Ajax Applications Meyerovich, Leo A., Guha, Arjun, Baskin, Jacob, Cooper, Gregory H., Greenberg, Michael, Bromfield, Aleks, and Krishnamurthi, Shriram ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages & Applications 2009
- [Ajaxscope] AjaxScope: a platform for remotely monitoring the client-side behavior of web 2.0 applications Kiciman, Emre and Livshits, Benjamin SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles 2007 Stevenson, Washington, USA
- 11/24
- [EnergyProportional] The Case for Energy-Proportional Computing Barroso, Luiz André and Hölzle, Urs Computer 2007
- [CuttingTheBill] Cutting the electric bill for internet-scale systems Qureshi, Asfandyar, Weber, Rick, Balakrishnan, Hari, Guttag, John, and Maggs, Bruce SIGCOMM Comput. Commun. Rev. 2009
- 12/1
- [FAWN] FAWN: A Fast Array of Wimpy Nodes Andersen, David, Franklin, Jason, Kaminsky, Michael, Phanishayee, Amar, Tan, Lawrence, and Vasudevan, Vijay Proc. 22nd ACM Symposium on Operating Systems Principles (SOSP 2009) 2009-10 Big Sky, MT
- 12/3
- [Cinder] Apprehending Joule Thieves with Cinder Rumble, Stephen M., Stutsman, Ryan, Levis, Phil, Mazières, David, and Zeldovich, Nickolai Proceedings of the First ACM SIGCOMM Workshop on Networking, Systems, Applications on Mobile Handhelds (MobiHeld 2009) 2009
- [Quanto] Quanto: Tracking Energy in Networked Embedded Systems Fonseca, Rodrigo, Dutta, Prabal, Levis, Philip, and Stoica, Ion Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI'08) 2008-12