Syllabus:
-
Overview
Many enterprise systems that handle high-profile data (e.g., financial and order processing systems) also need to be able to scale but are unable to use NoSQL solutions because they cannot give up strong transactional and consistency requirements. The only options previously available for these organizations were to either purchase a more powerful single-node machine or develop custom middleware that distributes queries over traditional DBMS nodes. Both approaches are prohibitively expensive and thus are not an option for many.
As an alternative to NoSQL and custom deployments, an emerging class of parallel DBMSs, called NewSQL, is designed to take advantage of the partitionable workloads of enterprise applications to achieve scalability without sacrificing ACID guarantees. The applications targeted by these NewSQL systems are characterized as having a large number of transactions that (i) are short-lived (i.e., no user stalls), (ii) touch a small subset of data using index look-ups (i.e., no full table scans or large distributed joins), and (iii) are repetitive (i.e., executing the same queries with different inputs). Such transactions in enterprise applications are also typically executed as pre-defined transaction templates or stored procedures in order to reduce DBMS overhead.
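To make the three workload properties above concrete, the following is a minimal sketch of a hash-partitioned in-memory store that runs pre-registered transaction templates. It is only an illustration: it is not the H-Store API, and every name in it (PartitionedStore, Partition, deposit, withdraw) is hypothetical. Each call is short-lived, touches a single key through a dictionary look-up standing in for an index, and reuses the same template with different inputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Partition:
    """One shard; the dict stands in for a primary-key index (id -> balance)."""
    accounts: Dict[int, int] = field(default_factory=dict)


class PartitionedStore:
    """Toy hash-partitioned store with pre-defined procedures (hypothetical)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[Partition] = [Partition() for _ in range(num_partitions)]
        self.procedures: Dict[str, Callable[..., int]] = {}

    def partition_for(self, key: int) -> Partition:
        # Route by partitioning key so a transaction touches exactly one shard.
        return self.partitions[key % len(self.partitions)]

    def register(self, name: str, proc: Callable[..., int]) -> None:
        # Templates are registered once and then invoked by name.
        self.procedures[name] = proc

    def call(self, name: str, key: int, *args: int) -> int:
        # Same template, different inputs: only the parameters change per call.
        return self.procedures[name](self.partition_for(key), key, *args)


def deposit(partition: Partition, account_id: int, amount: int) -> int:
    if amount <= 0:
        raise ValueError("amount must be positive")
    balance = partition.accounts.get(account_id, 0) + amount
    partition.accounts[account_id] = balance
    return balance


def withdraw(partition: Partition, account_id: int, amount: int) -> int:
    balance = partition.accounts.get(account_id, 0)
    # Validate before writing, so an aborted call leaves no partial state.
    if amount > balance:
        raise ValueError("insufficient funds")
    partition.accounts[account_id] = balance - amount
    return balance - amount


store = PartitionedStore(num_partitions=4)
store.register("deposit", deposit)
store.register("withdraw", withdraw)
store.call("deposit", 42, 100)   # routed to partition 42 % 4
store.call("withdraw", 42, 30)
```

Because every procedure in this sketch is routed by a single key, no call ever spans two partitions. A transaction that needed keys on different partitions (for example, a transfer between two accounts) would require coordination that this toy example deliberately leaves out.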
This is fundamentally a seminar course. There will be no exams. You will be graded on the basis of your participation in projects and presentations. There will be readings assigned for each class. In some cases these will be assigned papers, and in others they will be a topic for which you are responsible for finding information on the Internet or from other sources. Each class will have one or more presenters whose job it will be to lead the discussion in class. The non-presenters should prepare a one-page position paper on the topic.
Each student will be responsible for the following items:
- Weekly Reading Review (One page)
- One Paper Presentation (60 minutes)
- One NewSQL System Presentation (20 minutes)
- Programming Project
-
Weekly Reading Review
At the beginning of each class, each student (except for those presenting that day) will need to turn in a one-page review of the assigned readings for that day. Note that this review only needs to cover the mandatory readings, but students are encouraged to peruse the supplemental readings. Be sure to include your name and CS login at the top of the paper.
Each review must include the following information:
- An overview of the main idea and contributions.
- Three positive comments about the paper.
- Three negative comments about the paper.
- Unanswered technical or research discussion questions about the paper.
WARNING: These weekly reviews must be your own writing. You may not copy from the papers or other sources that you find on the web. Plagiarism will not be tolerated.
-
Paper Presentation
Each student will be assigned to a group to present a series of papers on a topic once during the semester. If multiple primary papers are assigned in a week, all of the papers must be covered equally during the presentation.
Each presentation should have the following content (at least):
- Introduce the background and the problem that the paper(s) are trying to solve.
- Summarize the key points from the paper(s) with examples if possible.
- Describe how the contributions in the paper(s) are relevant to the NewSQL systems discussed in class.
WARNING: It is acceptable for students to use information and content (e.g., images and graphics) found on the Internet but the original source must be properly attributed/cited. No credit will be given for presentations without proper citations.
-
NewSQL System Presentation
Each student will also be assigned to present a NewSQL system to the class. Since there will be no single source of information about these systems, students will use the Internet to find as much information as possible about each system and then present it to the class. These presentations should attempt to provide an overview of each system from an academic viewpoint in light of the readings discussed in class. Students should avoid repeating things already discussed in presentations.
- Provide a 140-character description of the system.
- Introduce the background and the problem that the system is trying to solve.
- Discuss key architectural components.
- Answer the following questions:
  - Does the system support distributed transactions?
  - Is the system open-source?
  - Does the system support stored procedures?
- Present any reviews and discussions from secondary sources (e.g., blogs, technology journals)
WARNING: It is acceptable for students to use information and content (e.g., images and graphics) found on the Internet but the original source must be properly attributed/cited. No credit will be given for presentations without proper citations.
-
Programming Projects
Each student will be assigned to one programming project using the H-Store system. The programming projects are designed to have students learn fundamental database techniques used in NewSQL systems, as well as to familiarize them with modern development practices. All projects will be drawn from a pre-determined list of topics in the H-Store GitHub repository. The projects are divided into three categories based on their difficulty:
- EASY - Single-person group only.
- MEDIUM - One- or two-person group.
- HARD - Two-person group only.
Each project will be developed in a separate Git branch. Students are not to commit anything to the master H-Store repository unless instructed to do so. Students working in a two-person group are expected to contribute equally to the workload. We reserve the right to check the amount of code committed by the members of each group to determine whether everyone is contributing to the project.
During the semester, there will be three required reports that each group will turn in to show that they are making progress on their project. This is to ensure that students do not wait until the last minute before starting projects and to allow time for feedback and code reviews. Such practices are standard in database systems development because these systems are used to store mission-critical information that cannot be corrupted or lost due to programmer error. Groups are strongly encouraged to write a small status update every week in their GitHub issue entry about how the project is progressing so that it is easier to write these reports.
- Proposal
  - Provide documentation on the parts of the H-Store codebase that will need to be modified. Describe any new classes that will be added.
  - Provide high-level functional specifications for the unit tests that will be written to ensure that the project is correct.
  - List any interesting issues or unanswered questions that will need to be addressed.
  - List any missing or broken features in H-Store needed for this project.
- Milestone #1
  - Provide a complete design description. If working in a multi-person group, this design document should detail the expected responsibilities/deliverables for each group member.
  - Provide a brief description of the preliminary code that has been written to start the project.
  - List the experiments that will be run to verify whether the project improves the overall system.
  - List any missing or broken features in H-Store needed for this project.
- Milestone #2
  - Provide a brief summary of the current state of the project, including what code has been written and what code still needs to be written.
  - List any unexpected issues or problems that you encountered.
  - List any missing or broken features in H-Store needed for this project.
- Final Deliverable
  - Provide a one-page report on how the project was implemented and how it works internally.
  - Provide end-user documentation for the project, that is, instructions and examples showing how somebody can use this project in H-Store.
  - List any additional optimizations or features that were added beyond the original project description.
  - List any future work or additional features that could be expanded on for this project.
All programming projects must be code complete and functional before the last class session. All source code must be pushed to GitHub and merged back to the master H-Store repository before the group will receive their final grade. To ensure quality code from students, a project will not be accepted without JUnit test cases.
-
TA Office Hours
Andy's office hours are by appointment only. Each group is allowed to meet with Andy for 30 minutes each week to discuss their project. This is a hard limit, which means that each group will need to come prepared to each meeting.
Students are also encouraged to discuss the internal architecture of the H-Store system with each other on the course mailing list as the semester progresses. Common questions will be posted on the H-Store FAQ page.