Course Information:
Topic: | Non-standard, Web-scale Databases |
Instructor: | Stan Zdonik |
Pseudo-TA: | Andy Pavlo |
Semester Term: | Spring 2011 |
Where/When: | CIT 367 @ M 3:00-5:20 |
Mailing List: | cs227.2010-11.s@lists.cs.brown.edu |
Announcements:
-
Apr 20, 2011 ↠ Python Implementation of TPC-C Benchmark Now Available
The framework for the TPC-C benchmark that we are going to use to compare the systems is now available. The basic idea is that you will need to create a new driver file that implements the functions defined in "abstractdriver.py". One function loads the tuples for a given table into your database. Five separate functions then each execute one of the benchmark's transactions, given a set of input parameters. All the work for generating the tuples and the input parameters for the transactions has been done for you. Here's what you need to do to get started:
- Download the source code from GitHub.
- Create a new file in the 'drivers' directory for your system that follows the proper naming convention. For example, if your system is 'MongoDB', then your new file will be called 'mongodbdriver.py' and that file will contain a new class called 'MongodbDriver' (note the capitalization).
- Inside your class you will need to implement the required functions defined in AbstractDriver. Documentation on what each of these functions needs to do is also available on GitHub.
- Try running your system. I would start by defining the default configuration returned by the 'makeDefaultConfig' function in your driver, and then implement the data loading part first, since that will guide how you actually execute the transactions. Using 'MongoDB' as an example again, you can print the driver's configuration dict to a file:
$ python ./tpcc.py --print-config mongodb > mongodb.config
Make any changes you need to 'mongodb.config' (e.g., passwords, hostnames). Then test the loader:
$ python ./tpcc.py --no-execute --config=mongodb.config mongodb
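The steps above boil down to writing one class per system. The sketch below shows the general shape of such a driver. It assumes method names modeled on the framework's conventions ('makeDefaultConfig', 'loadConfig', 'loadTuples', and one 'do*' method per TPC-C transaction) and a config format of option name mapped to (description, default); check "abstractdriver.py" for the exact names and signatures. To stay self-contained, it defines a hypothetical stand-in AbstractDriver and keeps tuples in memory instead of talking to a real database. The helper 'driver_names' is also hypothetical and only illustrates the naming convention.

```python
from typing import Any, Dict, List, Tuple


def driver_names(system: str) -> Tuple[str, str]:
    """Hypothetical helper: map a system name to (file name, class name)."""
    base = system.lower()
    return f"{base}driver.py", f"{base.capitalize()}Driver"


class AbstractDriver:
    """Hypothetical stand-in for the framework's base class (see abstractdriver.py)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def makeDefaultConfig(self) -> Dict[str, Tuple[str, Any]]:
        raise NotImplementedError

    def loadConfig(self, config: Dict[str, Any]) -> None:
        raise NotImplementedError

    def loadTuples(self, tableName: str, tuples: List[list]) -> None:
        raise NotImplementedError


class MongodbDriver(AbstractDriver):
    # Assumed format: option name -> (description, default value).
    DEFAULT_CONFIG = {
        "host": ("The hostname of the database server", "localhost"),
        "port": ("The port number of the database server", 27017),
        "name": ("Name of the database", "tpcc"),
    }

    def __init__(self) -> None:
        super().__init__("mongodb")
        self.config: Dict[str, Any] = {}
        # In-memory stand-in for the real database: table name -> list of rows.
        self.tables: Dict[str, List[list]] = {}

    def makeDefaultConfig(self) -> Dict[str, Tuple[str, Any]]:
        return self.DEFAULT_CONFIG

    def loadConfig(self, config: Dict[str, Any]) -> None:
        # Use the value from the config file, falling back to the default.
        self.config = {
            key: config.get(key, default)
            for key, (_desc, default) in self.DEFAULT_CONFIG.items()
        }

    def loadTuples(self, tableName: str, tuples: List[list]) -> None:
        # May be called several times per table with batches of generated rows.
        self.tables.setdefault(tableName, []).extend(tuples)

    # One method per TPC-C transaction; each receives a dict of input parameters.
    def doNewOrder(self, params: Dict[str, Any]) -> Any:
        raise NotImplementedError("New-Order")

    def doPayment(self, params: Dict[str, Any]) -> Any:
        raise NotImplementedError("Payment")

    def doOrderStatus(self, params: Dict[str, Any]) -> Any:
        raise NotImplementedError("Order-Status")

    def doDelivery(self, params: Dict[str, Any]) -> Any:
        raise NotImplementedError("Delivery")

    def doStockLevel(self, params: Dict[str, Any]) -> Any:
        raise NotImplementedError("Stock-Level")
```

Calling `driver_names("MongoDB")` returns `("mongodbdriver.py", "MongodbDriver")`, which matches the naming example in the second step. Filling in 'loadTuples' first, as suggested, tells you how the data is laid out before you write the five transaction methods.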
You can use the CsvDriver if you want to see what the data or transaction input parameters will look like. The following command will dump just the inputs to the driver's functions into files in /tmp/tpcc-*:
$ python ./tpcc.py csv
You can also look at my SqliteDriver implementation to get an idea of what your transaction implementation functions need to do.
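To illustrate what a transaction function does, here is a simplified sketch of the TPC-C Stock-Level transaction written with Python's built-in sqlite3 module. It is not the SqliteDriver's code: the schema is cut down to the few columns the query touches, and the function name 'stock_level' and its parameter names are hypothetical. The transaction reads the district's next order ID, then counts the distinct items in that district's last 20 orders whose stock quantity is below a threshold.

```python
import sqlite3

# Simplified schema: only the columns the Stock-Level query needs.
SCHEMA = """
CREATE TABLE DISTRICT (D_ID INTEGER, D_W_ID INTEGER, D_NEXT_O_ID INTEGER);
CREATE TABLE ORDER_LINE (OL_O_ID INTEGER, OL_D_ID INTEGER, OL_W_ID INTEGER, OL_I_ID INTEGER);
CREATE TABLE STOCK (S_I_ID INTEGER, S_W_ID INTEGER, S_QUANTITY INTEGER);
"""


def stock_level(conn: sqlite3.Connection, w_id: int, d_id: int, threshold: int) -> int:
    """Count distinct items from the district's last 20 orders with stock below threshold."""
    cur = conn.cursor()
    cur.execute(
        "SELECT D_NEXT_O_ID FROM DISTRICT WHERE D_W_ID = ? AND D_ID = ?",
        (w_id, d_id),
    )
    (next_o_id,) = cur.fetchone()
    # Orders in the window [next_o_id - 20, next_o_id) are the 20 most recent.
    cur.execute(
        """
        SELECT COUNT(DISTINCT OL.OL_I_ID)
          FROM ORDER_LINE OL, STOCK S
         WHERE OL.OL_W_ID = ? AND OL.OL_D_ID = ?
           AND OL.OL_O_ID < ? AND OL.OL_O_ID >= ?
           AND S.S_W_ID = ? AND S.S_I_ID = OL.OL_I_ID
           AND S.S_QUANTITY < ?
        """,
        (w_id, d_id, next_o_id, next_o_id - 20, w_id, threshold),
    )
    return cur.fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO DISTRICT VALUES (1, 1, 30)")
    conn.executemany("INSERT INTO ORDER_LINE VALUES (?, 1, 1, ?)",
                     [(25, 100), (26, 101), (27, 100), (5, 102)])
    conn.executemany("INSERT INTO STOCK VALUES (?, 1, ?)",
                     [(100, 5), (101, 50), (102, 1)])
    print(stock_level(conn, w_id=1, d_id=1, threshold=10))  # prints 1
```

In the demo only item 100 counts: item 101 has plenty of stock, and item 102 appears only in order 5, which falls outside the 20-order window.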
You can test your implementation using the EC2 node. If something isn't working quite the way you expect it to, let me know and I will check whether there is a bug. Do not store any usernames, passwords, or keys in the source code.
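One way to keep credentials out of the source code is to read them at run time from the configuration file you edited locally, optionally overridden by an environment variable. The sketch below uses Python's standard configparser and assumes the config file is INI-style with one section per driver; the section name 'mongodb', the option names, the 'load_settings' helper, and the TPCC_PASSWORD variable are all hypothetical.

```python
import configparser
import os
from typing import Dict


def load_settings(text: str, section: str = "mongodb") -> Dict[str, str]:
    """Hypothetical helper: read connection settings from INI-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = dict(parser.items(section))
    # An environment variable, if set, overrides the password from the file.
    env_password = os.environ.get("TPCC_PASSWORD")
    if env_password is not None:
        settings["password"] = env_password
    return settings


EXAMPLE = """
[mongodb]
host = localhost
port = 27017
user = tpcc
password =
"""

if __name__ == "__main__":
    print(load_settings(EXAMPLE))
```

With a real file, `parser.read("mongodb.config")` replaces `read_string`; either way the secrets live outside the driver's source file and out of version control.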
-
Mar 14, 2011 ↠ Schedule Adjustment
We have changed the presentation schedule to give more time to the NoSQL system presentations and discussions.
-
Feb 08, 2011 ↠ Schedule Updated + System Assignments Posted
The presentation schedule has been updated with the assigned groups. Please contact us if your name does not appear or if you think somebody has dropped the course. You are free to swap topics with another student, but please let us know first.
We have also posted the list of NoSQL systems that people chose in yesterday's class. Each group should do a cursory search on its system to determine whether there will be enough information for a thorough analysis. Be prepared to talk briefly about your system next class.
Lastly, as a reminder, you should all plan your presentations to be one hour long. This allows adequate time for discussion and questions. A good rule of thumb is roughly two-thirds of a slide for every minute of your talk (about 40 slides for an hour). Here is also an excellent guide for creating talk slides that I have found helpful.
-
Feb 04, 2011 ↠ Mailing List + Paper Reports + System Preference Survey
The new CS227 mailing list is now operational. If you have already registered for the class on Banner (as of today), then you have been added to the mailing list. Please email Stan if you still need him to override the prerequisite requirements, and then add yourself to the mailing list here.
Your reports on this week's reading assignment will be due in paper form at the beginning of class on Monday. Non-presenters will write and submit a one-page description of the three most important ideas from the readings. Each idea should be described in a well-written paragraph that states the new idea and explains what makes it worth remembering.
Lastly, we have sent a link to a survey about your NoSQL system preference. Please pass it along to anybody who has not signed up for the mailing list yet.
-
Jan 31, 2011 ↠ First Day of Class!
Welcome to what is being colorfully referred to as "NoSQL - The Course". This is a seminar course that will cover new, emerging classes of data management systems that deviate from "traditional" relational systems. Topics will include the fundamentals of parallel and NoSQL databases, cloud computing databases, partitioned main-memory systems, and column-store databases. The course will also include an overview and analysis of state-of-the-art NoSQL systems currently used in industry. Both the course syllabus and reading list have been posted.