CSCI 2950-T: Topics in Databases and Systems
Data-Intensive Scalable Computing
... In the Cloud and on the Ground


Basic Information:
Time: N Hour (W 3:00-5:20pm)
Location: CIT 345
Prerequisites: CS 32 (or equivalent)
Credits: PhD (Area C or G), ScM (practice, significant programming)
Prof: Ugur Cetintemel


Description:
Data-intensive scalable computing (DISC) focuses on efficiently executing large-scale computations over massive data sets. DISC requires storing, organizing, and processing data at a scale and efficiency well beyond the capabilities of conventional information technologies. Industry has taken the lead in building data centers and DISC systems that provide online web services and analyze data for internal business goals at unprecedented scale, efficiency, and availability. The DISC model has the potential to radically change how we capitalize on data and to produce breakthroughs in other areas, including science, engineering, health care, and security.

This course will investigate the state of the art in DISC systems. We will study existing DISC platforms, models, and tools, as well as open research challenges, with an emphasis on massively parallel data processing using high-level programming primitives. Topics will include Google's MapReduce, Apache Hadoop, Amazon Web Services, and NVIDIA's CUDA. The course will consist primarily of technical readings and discussions. It will also include programming projects in which participants implement prototype data-intensive applications using existing DISC tools and platforms.