“With too little data, you won’t be able to make any conclusions that you trust. With loads of data you will find relationships that aren’t real… Big data isn’t about bits, it’s about talent” – Merrill

Data is the new soil of business and will (soon) be at the core of essentially all domains, from materials science to healthcare. Mastering big data requires a set of skills spanning a variety of disciplines, from distributed systems through statistics to machine learning, a deep understanding of a complex ecosystem of tools and platforms, as well as the communication skills to explain advanced analytics. This seminar will attempt to survey the broad field of data science and to understand the different techniques, and the interactions between them, in detail.

Course Layout

This course is a graduate-level seminar, so learning will take place through reading, discussion, and independent projects. A particular focus of this seminar is on shared learning and on improving communication skills, which will also count toward the final grade. We will divide the class into small teams, each working on a fundamental topic in data science (e.g., large-scale storage techniques, association rule mining, collaborative filtering). We expect every team to gain a deep understanding of its topic and to prepare it as a lecture for the other students, including slides and assignments. Each week, one team gives its lecture to the rest of the class; the other students then have one week to complete the assignments, after which the presenting team has two weeks to grade them.


All projects will involve Big Data management and/or analysis; however, projects will vary greatly in both scope and topic, depending on several factors, including group size, group background, and topic. We will discuss this in more detail during class, but you are encouraged to start thinking now about projects that interest you. As with many seminar courses, you will get out of this course what you put into it, so taking the time to come up with a well-scoped project that fits within the context of this course, and that you like, will go a long way toward your enjoyment of the course.


Below are a few of the main milestones and requirements you will complete throughout the semester:

Given the fast-changing nature of the topic, we may adjust this syllabus as needed. Furthermore, as an active participant in the seminar, feel free to suggest ways to make the course more effective.




Tim Kraska


Kayhan Dursun

CIT 506

Meeting Time
Wed 3:00-5:20

Special Riddle

From 1946 to 2011, which country (c1) received the largest total dollar amount of non-military loans and grants from the United States, adjusted for inflation (i.e., in constant dollars)? Which country (c2) received the most loans and grants (total dollars) per capita?


