CS1951A

Data Science

I never guess. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.

Sherlock Holmes

Data is the new soil of business and (soon) the core of essentially all domains, from materials science to healthcare. Mastering big data requires a set of skills spanning a variety of disciplines, from distributed systems to statistics to machine learning. It is essential to develop a deep understanding of a complex ecosystem of tools and platforms, as well as the communication skills necessary to explain advanced analytics. This course will provide an overview of the wide area of data science, with a particular focus on the tools required to store, clean, manipulate, visualize, model, and ultimately extract information from large amounts of data. Topics include:

  • Relational algebra and SQL
  • Data integration and cleaning
  • Data modeling in Python and Pandas
  • Visualizations using D3
  • Clustering and classification
  • Scaling ML algorithms
  • Large-scale processing tools like Spark

Project

Throughout the course you will work on a data science project that seeks to answer an interesting and important real-world question. You will collect your own data, clean it, model it, visualize it, and finally present your results in a poster session at the end of the course. You will work in groups of four and will be assigned a mentor TA to help you through the process. Additionally, with just a few extra requirements, your project can be used as a capstone that fully integrates what you have learned in the course and builds a fully functional data science application. Check out the Final Project tab to learn more.

Deliverables

Below are the main deliverables and the tentative grading scheme for the course:

  • Participation (5%)
  • Lab (5%)
  • Assignments (40%)
  • Final Project (42.5%)
  • Midterm (7.5%)

Late days

You are given three late days to use throughout the semester for any assignment, excluding the final project and labs. These days may be applied in any way you desire, from using all three on a single assignment to using one late day on each of three assignments. After you have used all of your late days, we will not grade any late assignments. If you need an extension on an assignment, please contact either the HTAs or Dan directly.