Welcome to

Data Fluency for All


CS100



About


This course introduces students to a variety of statistical and computational techniques that data scientists use to tell stories. The subject matter for such stories might range from the American slave trade to local elections. As an example of the former, data scientists have designed powerful visualizations demonstrating the growth and demise of the slave trade over time, highlighting the extent of the human displacement. For the latter, one could imagine mining Twitter feeds to measure the public's relative interest in various candidates, and then using the content of these feeds to predict winners.

Data fluency can be understood to encompass both data literacy and data presentation. Data literacy includes the basics of statistics and machine learning. Data presentation relies heavily on principles of design. Students will be taught to apply statistical and machine learning algorithms (clustering, regression, and classification) to data sets in order to extract meaningful information from them. They will also be taught basic elements of design, and how to use visualization tools to display potentially complex relationships graphically and comprehensibly.


Important links


Piazza signup

Full course missive


Likely topics


  1. What is Data Science?
  2. Introduction to Spreadsheets
  3. Measures of Central Tendency and Dispersion
  4. Introduction to R
  5. Visualizing Quantitative and Qualitative Data
  6. Exploratory Data Analysis
  7. Introduction to Probability
  8. Introduction to Statistics
  9. Regression
  10. Classification
  11. Clustering
  12. Text Analysis
  13. Visualizing Structured Data (Maps and Networks)

Policies


Grading

Grading rubrics in CSCI 0100 are developed by the professor in conjunction with undergraduate TAs. TAs then grade all assignments, although the professor assigns final grades and reviews assignments as necessary (e.g., in all borderline cases). Complaints about grades on individual assignments should be addressed to the relevant TA; complaints about final grades can be addressed to the professor. The grading breakdown is as follows:

Assignment    Percentage
Attendance    5%
Studios       25%
Homeworks     40%
Project       30%
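
As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the final score is a simple weighted average; the function name `final_score` and the example scores are hypothetical, and the professor's actual final-grade calculation may differ.

```python
# Weights copied from the grading breakdown (percent of the final grade).
WEIGHTS = {"Attendance": 5, "Studios": 25, "Homeworks": 40, "Project": 30}


def final_score(component_scores):
    """Combine per-component scores (each assumed 0-100) into a weighted score."""
    # Sanity check: the published weights should cover the whole grade.
    assert sum(WEIGHTS.values()) == 100
    total = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)
    return total / 100


# Hypothetical scores, for illustration only.
example = {"Attendance": 100, "Studios": 90, "Homeworks": 80, "Project": 85}
print(final_score(example))  # prints 85.0
```

With these made-up scores, the homework component contributes the most (32 of the 85 points), reflecting its 40% weight.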

Late Policy

For assignments that are to be turned in electronically, students will be granted three free late days, which can be applied, as needed, over the course of the semester. Once all three free late days have been used, late penalties apply: -10% if the assignment is submitted within 24 hours of its due date, and -25% if it is submitted within 48 hours. No assignments will be accepted electronically more than 48 hours beyond their due date.
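
The penalty schedule for electronic submissions can be sketched as a small Python function. This is only an illustration under stated assumptions: it covers the case where the free late days are already used up, it treats the 24- and 48-hour boundaries as inclusive, and the function name `electronic_late_penalty` is hypothetical. How free late days are counted is not modeled here.

```python
def electronic_late_penalty(hours_late):
    """Return the late penalty in percent, or None if the work is not accepted.

    Assumes the student's free late days are already used up, and that
    "within 24 hours" and "within 48" include the boundary values.
    """
    if hours_late <= 0:
        return 0       # on time: no penalty
    if hours_late <= 24:
        return 10      # -10% within 24 hours
    if hours_late <= 48:
        return 25      # -25% within 48 hours
    return None        # more than 48 hours late: not accepted


for hours in (0, 5, 30, 49):
    print(hours, electronic_late_penalty(hours))
```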

For assignments that are to be graded interactively (meaning students have a set time at which they will be meeting a TA), the following late penalties always apply: if the student is late by 10 minutes or less, -10%; 10 to 20 minutes, -20%; more than 20 minutes counts as a "no show", for which the penalty is -50%. This same penalty schedule applies recursively to rescheduled interactive gradings following a no show. Last-minute email requests to reschedule interactive gradings must be sent to the relevant grader(s) and to the head TAs at least 2 hours prior to the scheduled meeting time to avoid any penalties.

For group projects that are graded interactively, if some members show up for the grading session while others do not, the grading will proceed, and those who do not appear will receive a grade of 0 for that portion of the project, while those who appear late will be penalized according to the aforementioned penalty schedule.

Extensions may be granted by the professor in extreme circumstances. If you are ill, please visit health services before requesting an extension. If you are under any other sort of duress, please seek advice from a dean before requesting an extension.

Collaboration Policy

Students are encouraged to collaborate with their peers in CSCI 0100. Studios are pair-programmed, with each student finding a different partner each week.

When working on homework assignments, students may consult one another, but are then required to list on their submitted work the names of all students with whom they discussed the assignment. Unnatural similarities between a student's submission and those of students whose names are not listed will be forwarded to the Dean of the College's office for review, to assess whether there has been a violation of Brown's Academic Code.

If you have any questions about this policy, please ask the course staff for clarification. Failure to understand the policy does not excuse failure to abide by it.

Diversity and Inclusion

The computer science department is committed to diversity and inclusion, and strives to create a climate conducive to the success of women, students of color, students of any sexual orientation, and any other students who feel marginalized for any reason.

If you feel you have been mistreated by another student, or by any of the course staff, please feel free to reach out to one of the CS department's Diversity and Inclusion Student Advocates, or to Professor Greenwald or Professor Cetintemel (the CS department chair). We, the CS department, take all complaints seriously.

Accommodations

If you feel you have any disabilities that could affect your performance in the course, please contact SEAS, and ask them to contact the course staff. We will support accommodations recommended by SEAS.

Harassment

Please review Brown's Title IX and Gender Equity Policy. If you feel you might be the victim of harassment (in this course or any other), you may seek help from any of the resources listed here.

Alternative Brown Courses

This course has no prerequisites. It teaches elementary statistics and elementary computer science, assuming no background whatsoever in either. Students who do have experience in these areas are encouraged to consider taking Data Science (CSCI 1951A) in the Computer Science department, which places much more emphasis on databases; or Big Data (ECON 1660) in the Economics department, which places more emphasis on modeling causality. Both of these courses also involve programming various machine learning algorithms from scratch. In CSCI 0100, we make use of off-the-shelf machine learning algorithms that are readily available in software libraries.

This course also differs from Introduction to Computation for the Humanities and Social Sciences (now CSCI 0030). The primary difference is that CSCI 0100 has a greater emphasis on statistics, while CSCI 0030 teaches more programming. In CSCI 0100, students learn R, a statistical software package, and many basic statistical concepts, such as regression. In CSCI 0030, students learn some Python and basic computer programming constructs, such as iteration. Nonetheless, there is significant overlap (for example, in visualization principles), so students are encouraged to take only one of these two offerings. Neither course has any prerequisites.