About


This course introduces students to a variety of statistical and computational techniques that data scientists use to tell stories. The subject matter for such stories might range from the American slave trade to local elections. As an example of the former, data scientists have designed powerful visualizations demonstrating the growth and demise of the slave trade over time, highlighting the extent of the human displacement. For the latter, one could imagine mining Twitter feeds to measure the public’s relative interest in various political candidates, and then using the content of these feeds to predict election winners.

Data fluency can be understood to encompass both data literacy and data presentation. Data literacy includes the basics of statistics and machine learning. Data presentation relies heavily on principles of design. Students will be taught to apply statistical and machine learning algorithms (clustering, regression, and classification) to data sets in order to extract meaningful information from them. They will also be taught basic elements of design, and to use visualization tools to display potentially complex relationships graphically in a comprehensible way.



Likely topics


  1. What is Data Science?
  2. Introduction to Spreadsheets
  3. Descriptive Statistics
  4. Introduction to R
  5. Visualizing Data
  6. Exploratory Data Analysis
  7. Introduction to Probability
  8. Introduction to Statistics
  9. Regression
  10. Classification
  11. Clustering
  12. Text Analysis
  13. Visualizing Structured Data (Maps and Networks)

Course Format


CSCI 0100 lectures are held on Mondays and Wednesdays at 1pm. Most Fridays are devoted to TA-led discussion sections, enhanced by in-class activities. Students are expected to attend all lectures and all sections; participation is recorded and incorporated into students’ grades. For future reference, all slides will be posted on the course web page after class. There are also required weekly readings that reinforce the lecture materials.

In addition to weekly lectures, there are weekly two-hour studio sessions, which offer students hands-on environments in which to practice the techniques they are taught in lecture. Students are required to register for and attend one studio per week. Four studio sections will be offered, likely on Wednesdays and Thursdays.

There are no exams in CSCI 0100. Students are evaluated based on class participation, as well as their performance in studio, on four bi-weekly homework assignments, and on a mini-project and a final project. The projects involve writing as well as programming, and students are assessed on both of these dimensions (and others, like creativity).

Students should expect to spend two hours per week in lecture, two hours per week in studio, and one hour per week in section. In addition to these five hours of instruction, there will be four bi-weekly homework assignments (5-10 hours over the course of two weeks), and supplemental readings every week (1-2 hours), which are available online free of charge. The mini-project and final project are open-ended: for the former, 5 hours over the course of one week should suffice to produce adequate work; for the latter, 25 hours over the course of four weeks. In sum, students should expect to spend roughly 12 hours per week working on this course.
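For students who would like to see where the weekly estimate comes from, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the hour figures are the ones quoted in this section, but splitting the semester into "homework weeks" and "final-project weeks" that never overlap is an assumption made here, not a statement about the actual course calendar.

```python
# Figures taken from this section of the syllabus. Treating homework weeks and
# final-project weeks as non-overlapping is an assumption made for illustration.
INSTRUCTION_HOURS = 2 + 2 + 1      # lecture + studio + section
READING_HOURS = (1, 2)             # weekly readings: low and high estimates
HOMEWORK_HOURS = (5 / 2, 10 / 2)   # 5-10 hours spread over two weeks
FINAL_PROJECT_HOURS = 25 / 4       # 25 hours spread over four weeks


def weekly_range(extra_low, extra_high):
    """Return (low, high) total weekly hours for a given range of other work."""
    low = INSTRUCTION_HOURS + READING_HOURS[0] + extra_low
    high = INSTRUCTION_HOURS + READING_HOURS[1] + extra_high
    return low, high


homework_week = weekly_range(*HOMEWORK_HOURS)
project_week = weekly_range(FINAL_PROJECT_HOURS, FINAL_PROJECT_HOURS)

print("homework week:", homework_week)        # (8.5, 12.0)
print("final-project week:", project_week)    # (12.25, 13.25)
```

Under these assumptions, a typical week falls somewhere between about 8.5 and 13 hours.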


Policies


Grading

Grading rubrics in CSCI 0100 are developed by the professor in conjunction with undergraduate TAs. TAs then grade all assignments anonymously. Grade complaints on individual assignments should be addressed to the relevant TA within ten days of the grade's release.

The professor assigns all final grades and reviews individual assignment grades as necessary (e.g., in borderline cases). Course grade complaints can be addressed to the professor.

The grading breakdown is as follows:

Assignment    Percentage
Attendance    10%
Studios       20%
Homeworks     35%
Project       35%
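As an illustration of how these weights combine, here is a minimal Python sketch. The function name `course_score` and the example scores are hypothetical, and the sketch assumes each category is scored from 0 to 100. Final grades are assigned by the professor, so this shows only the weighted arithmetic, not an official grade calculation.

```python
# Weights from the grading breakdown, in percentage points.
WEIGHTS = {
    "Attendance": 10,
    "Studios": 20,
    "Homeworks": 35,
    "Project": 35,
}


def course_score(scores):
    """Combine per-category scores (each 0-100) into a weighted total (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student, invented for illustration only.
example = {"Attendance": 100, "Studios": 90, "Homeworks": 80, "Project": 85}
print(course_score(example))  # 85.75
```

Working in whole percentage points and dividing once at the end keeps the arithmetic exact for whole-number scores.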

Late Policy

For assignments that are to be turned in electronically, students will be granted three free late days, which can be applied, as needed, over the course of the semester to homework assignments and the mini-project, but not to the final project. The final project deadline is a hard deadline: late final projects will not be accepted.

In the unfortunate circumstance that these three free late days are all used up, late-day penalties will apply: -10% within 24 hours, -25% within 48 hours, and -50% within 72 hours. No assignments will be accepted electronically more than 72 hours beyond their due date. Note, however, that assignments turned in after a long weekend (Indigenous Peoples’ Day and/or Thanksgiving) are only charged one late day.
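The penalty schedule for electronic submissions can be sketched as a small lookup. The function name `late_penalty` is hypothetical, and the sketch rests on two assumptions that the policy does not spell out: that the student's free late days are already used up, and that each tier includes its upper bound (so exactly 24 hours late costs 10%). It does not model the long-weekend exception.

```python
def late_penalty(hours_late):
    """Return the percentage deducted, or None if the work is not accepted.

    Assumes free late days are exhausted and that each tier includes its
    upper bound; both are assumptions made for this illustration.
    """
    if hours_late <= 0:
        return 0  # on time: no penalty
    for limit, penalty in ((24, 10), (48, 25), (72, 50)):
        if hours_late <= limit:
            return penalty
    return None  # more than 72 hours late: not accepted electronically


for hours in (0, 5, 30, 72, 80):
    print(hours, late_penalty(hours))
```

A return value of `None` is used instead of a number so that "not accepted" cannot be mistaken for a 100% penalty.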

For assignments that are to be graded interactively (meaning students have a set time at which they will be meeting a TA), the following late penalties always apply: if the student is late by 10 minutes or less, -10%; 10 to 20 minutes, -20%; more than 20 minutes counts as a “no show”, for which the penalty is -50%. This same penalty schedule applies recursively to rescheduled interactive gradings following a no show. Last-minute email requests to reschedule interactive gradings must be sent to the relevant grader(s) and to the head TAs at least 2 hours prior to the scheduled meeting time to avoid any penalties.
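The interactive-grading schedule can be sketched the same way. The name `interactive_penalty` is hypothetical, and the sketch assumes that the 2-hour rescheduling notice was not given, and that being exactly 10 minutes late falls in the first tier while exactly 20 minutes late falls in the second.

```python
def interactive_penalty(minutes_late):
    """Return (percentage deducted, no_show flag) for an interactive grading.

    Tier boundaries at exactly 10 and 20 minutes are assumptions made here.
    """
    if minutes_late <= 0:
        return 0, False
    if minutes_late <= 10:
        return 10, False
    if minutes_late <= 20:
        return 20, False
    return 50, True  # more than 20 minutes late counts as a no show


print(interactive_penalty(7))   # (10, False)
print(interactive_penalty(15))  # (20, False)
print(interactive_penalty(25))  # (50, True)
```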

For group projects that are graded interactively, if some members show up for the grading session while others do not, the grading will proceed, and those who do not appear will receive a grade of 0 for that portion of the project, while those who appear late will be penalized according to the aforementioned penalty schedule.

Extensions may be granted by the professor in extreme circumstances. If you are ill, please visit health services before requesting an extension. If you are under any other sort of duress, please seek advice from a dean before requesting an extension.

Collaboration Policy

Students are encouraged to collaborate with their peers in CSCI 0100. Studios are pair-programmed. For their own benefit, students should make a concerted effort to work with multiple partners over the course of the semester.

When working on homework assignments, students may consult one another, but are then required to list on their submitted work the names of all students with whom they discussed the assignment. Unnatural similarities between a student's submission and those of other students whose names are not listed will be forwarded to the Dean of the College's office for review, to assess whether or not there has been a violation of Brown's Academic Code.

Even when collaborators are appropriately named on the students’ handins, each individual student must be able to fully explain their solutions—including all code—to the course staff. Often students search the web for help with R, which is legitimate, as long as they can fully explain their submitted code to the course staff.

If you have any questions about this policy, please ask the course staff for clarification. Not understanding our policy is not grounds for not abiding by it.

Diversity and Inclusion

The computer science department is committed to diversity and inclusion, and strives to create a climate conducive to the success of women, students of color, students of any sexual orientation, and any other students who feel marginalized for any reason.

If you feel you have been mistreated by another student, or by any of the course staff, please feel free to reach out to one of the CS department’s Diversity and Inclusion Student Advocates, or to Professor Greenwald or Professor Hughes (the CS department chair). We, the CS department, take all complaints seriously.

Accommodations

If you feel you have any disabilities that could affect your performance in the course, please contact SEAS, and ask them to contact the course staff. We will support accommodations recommended by SEAS.

Harassment

Please review Brown's Title IX and Gender Equity Policy. If you feel you might be the victim of harassment (in this course or any other), you may seek help from any of the resources listed here.

Course Laptop Use

Owning a laptop is not required to succeed in CSCI 0100, so not owning one does not preclude you from taking this course. Nonetheless, during some classes, such as sections and programming lectures, students may benefit from the use of a personal laptop. (Note that during other classes, the professor may expressly forbid the use of any personal devices.)

If you do not own a laptop, but would like access to one this semester, please contact the HTAs for assistance, assuming you are comfortable doing so. Otherwise, please feel free to reach out to Dean Elie, the Associate Dean for Financial Advising, for help purchasing a laptop, or the IT service center, to borrow a laptop.

Alternative Brown Courses

This course has no prerequisites. It teaches elementary statistics and elementary computer science, assuming no background whatsoever in either. Students who already have experience in these areas are encouraged to consider taking Data Science (CSCI 1951A) in the Computer Science department, which puts much more of an emphasis on databases; Statistical Learning and Big Data (PHP 2650), which covers more advanced statistics and machine learning topics; or Big Data (ECON 1660) in the Economics department, which has more of an emphasis on modeling causality. These courses also involve programming various machine learning algorithms from scratch. In CSCI 0100, by contrast, we make use of off-the-shelf machine learning algorithms that are readily available in software libraries.

CSCI 0100 also differs from Introduction to Computation for the Humanities and Social Sciences (CSCI 0030). The primary difference is that CSCI 0100 has a greater emphasis on statistics, while CSCI 0030 teaches more programming. In CSCI 0100, students learn R, a statistical software package, and many basic statistical concepts, such as regression. In CSCI 0030, students learn Python and basic computer programming constructs, such as iteration. Nonetheless, there is significant overlap (for example, in visualization principles), so students are encouraged to take only one of these two offerings. Neither course has any prerequisites.

Computing Foundations: Data (CSCI 0111) is an introductory computer programming course with no prerequisites. It teaches basic algorithms and data structures, applying those concepts to structured and unstructured data. It is a perfect follow-on course for students who become interested in a core computer science course after CSCI 0100.

Finally, there are also courses in the Department of Biostatistics that teach R with an application to biostatistics. These include Essentials of Data Analysis (PHP 1501) and Principles of Biostatistics (PHP 1510/2510), which provide introductions to statistics with applications to biostatistics, and Statistical Programming in R (PHP 1560/2560), which emphasizes programming fundamentals and R-specific skills, such as building packages and Shiny apps.