About

This course introduces students to a variety of statistical and computational techniques that data scientists use to tell stories. The subject matter for such stories might range from the global COVID pandemic to local elections. As an example of the former, data scientists have designed powerful visualizations demonstrating the ebbs and flows of hospitalizations and deaths across regions, highlighting the extent of human suffering. For the latter, one could imagine mining Twitter feeds to measure a community's relative interest in various political candidates, and then using the content of these feeds to predict election winners.

Data fluency encompasses two skill sets: data literacy and data presentation. Data literacy includes basic statistics and machine learning; students will be taught to extract meaningful information from data using statistical tests and machine learning algorithms. Data presentation incorporates principles of visualization and design; students will also be taught basic elements of design, and how to use visualization tools to display complex relationships graphically in comprehensible ways.


Learning Outcomes

  1. Students should become proficient in the basics of programming in R using RStudio
  2. Students should learn to apply data-science concepts to develop and assess data models
  3. Students should learn to communicate data findings effectively: orally, visually, and in writing

Syllabus

  1. What is Data Science?
  2. Introduction to Spreadsheets
  3. Descriptive Statistics
  4. Introduction to R
  5. Visualizing Data
  6. Exploratory Data Analysis
  7. Introduction to Probability
  8. Introduction to Statistics
  9. Regression
  10. Classification
  11. Clustering
  12. Text Analysis
  13. Visualizing Structured Data (Maps and Networks)

Course Format

CSCI 0100 lectures are held on Mondays and Wednesdays at 11am. On many Fridays, TAs lead discussion sections that include in-class activities. Although all slides will be posted on the course web page, students are nonetheless expected to attend all lectures and all sections, barring any pandemic-related concerns. Students are also expected to complete weekly readings that reinforce the lecture material.

In addition to lectures, there are weekly two-hour studio sessions, which offer students a hands-on environment in which to practice the techniques they are taught in lecture. Studios are typically pair-programmed; for their own intellectual benefit, students should make a concerted effort to work with multiple partners over the course of the semester. Students who fail to attend and complete studio assignments will incur penalties.

During some lecture slots (usually on Fridays), the TAs will run something akin to "section". During these meetings, the TAs will lead the students in an in-class activity and facilitate an interactive discussion of ongoing and recently submitted homework assignments. Section is a great place to gain insight into how to complete the bi-weekly homework assignments.

There are no exams in CSCI 0100. Students are evaluated based on their performance in studio, on bi-weekly homework assignments, and on a mini project and a final project. The projects involve writing as well as programming, and students are assessed on both of these dimensions (and others, like creativity).

Students should expect to spend 2-3 hours per week in lecture, two hours per week in studio, and 0-1 hours per week in section (depending on that week's schedule). In addition to these roughly five hours of instruction, there are four bi-weekly homework assignments (5-10 hours each, spread over two weeks) and weekly supplemental readings, available online free of charge (1-2 hours per week). The mini and final projects are open-ended: for the mini project, 5 hours over the course of one week should suffice to produce adequate work; for the final project, about 25 hours over the course of four weeks. In sum, students should expect to spend about 12 hours per week working on this course.

**N.B.** Students will present their final projects during the course's scheduled final exam slot. All students are expected to be present at that time, not only to present their own work, but to provide feedback to others.


Grading

Grading rubrics in CSCI 0100 are developed by the professor in conjunction with the undergraduate TAs, who then grade all assignments *anonymously*. Complaints about grades on individual assignments should be addressed to the relevant TA within ten days of the grades' release.

The professor assigns all final grades and reviews individual assignment grades as necessary (e.g., in borderline cases). Course grade complaints can be addressed to the professor.

The (tentative) grading breakdown is as follows:

| Assignment | Percentage |
|---|---|
| Studios | 30% |
| Homeworks | 35% |
| Mini Project | 10% |
| Project | 25% |
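For students who like to see the arithmetic spelled out, here is a minimal Python sketch of how the tentative weights above combine into a course grade. It assumes every component score is already on a 0-100 scale and that the breakdown is a simple weighted average; the name `course_grade` is hypothetical, and this is not an official course tool.

```python
# Tentative weights from the grading breakdown (fractions of the course grade).
# Assumption: each component score is reported on a 0-100 scale.
WEIGHTS = {
    "Studios": 0.30,
    "Homeworks": 0.35,
    "Mini Project": 0.10,
    "Project": 0.25,
}


def course_grade(scores: dict) -> float:
    """Hypothetical helper: weighted average of the four component scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Multiply each component by its weight and add them up.
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    example = {"Studios": 90, "Homeworks": 80, "Mini Project": 100, "Project": 70}
    print(round(course_grade(example), 2))
```

Because the weights sum to 1, a student scoring the same on every component receives that score as the course grade.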

Policies

Late Policy

For assignments that are to be turned in electronically, students will be granted three free late days, which can be applied, as needed, over the course of the semester to homework assignments and the mini-project, but not to the final project; *the final project deadline is a hard deadline; late final projects will not be accepted*.

In the unfortunate circumstance that these three free late days are all used up, late day penalties will apply: -10% within 24 hours, -25% within 48, and -50% within 72. No assignments will be accepted electronically more than 72 hours beyond their due date. Note, however, that assignments due the day before, but turned in the day after, a long weekend (Indigenous Peoples' Day and/or Thanksgiving) are only charged one late day.
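The penalty schedule for electronic submissions can be sketched as a small Python function. This sketch assumes the free late days have already been used up, that each penalty is deducted as a percentage of the earned score, and it ignores the long-weekend exception; the function names are hypothetical.

```python
from typing import Optional


def late_penalty(hours_late: float) -> Optional[float]:
    """Hypothetical helper: fraction deducted once all free late days are used.

    Returns None when the submission is more than 72 hours late,
    because such submissions are not accepted.
    """
    if hours_late <= 0:
        return 0.0   # on time: no penalty
    if hours_late <= 24:
        return 0.10  # within 24 hours
    if hours_late <= 48:
        return 0.25  # within 48 hours
    if hours_late <= 72:
        return 0.50  # within 72 hours
    return None      # beyond 72 hours: not accepted


def adjusted_score(raw_score: float, hours_late: float) -> Optional[float]:
    """Apply the late penalty to a raw score (assumed to be on a 0-100 scale)."""
    penalty = late_penalty(hours_late)
    if penalty is None:
        return None
    return raw_score * (1 - penalty)
```

For instance, under these assumptions an 80 submitted 30 hours late falls in the 48-hour tier and loses 25% of its value.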

For assignments that are to be graded interactively (meaning students have a set time at which they will be meeting a TA), the following late penalties always apply: if the student is late by 10 minutes or less, -10%; by more than 10 and up to 20 minutes, -20%; by more than 20 minutes, the meeting counts as a "no show", for which the penalty is -50%. This same penalty schedule also applies to any interactive grading rescheduled after a no show. To avoid any penalties, email requests to reschedule interactive gradings must be sent to the relevant grader(s) and to the head TAs at least 2 hours prior to the scheduled meeting time.
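The interactive-grading schedule maps minutes late to a penalty. A minimal Python sketch follows, assuming the penalty is a fraction of the grade and that exactly 10 or exactly 20 minutes fall in the lower tier; `interactive_penalty` is a hypothetical name.

```python
def interactive_penalty(minutes_late: float) -> float:
    """Hypothetical helper: fraction deducted for arriving late to an interactive grading."""
    if minutes_late <= 0:
        return 0.0   # on time: no penalty
    if minutes_late <= 10:
        return 0.10  # 10 minutes or less
    if minutes_late <= 20:
        return 0.20  # more than 10, up to 20 minutes
    return 0.50      # more than 20 minutes: counts as a no show
```

The same function would be applied again to a rescheduled grading that follows a no show.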

For group projects that are graded interactively, if some members show up for the grading session while others do not, the grading will proceed, and those who do not appear will receive a grade of 0 for that portion of the project, while those who appear late will be penalized according to the aforementioned penalty schedule.

Extensions may be granted by the professor in extreme circumstances. If you are ill, please visit health services before requesting an extension. If you are under any other sort of duress, please seek advice from a dean before requesting an extension.

Collaboration Policy

Students are encouraged to collaborate with their peers in CSCI 0100.

When working on homework assignments, students may consult one another, but they must then list on their submitted work the names of all students with whom they discussed the assignment. Even when collaborators are appropriately named on the students' submissions, each *individual* student must be able to fully explain their solutions, including all code, to the course staff. Students often search the web for help with R; this is legitimate, as long as they can fully explain their submitted code to the course staff.

Unnatural similarities among students' submissions, especially with students other than those whose names are listed as collaborators, or with materials available on the web, will be forwarded to the Dean of the College's office for review, to assess whether or not there has been a violation of Brown's Academic Code.

If you have any questions about this policy, please ask the course staff for clarification. Not understanding our policy is not grounds for not abiding by it.

Diversity and Inclusion

The computer science department is committed to diversity and inclusion, and strives to create a climate conducive to the success of women, students of color, students of any sexual orientation, and any other students who feel marginalized for any reason.

If you feel you have been mistreated by another student, or by any of the course staff, please feel free to reach out to one of the CS department's [Diversity and Inclusion Student Advocates](diversity.advocates@lists.brown.edu), or to Professor Greenwald or [Professor Tamassia](rt@cs.brown.edu) (the CS department chair). We, the CS department, take all complaints seriously.

Accommodations

If you feel you have any disabilities that could affect your performance in the course, please contact [SEAS](https://www.brown.edu/campus-life/support/accessibility-services/). We will support accommodations recommended by SEAS.

Harassment

Please review Brown's [Title IX and Gender Equity Policy](https://www.brown.edu/about/administration/title-ix/index.php?q=policy). If you feel you might be the victim of harassment (in this course or any other), you may seek help from any of the resources listed [here](https://www.brown.edu/about/administration/title-ix/index.php?q=resources).

Course Laptop Use

Owning a laptop is not required to succeed in CSCI 0100, so not owning one does not preclude you from taking this course. Nonetheless, during some classes, such as sections and programming lectures, students may benefit from the use of a personal laptop. (Note that during other classes, the professor may expressly forbid the use of any personal devices.)

If you do not own a laptop, but would like access to one this semester, please contact the course staff for assistance, if you are comfortable doing so. If not, please reach out to [Dean Elie](https://www.brown.edu/academics/college/people/detail/Vernicia_Elie), the Associate Dean for Financial Advising, for help purchasing a laptop, or to the IT service center to borrow one.


Alternative Brown Courses

This course has no prerequisites. It teaches elementary statistics and elementary computer science, assuming no background whatsoever in either. Students who already have experience in these areas are encouraged to consider taking Data Science (CSCI 1951A) in the Computer Science department, which puts much more of an emphasis on databases; Statistical Learning and Big Data (PHP 2650), which covers more advanced statistics and machine learning topics; or Big Data (ECON 1660) in the Economics department, which has more of an emphasis on modeling causality. These other courses also involve implementing various machine learning algorithms from scratch, whereas in CSCI 0100 we make use of off-the-shelf machine learning algorithms that are readily available in software libraries.

CSCI 0100 also differs from Introduction to Computation for the Humanities and Social Sciences (CSCI 0030). The primary difference is that CSCI 0100 has a greater emphasis on statistics, while CSCI 0030 teaches more programming. In CSCI 0100, students learn R, a programming language for statistical computing, and many basic statistical concepts, such as regression. In CSCI 0030, students learn Python and basic computer programming constructs, such as iteration. Nonetheless, there is significant overlap (for example, in visualization principles), so students are encouraged to take only one of these two offerings. Neither course has any prerequisites.

Computing Foundations: Data (CSCI 0111) is an introductory computer programming course with no prerequisites. It teaches basic algorithms and data structures, applying those concepts to structured and unstructured data. It is a perfect follow-on course for students who become interested in a core computer science course after CSCI 0100.

DATA 0200 is similar to CSCI 0100, in that it applies statistical methods to data, but it requires prior programming experience (e.g., CSCI 0111, CSCI 0150, CSCI 0170, etc.). In contrast, CSCI 0100 assumes no prior programming experience, and is taught by a computer science professor who taught introductory computer science for a decade, so students can be confident the introduction to programming they obtain in CSCI 0100 will be solid.

Finally, there are also courses in the Department of Biostatistics that teach R as it is applied in biostatistics. These include Essentials of Data Analysis (PHP 1501) and Principles of Biostatistics (PHP 1510/2510), which provide introductions to statistics with applications to biostatistics, and Statistical Programming in R (PHP 1560/2560), which emphasizes programming fundamentals and R-specific skills, such as building packages and Shiny apps.