Lectures
Lecture | Description | Readings | Data and/or Code |
---|---|---|---|
Section 10a | HW3 part 1 review | | |
Section 10b | HW3 part 2 review | | |
Section 10c | Using ggmap, with the example of Minard's excellent visualization of Napoleon's 1812 march against Russia. | | |
Lecture 26: Simulating the Electoral College | | | |
Lecture 25: Social Network Analysis | | | |
RTweet Demo | Julia, one of our TAs, put together a neat little demo on how to access and use Twitter data in R using rtweet. | | |
Lecture 24: Text Analysis | The basics of text analysis, including an example Sentiment Analysis of tweets. | | |
Lecture 23: Clustering | k-means middle school dance example and types of clustering. | | |
Lecture 21: Naive Bayes | Naive Bayes: A simple probabilistic generative classifier. | | |
Lecture 20a: Maximum Likelihood Estimation | Finding parameters that maximize the likelihood of the data. | | |
Lecture 20b: Bayes' Rule | Bayes' Rule, with examples, like the Monty Hall Problem! | | |
Section 9 | Review of Homework 2. | | |
Lecture 19: Model Selection | The bias-variance tradeoff, linear regression with regularizers, and a bit about variable selection. | | |
Lecture 18a: Decision Trees | Classification via decision trees. | | |
Lecture 17: k Nearest Neighbors | Classification via the k Nearest Neighbors algorithm. | | |
Lecture 16c: Properties of Estimators | Three desiderata of estimators: consistency, unbiasedness, and efficiency. | | |
Lecture 16b: Regression in Practice | How to gauge the goodness of fit of a linear model, and how to improve a model that fits poorly. | | |
Lecture 16a: Simple Linear Regression | Using least squares to compute the line of best fit, and where regression got its name. | | |
Lecture 15: Introduction to Machine Learning | An overview of machine learning, including regression, classification, and clustering. | | |
Guest Speaker Kate Miller: Data Visualization | Simple Rules for Better Graphs | | |
Lecture 14c: Hypothesis Testing | | | |
Lecture 14b: Confidence Intervals | | | |
Lecture 14a: Introduction to Statistical Inference | We use descriptive statistics to summarize observed data. We use inferential statistics to draw conclusions about unobserved data from observed data. | | |
Lecture 13a: The Normal Distribution | The normal distribution, with special guest the standard normal, via the z-transform. | | |
Lecture 13b: Probability Distributions and CLT | The normal distribution approximates the binomial, and it ain't an accident! | | |
Lecture 12b: Random Variables | Random Variables, their expectation, and their variance. | Seeing Theory: A Visual Introduction to Probability and Statistics | |
Lecture 12a: Introduction to Probability | Introduction to Probability. | | |
Section 6 | A data cleaning exercise using TA birthdays. | | |
Section 5 | Review of Homework 1 (including a discussion of Hilary as the most poisoned name in U.S. history). | | |
Lecture 10b: Tidy Data | An introduction to tidy data and tidyr. | | |
Lecture 10a: Data Cleaning | Introduction to data cleaning with stringr and lubridate. | | |
Homework 0 | Review of Homework 0, and the inherent untrustworthiness of rankings! | | |
Section 3 | In-class activity: Electricity Consumption | | |
Programming Basics: Iteration (For and While Loops) | Introduction to the concept of iteration in programming, including for and while loops. More about R's data structures: vectors, matrices, arrays, lists, etc. | | |
Lecture 8a: Programming Basics: Functions and Conditionals | Diving into programming fundamentals with functions and conditionals. | | |
Lecture 7a: Measures of Dispersion | A discussion of spread as it pertains to data, including defining variance and standard deviation. A follow-up discussion of quartiles, interquartile range (IQR), and the IQR rule of thumb for identifying outliers. And a study that purports to show that pets relieve stress. | | |
Lecture 7b: Covariance and Correlation | Bivariate data, and metrics to measure how data covary and/or correlate with one another. An analysis of ice cream sales as a function of temperature (yes, sales increase as the temperature increases!). Finally, some guidance on how to (and not to) interpret correlation. | | |
Lecture 6a: Probability Distributions | Introduction to probability distributions. | | |
Lecture 6b: Histograms | Histograms are used to visualize univariate data. | | |
Section 2: Introduction to plotting in R | Introduction to plotting in R, with special guest: ggplot! | | |
Lecture 9a: Exploratory Data Analysis | Exploratory Data Analysis: how to conduct an initial investigation of data. Anscombe's quartet: on data visualization vs. descriptive statistics. Finally, an exploration of data gathered from a draft procedure used during the Vietnam War. | | |
Lecture 9b: EDA Again | Another EDA example on air pollution. | | |
Lecture 5b: Introduction to dplyr | Introduction to the dplyr library. | | |
Lecture 5a: Introduction to R | Introduction to R. | | |
Section 1: Data Exploration using Spreadsheets | Explore international development data in spreadsheets, using pivot tables to group data. | | |
Lecture 4c: The Join Operation in spreadsheets, VLOOKUP | Merging data in spreadsheets. | | |
Lecture 4b: Pivot Tables in Spreadsheets | Grouping, and then aggregating, data in spreadsheets. | | |
Lecture 4a: Sort and Filter in Spreadsheets | Sorting and filtering data in spreadsheets. | | |
Lecture 3c: Measures of Central Tendency | The mean, the median, and the mode. (Including speculations about why both W and Bernie used the mean rather than the median to make claims about tax cuts and donations, respectively.) | | |
Lecture 3b: Descriptive Statistics | The benefits and risks of descriptive statistics. | | |
Lecture 3b: Qualitative vs. Quantitative Data | An overview of basic data types. | | |
Lecture 2a: Databases vs. Spreadsheets | Databases are structured stores of organized data. Database software makes it easy to organize, analyze, and visualize information. | | |
Lecture 2b: Introduction to Spreadsheets | Spreadsheets provide quick methods for summarizing (mostly numerical) data, and for rudimentary visualizations. | | |
Lecture 1b: Three Modern Case Studies | Three modern case studies highlight the breadth of applications of data science, from sports to politics. Netflix's collaborative filtering algorithm, which is used to predict user ratings of films, is the third. | | |
Lecture 1a: Two Historical Case Studies | An anesthesiologist named John Snow (John, not Jon) used visualization to map out cholera. He was a pioneer in data visualization. Florence Nightingale might be remembered today as a nurse, but she also saved many lives through her skills in statistics and data visualization. | | |
Lecture 0: Introduction to Data Fluency | A brief introduction to data science, including our favorite visualizations, as well as an overview of the many applications of data science and its exploratory, explanatory, and predictive goals. | Fernanda Viégas and Martin Wattenberg on new ways for people to talk and think about data | |