Lecture notes: Course overview

What’s this course about?

You might have heard the term “data science.” This isn’t a data science class per se, but it’s an introduction to the computer science component of data science. Let’s go through a couple of examples to understand what that means.

New York Times e-commerce example

Check out the graphs at the top of this New York Times article. The article is from a great series the Times runs called “What’s going on in this graph?” The articles in the series are targeted at students, and ask students to answer two questions:

  • What do you notice?
  • What do you wonder?

This framework–first reflecting, then asking further questions–is really useful, and we’ll keep coming back to it throughout this course. So: what do you notice about these graphs, and what do you wonder?

This pair of graphs is relevant to our class on a couple of levels. First, one of the things you hopefully noticed is the massive growth in the e-commerce sector. A lot of programming went into making online shopping so successful, and many (though not all!) of the new e-commerce jobs are programming jobs.

Looking at these graphs on another level, we can think about how the graphs were created. Someone had to collect the data (probably pulling from many different datasets), do the right queries on those data (probably different queries for net job growth and for percentage growth), test those queries to make sure they are doing the right thing, and create visualizations of the resulting data. All of these tasks are things we’ll touch on in this class.
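
To see why net job growth and percentage growth call for different queries, here is a minimal Python sketch. The sector names and job counts are made up for illustration and are not taken from the article; the function names are likewise just placeholders.

```python
# Hypothetical job counts (in thousands). These are NOT the article's data.
jobs = {
    "sector A": {"start": 100, "end": 180},
    "sector B": {"start": 2000, "end": 2100},
}

def net_growth(start, end):
    """Absolute change in the number of jobs."""
    return end - start

def percent_growth(start, end):
    """Change relative to the starting number of jobs, as a percentage."""
    return 100 * (end - start) / start

for sector, counts in jobs.items():
    print(sector, net_growth(**counts), percent_growth(**counts))
```

With these invented numbers, sector B adds more jobs in absolute terms (100 versus 80), while sector A grows much faster in percentage terms (80% versus 5%). The same data can tell two different stories depending on which query you run.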

What we won’t really discuss are the “applied statistics” that went into making these graphs. Choosing which kind of visualization best answers a question, and running statistical analyses (including determining whether results are significant or merely a product of randomness), are important topics, but in this class we will focus on other things.

Campuswire

Managing and querying data is important to almost any application of computer science–not just for creating graphs! For another example, let’s talk about Campuswire.

Campuswire is the discussion software our course uses. You can use Campuswire to ask questions about anything you are confused about; the course staff and your classmates can then answer them. You can also use it to find lab partners, and we’ll use it for course announcements.

Based on its interface, what are some data that Campuswire is keeping track of?

  • User accounts
  • Questions (each question has been asked by a user)
  • Categories for questions
  • “Like” count
  • View count
  • Unique view count–how does this work?
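
One possible answer to the unique view count question, sketched in Python: keep the set of users who have viewed a question alongside the total count. This is only a guess at how such a counter could work, not a description of how Campuswire is actually built; the `Question` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    """Hypothetical record for one question; not Campuswire's real schema."""
    author: str
    category: str
    likes: int = 0
    views: int = 0                              # every view is counted
    viewers: set = field(default_factory=set)   # each user is stored once

    def record_view(self, user: str) -> None:
        self.views += 1
        self.viewers.add(user)  # adding an existing member changes nothing

    def unique_views(self) -> int:
        return len(self.viewers)

q = Question(author="alice", category="Logistics")
for user in ["bob", "carol", "bob"]:
    q.record_view(user)
print(q.views, q.unique_views())  # 3 2
```

Notice that counting unique views requires remembering who viewed each question, which is more data than the interface shows.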

Are there any data that you know must exist but aren’t visible in the interface?

  • User passwords
  • Location tracking?

This course

In this course we’ll talk about what these two examples have in common. Each example represents a problem that can be solved via computation. In both cases, someone had to do a few things:

  • Identify and organize the data needed to solve the problem
  • Express computations over the data
  • Test those computations to make sure they are doing what they are supposed to
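
These three steps can be sketched on a toy version of the Campuswire example. The questions below and the function name `count_by_category` are invented for illustration.

```python
from collections import Counter

# Step 1: organize the data (hypothetical questions, one dict per question).
questions = [
    {"author": "alice", "category": "Labs"},
    {"author": "bob", "category": "Homework"},
    {"author": "carol", "category": "Labs"},
]

# Step 2: express a computation over the data.
def count_by_category(qs):
    """Return a dict mapping each category to its number of questions."""
    return dict(Counter(q["category"] for q in qs))

# Step 3: test the computation on inputs whose answers we already know.
assert count_by_category(questions) == {"Labs": 2, "Homework": 1}
assert count_by_category([]) == {}
```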

There’s also a “step zero” that I haven’t listed above:

  • Think about whether it’s a good idea to solve the problem, and how your solution might affect the world around you

For instance, the graphs showed that, while the e-commerce sector is growing rapidly, it isn’t creating all that many new jobs. Is this a good thing or a bad thing? In this class, we’ll talk about a number of computation-related social issues. We’ll talk about how data collection and programming affect the world, and how engineers make ethical decisions.

Logistics

For now, the only thing to do is check out the course website. There’s a lot of information there about course policies, plans for assignments, etc. If you have any questions, please email Doug.

Let’s start programming: snakes!!!

Imagine you’re working for a graphic design company. The company is creating a set of designs based on the patterns found on snakes. You’re charged with writing computer programs to generate the pictures of snakes found in this document. Before we write any code, what are some things you notice about these snake patterns?

Next time, we’ll talk about how to write computer programs that can create these images.