Course overview
What’s this course about?
You might have heard the term “data science.” This isn’t a data science class as such, but it’s an introduction to the computer science component of data science. Let’s work through an example to understand what that means.
Take a look at the first graph in this article on Doug’s favorite baseball statistics website. The graph is comparing the site’s traffic this year to last year. First off: what tasks might have been involved in making this graph?
Someone had to collect the data by tracking how many users visited the site over time and storing those counts somewhere. Someone had to run the right queries on those data, and someone had to test those queries to make sure they were correct. In this class, we'll talk about how to organize data, do computations over those data, and test the resulting computations.
Someone also had to create the actual visualization of the data, with the nice colorful lines and labeled axes. We'll talk a little bit about visualizing data in this class, but won't really get into the question of how to present data in informative and visually appealing ways.
One more thing to notice on that page: it's full of ads! At some point, someone built a system to show those ads to us, probably based on other websites we've visited, ads we might have clicked on, or other information the system can infer about us. Just like the program to create the traffic graph, that system had to:
- Identify and organize the data needed to solve the problem
- Break the problem down into subproblems that can be solved with computations
- Express computations over the data
- Test those computations to make sure they are doing what they are supposed to
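To make these steps concrete, here is a minimal sketch of what they might look like for the traffic graph, written in Python. Everything in it is an assumption for illustration: the visit counts are made up, and the names `daily_visits`, `weekly_totals`, and `percent_change` are hypothetical, not taken from the site or from any course library.

```python
# Step 1: organize the data.
# One made-up visit count per day, Monday through Sunday, for two weeks.
daily_visits = [120, 135, 128, 140, 110, 60, 55,
                118, 130, 125, 138, 105, 58, 50]


# Steps 2 and 3: break the problem into subproblems and express each
# one as a computation over the data.
def weekly_totals(visits: list[int]) -> list[int]:
    """Sum the visit counts in consecutive 7-day chunks."""
    return [sum(visits[i:i + 7]) for i in range(0, len(visits), 7)]


def percent_change(old: int, new: int) -> float:
    """Percent change from old to new (negative means a drop)."""
    return (new - old) * 100 / old


# Step 4: test the computations on small inputs whose answers we can
# work out by hand before trusting them on the real data.
assert weekly_totals([1] * 7 + [2] * 7) == [7, 14]
assert percent_change(100, 50) == -50.0

totals = weekly_totals(daily_visits)
change = percent_change(totals[0], totals[1])
```

The tests use tiny inputs on purpose: if we can't predict the answer ourselves, we can't tell whether the program got it right.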
There’s also a “step zero” that I haven’t listed above:
- Think about whether it’s a good idea to solve the problem this way (or at all!), and how the solution might affect the world
For instance: targeting those ads at us involves tracking our behavior and attributes in a way that some people find invasive. We’ll talk much more later in the course about this kind of tracking, how it works, and some of its effects.
Finally: what patterns do we notice in the graph? Does it raise any questions? The first thing that jumps out at me is that traffic is down this year compared to last year, and that this trend seems to have started as soon as a lot of people in the US started staying home. The second is that both the top (last year) and bottom (this year) lines have the same sort of spiky pattern. The first trend is probably easy to explain: there hasn't been baseball this year until quite recently! To explain the second pattern, we might notice that the spikes are each about a week long. Is it possible that people mostly browse baseball statistics when they are bored at work???
Logistics
CSCI 0111 is designed for both prospective CS concentrators and for students studying other disciplines who want to use programming to solve problems–and for students who might not be sure which of those they are yet! The monthly schedule of the course will look something like this:
- In the first month we’ll write our first programs, which will produce images. We’ll then see tabular data (think Excel), and learn how to load, clean, analyze, and visualize data.
- In the second month we’ll learn about some other ways of organizing data, and of writing programs over structured data.
- In the third month we’ll switch programming languages and learn Python, a programming language that’s commonly used for data analysis tasks. We’ll learn how to write programs where data changes over time, see additional ways of structuring data, and end by seeing how to work with tabular data in Python.
The last big thing is to check out the course website. There’s a lot of information there about course policies, plans for assignments, etc. If you have any questions, please ask on Campuswire!
Let’s start programming: flags!!!
Imagine you’re working for a graphic design company. The company is working on creating a set of designs based on various national flags. You’re charged with writing computer programs to generate the flags found in this document. We’re going to start working on these programs next time. What are some things we might need in order to produce images like these?
In lecture, we decided that we’d need to:
- Work with colors, either by name or by combining other colors
- Work with shapes
- Be able to “overlay” shapes onto other shapes
- Express numbers and numerical computations
- Be able to “import” images (for instance, the eagle on the Zambian flag)
- And more–see the lecture capture for details!
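As a rough preview of how a few of these ingredients fit together, here is a toy sketch in Python. It is not the image library we'll use in class: it assumes an image is just a grid of color names, and `rectangle`, `overlay`, and `horizontal_stripes` are hypothetical helpers made up for this example. The three-stripe flag at the end is also hypothetical, not one from the flags document.

```python
# Assumption: an image is a list of rows, and each row is a list of
# color names. This is a toy stand-in for a real image library.
Image = list[list[str]]


def rectangle(width: int, height: int, color: str) -> Image:
    """A shape: a solid rectangle of a single named color."""
    return [[color] * width for _ in range(height)]


def overlay(top: Image, bottom: Image, x: int, y: int) -> Image:
    """Return a copy of bottom with top drawn on it, placing top's
    upper-left corner at column x, row y. Parts of top that fall
    outside bottom are dropped."""
    result = [row[:] for row in bottom]
    for r, row in enumerate(top):
        for c, color in enumerate(row):
            if 0 <= y + r < len(result) and 0 <= x + c < len(result[0]):
                result[y + r][x + c] = color
    return result


def horizontal_stripes(width: int, stripe_height: int,
                       colors: list[str]) -> Image:
    """Build a flag of equal-height horizontal stripes, top to bottom."""
    flag = rectangle(width, stripe_height * len(colors), colors[0])
    for i, color in enumerate(colors):
        stripe = rectangle(width, stripe_height, color)
        # A numerical computation: where does stripe i start?
        flag = overlay(stripe, flag, 0, i * stripe_height)
    return flag


# A hypothetical three-stripe flag, 6 columns wide, 2 rows per stripe.
flag = horizontal_stripes(6, 2, ["red", "white", "blue"])
```

Notice that `overlay` returns a new image rather than changing `bottom`, so the same background can be reused for several designs.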
Next time, we’ll see how to write programs that produce flags.