Overview: Computing, Programming, Data, and Computer Science
Copyright (c) 2017 Kathi Fisler
1 What will this course be about?
Computer science is a huge field. Data science is a current hot topic. Each of you may have a different sense of what these are, how they intersect, and what you might expect to get out of this class. Let’s start with a couple of examples to help scope this course.
1.1 Two graphs about e-commerce
Look at the two graphs at the top of this article (it’s from a terrific NYTimes series called "What’s going on in this graph?"). What do you notice about the data? What do you wonder about the data?
[Side Note: we will use these two "notice and wonder" questions throughout the course. They are a tool for helping you reflect and make sense of a problem or scenario before starting to work on it. More detail is on the learning page on the course website.]
When we think about computer science and data science, we need to look at these graphs at two levels: there’s the story they tell about jobs, and the backstory about the impact of computing and IT on the economy.
Figuring out which visualization to use to present the data, which statistics to use to interpret the data, and whether the results are significant lie solidly within data science (and applied statistics). This course will NOT deal with these topics.
Figuring out how to write queries on datasets, how to manage multiple sources of research data, and how to organize information to process and maintain it effectively is part of computer science. We will touch on these ideas in this course.
Understanding what processes can be automated and how to organize the information those processes depend on is part of computer science (specifically, this is what programming is all about). You will (start to) learn programming in this course.
Understanding how to sanity check and test your analyses so that you trust them is important in both data science and computer science. We will talk about this in this course.
Put differently, this course will focus on computer science techniques that matter for dealing with data and information, as mediated through the task of programming. When we say "data", we don’t just mean graphs, spreadsheets, and statistics. We’ll also be looking at managing the information and data that underlie applications in various domains. For example ...
1.2 Email
Consider gmail (the Brown system) or some other mail service you access through the web or your phone. What sorts of tasks go into either creating gmail or keeping it running?
Users have to be able to create accounts. Each account holds several kinds of information.
Accounts need passwords so that messages can be kept private.
Messages need to get delivered from one user’s computer to another.
Features like tagging email (or putting it in folders) and searching email had to be designed and provided to users.
Someone needs to keep the machines that process the mail running.
Data needs to be "backed up", or stored in multiple places, so that if one central computer goes down, users can still get their mail.
Someone had to design the visual look of the system and make sure that users without technical expertise, or users with disabilities, or users from multiple countries (and so on) could use it.
Some tool needs to figure out which ads to display to you (that’s how they keep gmail "free" for users).
The list obviously goes on, but this gives you an initial sense of how many different kinds of tasks go into this: user-design, security, privacy, machine maintenance, accessibility, data management, machine learning ...
All of these tasks have an element of computer science. Many involve other fields of expertise as well. When people talk about computer science being "interdisciplinary", they are talking both about all of the tasks that go into making modern computing-based tools, and all of the domains in which software and applications are now in use (medical research, self-driving cars, media production, gaming, and so on).
2 So What Will We Do in This Course?
Obviously, we can’t cover all of these topics in one semester (you barely hit all of these in a 4-year concentration). So our focus is going to be three essential topics that underlie many of the others:
Identifying and organizing the data for a given problem (data design)
Expressing computations over data based on how it is organized (programming)
Checking whether computations are producing the right answers (testing)
These topics have a role in many areas beyond computer science: most of you will organize data for some project or work task, and thinking through how to make sure something "works" is valuable whether or not that something is a program. As for programming, it’s increasingly useful to be able to do a bit of it in different contexts. Also, learning to program teaches you a bit about how computing devices work under the hood.
Right now, this is all a bit abstract. It will get concrete shortly.
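As a small preview, here is a sketch of how the three topics fit together in one tiny program. It is written in Python (the course itself starts in Pyret), and the order records, field names, and function names are hypothetical, made up only for illustration.

```python
# Data design: decide what information the problem needs and how to organize it.
# Here each order is a small record with an item name, a unit price, and a quantity.
# (These fields are an assumption for this example, not a course dataset.)
orders = [
    {"item": "notebook", "price": 4.50, "quantity": 2},
    {"item": "pen", "price": 1.25, "quantity": 4},
    {"item": "backpack", "price": 30.00, "quantity": 1},
]


# Programming: express computations that follow the shape of the data.
def order_total(order):
    """Cost of a single order: unit price times quantity."""
    return order["price"] * order["quantity"]


def total_revenue(all_orders):
    """Sum of the totals of every order in the list."""
    return sum(order_total(o) for o in all_orders)


# Testing: check the computations against answers worked out by hand.
assert order_total(orders[0]) == 9.0
assert total_revenue(orders) == 44.0
assert total_revenue([]) == 0  # an empty list of orders earns nothing
```

Notice that the functions mirror the data: one function handles a single record, and another handles the whole list by using the first.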
At a finer-grained level, we will work on tabular/data science data for about a month, then we will start learning other core CS data structures and how to program with them. These two segments will use a language called Pyret that is particularly well-suited to these tasks. Around the beginning of November, we will switch to Python and cover an additional data structure and some other programming techniques, while continuing to apply what you’ve learned to problems that mimic real applications in computer science.
3 Goals
By the end of the course, you should have learned how to:
Apply fundamental data-organizations (called data structures) to capture the information in a computing problem
Break down a computing question into manageable smaller problems
Write programs to compute answers to questions over fundamental data structures
Check whether your programs behave as intended/required
You will NOT learn how to build mobile applications, design user interfaces, create interactive websites, set up databases or networks, or use machine learning for analytics. The skills taught in this class are relevant to doing these things, but we won’t cover all of the skills that you need for these activities. You will get to do projects and assignments that have the core pieces of mainstream applications.
We’ll also discuss some big-picture social issues in computing to help put what we’re learning in a broader context. You’ll do four reading assignments on data-facing challenges around computing and information technology.
The course does NOT expect that you have any experience in programming, statistics, or data analysis.
4 Logistics
Everything you need is on the course website. Post to the Piazza board (linked into the website) or drop me email if you have any questions.
5 Starting to Program ...
Imagine that you are starting a graphic design company, and want to be able to generate images of flags of different sizes and configurations for your customers. To do this, you need to figure out how to create images of flags. The PowerPoint slide from lecture shows a collection of flags that you want to be able to create.
What do you notice and wonder about this collection of flags?
In doing this exercise, we identified needing a couple of skills:
We need to compute with numbers to get the proportions right between the flag width and height
We need to build images of flags out of smaller images
We need a way to customize common structures of flags with different colors and optional icons
This is our initial to-do list for learning to program. Next class, we will start creating flags.
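To give a rough feel for how the three skills on this list fit together, here is a minimal sketch in Python. It is not the image library we will use in class: it assumes an "image" is just a grid of one-letter color codes, and every function name and the default 3:2 width-to-height ratio are hypothetical choices made for illustration.

```python
def flag_height(width, ratio_w=3, ratio_h=2):
    """Compute with numbers: the height that keeps a width:height ratio."""
    return width * ratio_h // ratio_w


def rectangle(width, height, color):
    """A solid 'image': a list of rows, each row a list of color codes."""
    return [[color] * width for _ in range(height)]


def above(top, bottom):
    """Build a bigger image by stacking one image on top of another."""
    return top + bottom


def striped_flag(width, colors):
    """Customize a common structure: equal horizontal stripes, one per color."""
    stripe_height = flag_height(width) // len(colors)
    flag = []
    for color in colors:
        flag = above(flag, rectangle(width, stripe_height, color))
    return flag


# A 9-wide flag with three stripes, using made-up color codes.
flag = striped_flag(9, ["K", "R", "Y"])
for row in flag:
    print("".join(row))
```

Changing the list of colors or the width produces a different flag without rewriting the stripe-building logic, which is the kind of reuse the third item on the list is after.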