Overview: Computing, Programming, Data, and Computer Science
Copyright (c) 2017 Kathi Fisler
1 What will this course be about?
Computer science is a huge field. Data science is a current hot topic. Each of you may have a different sense of what these are, how they intersect, and what you might expect to get out of this class. Let’s start with an example to help scope this course.
Take a look at the projected web page from the USNews site. Where might each of data science, computer science, and data management/engineering have been used to create that site?
USNews would have had to gather and organize a fair bit of information from the individual schools. Data organization is key to both computer science and data science, and this course will talk about that.
Computing the rankings themselves requires a fair bit of statistics (and data collection). This is solidly part of data science. This course will NOT deal with these topics. In particular, we don’t assume that you are comfortable with statistics.
Selecting the ads to display involves machine learning, which spans computer science and data science. We’ll get an initial hint of how these techniques work (you need some background in both CS and programming before you can study machine learning in detail).
Understanding how to search over a collection of data is part of computer science. Like any software-based process, it depends on organizing and processing data through programming. You will learn fundamentals of programming in this course.
Creating a nice-looking website that doesn’t crash depends on many aspects of CS, none of which we will cover in this course.
Understanding how to sanity check and test your analyses so that you trust them is important in both data science and computer science. We will talk about this in this course.
Put differently, this course will focus on computer science techniques that matter for dealing with data and information, as mediated through the task of programming. When we say "data", we don’t just mean graphs, spreadsheets, and statistics. We’ll also be looking at managing the information and data that underlie uses of computing in various domains.
All of these tasks have an element of computer science. Many involve other fields of expertise as well. When people talk about computer science being "interdisciplinary", they are talking both about all of the tasks that go into making modern computing-based tools, and all of the domains in which software and applications are now in use (medical research, self-driving cars, media production, gaming, and so on).
2 So What Will We Do in This Course?
Our focus is going to be four essential topics that underlie the use of computation to solve data-rich problems:
Identifying and organizing the data for a given problem (data design)
Breaking down a computing question into manageable smaller problems (programming)
Expressing computations over data based on how it is organized (programming/CS)
Checking whether computations are producing the right answers (testing) and avoiding adverse impacts (social responsibility)
These topics have a role in many areas beyond computer science: most of you will organize data for some project or work task, and thinking through how to make sure something "works" is valuable whether or not that something is a program. As for programming, it’s increasingly useful to be able to do a bit of it in different contexts. Also, learning to program teaches you a bit about how computing devices work under the hood.
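To make the four topics concrete, here is a minimal Python sketch of how they might look on a tiny problem: finding the largest school in a given state. The school names, enrollment numbers, and function names are all invented for illustration; they are not taken from the USNews site or from the course materials.

```python
# Hypothetical example: every name and number below is made up for illustration.

# 1. Data design: each school is a dict with the same fields,
#    and the whole table is a list of those dicts.
schools = [
    {"name": "Alpha College", "state": "RI", "enrollment": 2500},
    {"name": "Beta University", "state": "MA", "enrollment": 12000},
    {"name": "Gamma Institute", "state": "RI", "enrollment": 7000},
]


# 2. Breaking the problem down: "largest school in a state" splits into
#    two smaller problems, each handled by its own function.
def schools_in_state(table, state):
    """Keep only the rows whose state field matches."""
    return [row for row in table if row["state"] == state]


def largest(table):
    """Return the row with the highest enrollment, or None if the table is empty."""
    if not table:
        return None
    return max(table, key=lambda row: row["enrollment"])


# 3. Expressing the computation: combine the two helpers.
def largest_in_state(table, state):
    return largest(schools_in_state(table, state))


# 4. Testing: check known answers, including a case with no matching rows.
assert largest_in_state(schools, "RI")["name"] == "Gamma Institute"
assert largest_in_state(schools, "CA") is None
```

The last test is the kind of sanity check that is easy to forget: a state with no schools in the table should produce a sensible answer rather than a crash.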
At a finer-grained level, we will work on tabular data (the kind common in data science) for about a month, then we will start learning other core CS data structures and how to program with them. These two segments will use a language called Pyret that is particularly well-suited to these tasks. Around spring break, we will switch to Python and cover an additional data structure and some other programming techniques, while continuing to apply what you’ve learned to problems that mimic real applications in computer science.
We’ll also discuss some big-picture social issues in computing to help put what we’re learning in a broader context. You’ll do several reading assignments on data-facing challenges around computing and information technology.
What We Won’t Cover: You will NOT learn how to build mobile applications, design user interfaces, create interactive websites, set up databases or networks, or use machine learning for analytics. The skills taught in this class are relevant to doing these things, but we won’t cover all of the skills that you need for these activities. You will get to do projects and assignments that have the core pieces of mainstream applications.
The course does NOT expect that you have any experience in programming, statistics, or data analysis.
3 Logistics
Everything you need is on the course website. Post to the Ed board (linked into the website and Canvas) if you have any questions.
4 Starting to Program ...
From here, we covered sections 3.1 and 3.2 of the textbook.