Homework 1-3

Data Collection

Due February 6, 2017 at 11:59pm


In this homework, you will find and import your own CSV file from an internet source and answer simple questions about it. You will need to find a dataset on Kaggle.

If you don't have an account on this site, you will need to create one. Take some time to explore the different datasets on the site, sorting to find the most popular datasets and searching with terms of interest to you. Whatever dataset you choose, it should contain multiple numerical columns and at least 500 rows of data.


Once you have found a dataset, import the data into Google Sheets and think about what simple analyses you can do on it.

In a sheet in your spreadsheet, you will need to have:

  • an explanation of what your data includes
  • the questions you are trying to answer
  • the methods and results of your analyses

You should come up with at least three questions to answer. For each of your questions, you should, at minimum, combine data from two columns into a third column, and then run a summarizing function on the third column. For example, you might perform a computation on each row in the data and then use a summarizing function like MIN, MAX, COUNT, or AVERAGE to answer your question. Or you might nest multiple IF statements to arrive at your conclusion.
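The assignment itself is done with spreadsheet formulas, but the pattern described above can be sketched in Python to make the logic concrete. This is only an illustration: the column names (width, height, area), the values, and the size thresholds are all invented and do not come from any real dataset.

```python
from statistics import mean

# Hypothetical rows; the column names and values are invented for illustration.
rows = [
    {"width": 2.0, "height": 3.0},
    {"width": 4.0, "height": 1.5},
    {"width": 1.0, "height": 5.0},
    {"width": 3.0, "height": 3.0},
]

# Step 1: combine two columns into a third column, one row at a time.
for row in rows:
    row["area"] = row["width"] * row["height"]

# Step 2: summarize the new column (analogues of MIN, MAX, COUNT, AVERAGE).
areas = [row["area"] for row in rows]
summary = {
    "min": min(areas),
    "max": max(areas),
    "count": len(areas),
    "average": mean(areas),
}


def size_label(area):
    """Analogue of a nested IF: classify a row by its computed value."""
    if area >= 8:
        return "large"
    elif area >= 6:
        return "medium"
    return "small"


# Count the rows that fall under one label.
large_count = sum(1 for area in areas if size_label(area) == "large")

print(summary)
print("large rows:", large_count)
```

In a spreadsheet, step 1 would be a formula filled down a new column, and step 2 would be a single summarizing formula over that column.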

Here is an example to give you a sense of the kinds of questions we are looking for. We have a dataset on video game sales here.

The data involves video game sales. We will be considering the Genre, NA_Sales, EU_Sales, JP_Sales, Other_Sales, and Global_Sales columns, where Genre refers to the genre of a game and the sales columns refer to the regional sales of a game in millions.

Some questions we could consider include:

  • What percentage of all games are puzzle games?
  • How many games have NA sales that are over 50% of their global sales?
  • What are the average global sales of sports games?
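As a rough illustration of how these three questions break down into a helper column plus a summary, here is a minimal Python sketch. It uses the column names from the example, but the four rows are made up for the illustration and are not real sales figures; in your submission the same steps would be spreadsheet formulas.

```python
# Made-up sample rows using the example's column names; not real sales figures.
games = [
    {"Genre": "Puzzle", "NA_Sales": 1.0, "Global_Sales": 4.0},
    {"Genre": "Sports", "NA_Sales": 3.0, "Global_Sales": 4.0},
    {"Genre": "Sports", "NA_Sales": 1.0, "Global_Sales": 2.0},
    {"Genre": "Action", "NA_Sales": 5.0, "Global_Sales": 8.0},
]

# Question 1: helper column is 1 for puzzle games and 0 otherwise,
# summarized as a percentage of all rows.
is_puzzle = [1 if g["Genre"] == "Puzzle" else 0 for g in games]
puzzle_pct = 100 * sum(is_puzzle) / len(games)

# Question 2: helper column is NA sales divided by global sales,
# summarized by counting rows strictly above 0.5.
# (Rows with zero global sales would need special handling; none appear here.)
na_share = [g["NA_Sales"] / g["Global_Sales"] for g in games]
na_majority_count = sum(1 for share in na_share if share > 0.5)

# Question 3: keep global sales only for sports games, then average them.
sports_sales = [g["Global_Sales"] for g in games if g["Genre"] == "Sports"]
sports_avg = sum(sports_sales) / len(sports_sales)

print(puzzle_pct, na_majority_count, sports_avg)
```

Note that a game whose NA sales are exactly half of its global sales is not counted in the second question, since the question asks for "over 50%".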

On a separate sheet, please record how many hours you worked on this assignment, the people you worked with, and whether you went to TA hours for help or clarification on this assignment.


Once you're done, share your file with cs0030handin@gmail.com by 11:59pm on February 6.

Make sure your submission has your name in the filename: FirstLast_HW1-3. Replace “FirstLast” with your first and last name; otherwise we will take off points. Make sure every task has been completed.