Lab 3: Tables

As you’re packing your bags and getting ready to depart from Venus, you catch a glimpse of a crowd of aliens in super weird clothing. It turns out that here on Venus, they celebrate Halloween a month early! (Although, here, they call it Heilaveen.)
After talking to a couple of aliens, you learn that Venus has mastered Halloween. They have created the perfect candy. While they can’t tell you the secret formula, they are allowed to give you the data sets that they used to create this candy.

Setup

In this lab you’re going to be working with tables!

This data has been published in a FiveThirtyEight article called “The Ultimate Halloween Candy Power,” which is definitely worth a read after the lab. FiveThirtyEight conducted a survey in which tens of thousands of people were asked to choose between two candies; they then computed the winning percentage of each candy. The data-set gives a Number with this winning percentage, and also Booleans about the attributes of the candies and Numbers about the relative price and sugar of the candies.

You’ll be looking at the relationships between these columns in this lab.

First, include this at the top of your program:

include tables
include shared-gdrive("cs111-2018.arr", "1XxbD-eg5BAYuufv6mLmEllyg28IR7HeX")
include gdrive-sheets
include math
include image

# spreadsheet id from Google Sheets
ssid = "1XzeWZToT-lqPFVpp-RZLimdoGdGaTcVQZ_8VbDbAkOc"
data-sheet = load-spreadsheet(ssid)
candy-data =
 load-table: name, chocolate, fruity, caramel, nutty, nougat, crisped-rice, hard, bar, pluribus, sugar-percent, price-percent, win-percent
   source: data-sheet.sheet-by-name("candy-data", true)
 end

This code loads a Pyret table from a Google Sheet. Type “candy-data” in the interactions window to see the data.

You’ll want to refer to the CS-111 Pyret Tables Documentation for this lab (not the built-in documentation)

Part 1: Building columns and analyzing them

1.1: New column

Build a column with Boolean values that indicates whether a candy is fruity and hard, but not a pluribus. Write an expression in the definitions window that uses this new column to compute how many candies meet this condition.

(Pluribus means that there are multiple candies in a packet, ex: Skittles, M&M’s, and Smarties)

Hint: Take a look at the build-column function. Call over a TA if you want help using this.

1.2: Maximum

Of the candies for which this Boolean is true, which has the highest winning percentage?

Hint: This requires the use of the sort-by function, in addition to the get-row function.

An example of the get-row function to take the fifth row of a table: get-row(table,4).

1.3: Mean

Of the candies for which this Boolean is true, what’s the average winning percentage?

Hint: This requires the use of the mean function.

CHECKPOINT: Call over a TA once you reach this point.

Part 2: Filtering

2.1: Chocolate

How many of the candies have chocolate?

Write an expression in the definitions window that outputs this number in the interactions window. You can get the length of a table by writing TABLE.length() where TABLE is the name of (or expression that computes) a table.

Hint: Take a look at the filter-by function, and remember that functions can be passed as inputs to other functions. Call over a TA if you want help using this.

When you’re done writing the expression, copy and paste
“#-----------------------------------” into the definitions window directly below it to separate it from the next part.

2.2: Chocolate and caramel

Of the candies that have chocolate, what proportion also have caramel?
Write an expression in the definitions window that outputs this proportion in the interactions window.

Once again, separate it from the next part by pasting
“#-----------------------------------” on the line below it.

2.3: Chocolate and nutty

Of the candies that have chocolate, what proportion are also nutty?

Write an expression in the definitions window that outputs this proportion in the interactions window.

Once again, separate it from the next part by pasting
“#-----------------------------------” on the line below it.

CHECKPOINT: Call over a TA once you reach this point.

2.4: Chocolate and anything

What attribute is paired most frequently with chocolate?
To help answer this question, write a function that generalizes the expressions from 2.2 and 2.3. As we did in week 1, look for commonalities across the code you wrote for caramel and nutty, and make that common code into the body of a function that takes the different information as an input parameter. The common code may contain more than just an expression (as you’ve seen before) – that’s fine.”

Compare the results of your function on the inputs “fruity,” “nutty,” and “caramel.” You don’t have to write code for this comparison – just use your new function to compute all three proportions and compare them manually.

CHECKPOINT: Call over a TA once you reach this point.

3: Scatterplot

What’s the relationship between sugar and winning percentage? Do these two attributes seem correlated? One way to gain intuition on this is to create a scatterplot that puts one attribute on each axis.

The tables documentation includes a scatter-plot function. Look at the documentation and try to figure out how to use it to generate a scatterplot of sugar versus winning percentage. Write an expression in the definitions window that creates the scatterplot. In a sentence or two, summarize the relationship in a comment.

Hint: Check the tables documentation, linked above, for how to do this

4: Comparing Attributes

Candy analysts want to understand the relative frequencies of the attributes in the table across the collection of candies studied. They want a function where they can give you names of two attributes (column names in the dataset), and you tell them which is more frequent (your function can return “same” if the two attributes are in the same number of candies). Develop this function.

Discuss with your partner how you might go about testing this function. Write a comment describing your testing plan.

Now imagine that the analysts want to see the relative frequencies across all of the attributes. How would you want to report that information, and what would be needed to produce that report from the given dataset? You can think about the functions in the documentation, or you can think about reorganizing the data in the table to make it easier. This is just a discuss-with-partner question—you are NOT being asked to write code to do this analysis.

5: Exploration (if you have time)

Now think about questions you have about this data, or come up with interesting questions to ask. Investigate (by writing and running expressions), and write a couple sentences about what you found.

Takeaways

This lab has mostly been about getting you comfortable working with tabular data and practicing some common operators on tables. It also gets you thinking ahead a bit back to our course theme of data: what patterns of manipulating data do we often use in computations? How does the organization of our data impact our ability to answer these questions?

Here, we see that filtering, ordering, and summarizing data are some of the key operations. So far we’ve only looked at these operations on tabular data, but these same building blocks will arise many times through this course. When you have a computation to perform around data, you should start by thinking through what combinations of filtering, sorting and summarizing will help you compute your answer.