Project 1: Cowboy Conspiracy
Important notes:
- Make sure to copy the code we give for loading the spreadsheets exactly
- Do not use lists in this project
Due Date Information
- Out: Wednesday, Feb 19
- Design check sign-up deadline: Thursday, Feb 20 11:59pm
- Design check dates: Friday, Feb 21 10am - Sunday, Feb 23 11pm
- Optional second check-in dates: Tuesday, Feb 25 - Sunday, Mar 1
- In: Mar 3
Summary
Howdy, partner!
You and your cowboy posse are in pursuit of an outlaw who has robbed the local saloon. Rumor has it they have escaped to the city and are masquerading as a taxi cab driver. Your sense of justice and revenge compels you to venture beyond the Wild West into the Urban East in order to catch them. Unfortunately, your horses refuse to ride into the city because of their batophobia (fear of tall buildings). Y’all must navigate dealing with rainy weather for the first time and learn how to hail a NYC taxi cab. Use the data available to figure out when they are likely to be driving. Lasso that data to catch the outlaw!
NYC publishes a lot of open data (see this link). You found records of every taxi ride taken in city cabs during 2015 and 2016, and want to analyze the data, with a particular look at how taxi usage varies by time of day and weather conditions.
Visit this webpage on the 2016 taxi data to get familiar with the columns that NYC provides in these datasets. Please note that this link might take a while to load.
For better or worse, this dataset is HUGE – it has 131 million rows and consumes more than 17GB of space (so don’t download it!). The raw dataset is too big to open easily in Excel or Pyret, so you’re going to work with a summarized version that we have already computed from the raw data.
The Project
For this project, you need to answer the following Analysis Questions about the 2016 taxi data, which will help you catch the outlaw:
- To what extent does bad weather affect how many rides people take? There are many ways to interpret bad weather and you can analyze this question through different lenses, such as rain, snow, and temperature.
- Do the number of rides and total fares follow similar patterns for each day of the week across the year? In other words, is there a reasonably consistent pattern across all Mondays of a year? What about across Saturdays? And so on.
- Are some days of the week more likely than others to have high numbers of rides?
In addition, you need to provide a way to produce a table that summarizes statistics about the numbers of rides at different times of day under different weather conditions. Specifically, given a table and a function to use to summarize the values for a particular weather condition and time of day, you will write a function summary-table to produce a (Pyret) table of the following form, where each cell contains some statistic about the number of rides in the given time period on a day with the given weather:
| | Rain | Snow | Clear |
| ---------- | ------ | ------ | ------- |
| Morning | num | ... | ... |
| Afternoon | ... | | |
| Evening | ... | | |
| Night | ... | | |
where num might be the sum of all rides on rainy-day mornings, or the daily average on rainy-day mornings, etc.
The project will be completed in two stages: Design and Analysis. In the design stage, you will plan the data, tables, and functions that you will need to conduct the analysis during week two. You’ll do little to no coding for the analysis until after you meet with a TA to review your plans during the Design Check. At the end of Analysis, you will turn in both a Pyret file with your code and a PDF file describing your findings. Expectations for each phase are described in separate sections below.
Note: We believe the hardest part of this assignment lies in figuring out what analyses you will do and in creating the tables you need for those analyses. Once you have created the tables, the remaining code should be similar to what you have written for homework and lab. Plan enough time to think out your table and analysis designs.
Accessing the Data
The following code will load the summarized 2016 taxi data into Pyret:
include tables
include shared-gdrive("cs111-2019.arr", "1PzXKPvJHTi3N_QTShsALKgYaV77ybeqx")
include gdrive-sheets
include image
import math as M
import statistics as S
include shared-gdrive(
"taxi-project-support.arr", "1cN92aQzBeURXjpFWM48pAbm7vwE7p0Sj")
taxi-ssid = "1ZbiTAuBpy55akMtA-gWjRBBW0Jo6EP0h_mQWmLMyfkc" # Spring 2020
taxi-sheet = load-spreadsheet(taxi-ssid) # load spreadsheet
taxi-data-sheet = taxi-sheet.sheet-by-name("data", true) # get data sheet
taxi-data-long =
load-table:
day, weekday, timeframe, num-rides, avg-dist, total-fare
source: taxi-data-sheet
end
Note: The source code file imported above, taxi-project-support.arr, contains functions that might be useful for this project but are not in the standard CS0111 Pyret Documentation. Details can be found below.
For weather data, we have extracted data from La Guardia airport in New York City in 2016 (from NCDC) and left it in a Google Sheet. You can access it with the following code:
weather-ssid = "1uiWXHjKAeZ7aUjiL6V_IFN5j9uLRHv_b1ji_Nc3IZm4" # Spring 2020
wdata-sheet = load-spreadsheet(weather-ssid)
weather-data =
load-table: date, weekday, awnd, prcp, snow, tavg, tmax, tmin
source: wdata-sheet.sheet-by-name("final2", true)
end
Project Learning Goals
Our hopes for this project are that you
- Become familiar with manipulating large datasets & joining separate datasets based on common attributes
- Expand your knowledge of built-in table functions in Pyret to manipulate data
- Gain familiarity with testing using tables as input and output
- Identify patterns and visualize them with charts to draw conclusions about large datasets
Files to Submit on Gradescope
Design Check:
project-1-design-check.pdf
Final Handin:
transit-analysis.arr
transit-report.pdf
Deadline 1: Design Check
With any large computer science project, a large amount of planning and design often occurs before anyone even begins to code.
So, the first deadline is the design check, a time when you meet your project TA (the TA who will grade your project). It’s a low-stress way to start on your project and make sure you are on the right track before you implement it.
Signing Up For a Design Check
Please read the following information carefully.
All projects in CS0111 are done in pairs, so first you must find a partner for the project. Since the deadline to sign up for a design check is Thursday, Feb 20 at 11:59pm, if you haven’t found a partner yet, please email the HTAs ASAP so we can help you find one.
To sign up for a design check slot, please fill out this Google form. You’ll be asked for both students’ CS/Brown banner logins, what time slot you’d like, and the Gradescope Anonymous ID of the partner who will be submitting your design check handin. Only one student should fill out this form, and it is very important that your login information is correct.
Once you fill out the form, you will get two confirmation emails:
- One, from Google forms, that will send you a copy of your responses. Only the person who filled out the form will get this email.
- An email from the staff, saying that your slot has either been confirmed or is now unavailable to you. Both partners should get this email. If you don’t get this email within a minute or two, you might have typed your logins incorrectly.
If you get an email saying that the slot you signed up for is no longer available, this slot might either belong to a TA who has blocklisted you, or someone else might have already signed up for the slot while you were filling out the form. This means your response was not recorded on our end, so please edit your response to the form and submit again, choosing a different slot.
Setup and Handin Info
Answer the following questions in a PDF file, project-1-design-check.pdf, and submit it on Gradescope under Project 1 Design Check at least 2 hours before your design check. You can create a PDF by writing in your favorite word processor (Word, Google Docs, etc.) and then saving or exporting to PDF. Ask the TAs if you need help with this. Please put both your and your partner’s CS login (banner username) information at the top of the file.
Questions
- Look at the summarized 2016 taxi data. Compare the summarized data with the sample of the original table shown on the NYC website. What operations or steps could produce the summarized data from the original? Write a bulleted list of steps (in English, not code) that explain how to produce the summarized form from the original. Make sure you have some ideas of what functions from the Pyret tables documentation you might use.
  (The point of this question is to show you that you know almost everything you’d need to do this conversion yourself, had the source data not been so huge – within a couple of weeks you will know how to do all of these steps yourself.)
- For each of the three analysis questions listed at the beginning, describe how you plan to do the analysis. You should try to answer these questions:
  - What charts, plots and statistics do you plan to generate to answer the analysis questions? Why? What are the types and the axes of these charts, plots and statistics?
  - What table(s) will you need to generate those charts, plots and statistics?
  - If the table(s) you need have different columns or rows than those that we gave you, provide a sample of the table that you need.
  - For each of the new tables that you identified, describe how you plan to create these tables from the ones that we’ve given you. This can include the overall summary table produced by the summary-table function. Make sure to list all Pyret operators and functions you might use (with input/output types and a description of what they do, but without the actual code). If you don’t know how to create any table, discuss it with the TA at your design check, or feel free to discuss with TAs at hours beforehand.
Important note: You can use any of the Pyret table, chart and plot operations as you see fit. Some that you could use (but you are not limited to, or required to use, these) are: sort-by, filter-by, stdev, mean, sum, scatter-plot, freq-bar-chart, histogram. You can read more about these in Tables Documentation.
Sample Answer:
For example, if you were asked to analyze whether municipalities with a population (in 2000) larger than 30,000 have an increase or decrease in population, your answer might be: "I’d start with a table of municipalities that have a population in 2000 of over 30,000, and then make a scatterplot of the population of those cities in 2000 and 2010. I’d add a linear regression line, then check whether there was a pattern in changes between the two population values.
I’d obtain a table of municipalities with a population of greater than 30,000 in 2000 by using the filter-by function."
- For the summary-table function, you will be filling in the body of the following function (you do not have to implement it for the design check, but you do eventually have to implement it):
fun summary-table(t :: Table, f :: (Table, String -> Number)) -> Table:
doc: ```Produces a table that uses the given function f to summarize
rides for each of rain/snow/clear weather during morning/
afternoon/evening/night timeframes.```
...
end
# the type of f is function that takes Table and String and returns a Number
# the String should correspond to the name of the column f will operate on
Generate a general idea of how you want to implement this function for the design check. For example, this might be called summary-table(mytable, sum) or summary-table(mytable, mean) to summarize the total or average numbers of rides within the dates represented in mytable. You are welcome to create any other helper function to work with summary-table that you see fit for your analysis.
- Provide an example of how this function summary-table will be used. Your answer should include an example of the input table, an input function that takes in a Table and String and returns a Number, and an output Table.
- Given these two tables:

Table 1:
| date | prcp |
| ---------- | ---- |
| 2019/10/14 | 1.0 |
| 2019/10/15 | 1.1 |
| 2019/10/16 | 0.0 |

Table 2:
| date | number_of_rides |
| ---------- | --------------- |
| 2019/10/15 | 28591 |
| 2019/10/14 | 2355 |
| 2019/10/17 | 14513 |
| 2019/10/16 | 4810 |

Write a bulleted list of steps (in English) to combine these two tables to one that looks like the table below. If a step corresponds to a specific Pyret tables function, make sure to name the function, even if you’re not completely sure how it will be used!

| date | prcp | number_of_rides |
| ---------- | ---- | --------------- |
| 2019/10/14 | 1.0 | 2355 |
| 2019/10/15 | 1.1 | 28591 |
| 2019/10/16 | 0.0 | 4810 |
Grading
Your design check grade will be based on whether you had viable ideas for each of the questions and were able to explain them adequately to the TA (for example, we expect you to be able to describe why you picked a particular plot or table format). Your answers do not have to be perfect, but they do need to illustrate that you’ve thought about the questions and what will be required to answer them. The TA will give you feedback to consider as part of your final implementation of the project.
Your design check grade will be worth roughly a third of your overall project grade. Failure to account for key design feedback in your final solution may result in deductions.
Remember that the goal of the design check is to review your project plans and to give you feedback well before the final deadline! Many students make changes to their designs following the check: doing so is common and will not cost you points.
Requirements
- Submit your work on Gradescope (Project 1 Design Check) before the Design Check. Bring your work for the design phase to the meeting either on laptop (files already open and ready to go) or as a printout. Use whichever format you will find it easier to take notes on.
- We expect that both partners have participated in designing the project. The TA may ask either one of you to answer questions about the work you present. Splitting the work such that each of you does 1-2 of the analysis questions is likely to backfire, as you might have inconsistent tables or insufficient understanding of work done by your partner.
- Be on time to your design check. If one partner is sick, contact the TA and try to reschedule rather than have only one person do the design check.
Optional Second Check-in
During your design check, you will also have the opportunity to schedule a personal check-in with your project TA where you can ask them any questions you have at that point, or work on a bug you might be having. These are 20-30 minute meetings with your project TA from Feb 25 - Mar 1. Meeting earlier in this time frame allows you to get higher-level help, while meeting later might allow you to get more focused help.
While this meeting is optional, it’s highly recommended you schedule one with your project TA. In addition, although both partners must go to the design check, only one partner needs to go to the check-in (although it is best if both go).
Note: If you schedule a personal check-in but wish to cancel, do so at least 12 hours before the time of the check-in. We want to respect both your time and the staff’s time! Failure to do so may result in point deductions on your final project grade.
Deadline 2: Analysis and Report
Analysis
For the analysis, you will be submitting a Pyret file named transit-analysis.arr that contains the function summary-table, the tests for the function, and all the functions used to generate the report (charts, plots, and statistics).
Note:
- Create at least two different example tables in your tests for the summary-table function.
- Make sure to test all helper functions that you create unless they return images.
- If you copy a table or plot into your analysis, you must tell us what it is called in your code so we can reproduce your results.
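As an illustration of testing with small example tables, the sketch below builds a three-row table by hand and tests a helper against it. The table contents, the timeframe value "morning", and the helper name is-morning are assumptions made up for this example; check the actual values in the data before relying on them.

```pyret
# A small hand-written table used only for testing (values are made up).
small-rides =
  table: day, timeframe, num-rides
    row: "d1", "morning", 10
    row: "d1", "night", 4
    row: "d2", "morning", 6
  end

# The rows we expect to remain after keeping only the morning rows.
mornings-only =
  table: day, timeframe, num-rides
    row: "d1", "morning", 10
    row: "d2", "morning", 6
  end

fun is-morning(r :: Row) -> Boolean:
  doc: "Checks whether a row describes the (assumed) morning timeframe"
  r["timeframe"] == "morning"
where:
  is-morning(small-rides.row-n(0)) is true
  is-morning(small-rides.row-n(1)) is false
end

check "a table can be the expected output of a test":
  small-rides.filter(is-morning) is mornings-only
end
```

The sketch uses Pyret's built-in table methods so that it stands alone; the course library's filter-by could play the same role as .filter here.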
Sample Answer: Continuing with the municipality population example from the design check, we’d expect to see something in your Pyret file like the following:
# ------ Analysis for question on municipality populations --- #
fun more-than-thirty-thousand(r :: Row) -> Boolean:
  ...
end
# keep only the municipalities that pass the population test
qualifying-munis = filter-by(municipalities, more-than-thirty-thousand)
# plot 2000 vs. 2010 populations for the filtered table, with a regression line
munis-2000-2010-scatter = lr-plot(qualifying-munis, "population-2000", "population-2010")
Then, your report may look like this:

Guidelines on the Analysis
In order to do these analyses, you will need to get day-of-the-week information into the tables and combine data from the two tables based on common dates.
Combining data across tables: Both tables store data by dates, which means you should be able to combine information to create a single table. However, these two tables have different date formats (this was intentional on our part). Handle aligning the date formats in Pyret, not in Google Sheets. One of our goals for this project is making sure you know how to use coding to manipulate tables for combining data. Load both tables into Pyret, then figure out how to combine the information. Pyret String documentation might be your friend!
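As a rough sketch of the kind of string manipulation involved, the function below converts one date format into another using only built-in string operations (no lists). Both formats here, YYYY-MM-DD and MM/DD/YYYY, are hypothetical stand-ins; inspect the two tables to see which formats they actually use before writing your own version.

```pyret
# Hypothetical conversion between two made-up date formats.
# string-substring(s, start, end) takes characters from index start
# up to (but not including) index end.
fun iso-to-us(d :: String) -> String:
  doc: "Converts a YYYY-MM-DD string into MM/DD/YYYY"
  year = string-substring(d, 0, 4)
  month = string-substring(d, 5, 7)
  day = string-substring(d, 8, 10)
  month + "/" + day + "/" + year
where:
  iso-to-us("2016-01-02") is "01/02/2016"
  iso-to-us("2016-12-31") is "12/31/2016"
end
```

A function like this can be applied to every row of a table to build a new column whose dates match the other table's format.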
Note: As we saw in the lecture on errors in data tables, small errors and typos can lurk in datasets. While you might be tempted to just combine columns from the tables by relying on them having the same dates in the same order, this would not be a safe option unless you also had code to check this assumption about the dates. For now, your approach should look up each date from one table in the other. We will revisit how to write this check in lecture once we have covered the material you need for it.
Hint: If you feel your code is getting too complicated to test, add helper functions! You will almost certainly have computations that get done multiple times with different data for this problem. Create and test a helper or two to keep the problem manageable. You don’t need helpers for everything, though – it is fine for you to have nested build-column expressions in your solution, for example. Don’t hesitate to reach out to us if you want to review your ideas for breaking down this problem.
Report
For the report, you will be submitting a file named transit-report.pdf. Include in this file the copies of your charts and the written part of your analysis. Your report should address the three analysis questions outlined at the beginning of this assignment.
You should make a report of your findings in a Word or Google Document, which you can then convert to a PDF for submission. Pyret makes it easy to make this kind of report. When you make a plot, there is an option in the top left-hand side of the window to save the chart as a .png file, which you can then copy into your document.
Additionally, whenever you output a table in the interactions window, Pyret gives you the option to copy the table. If you copy the table into some spreadsheet, it will be formatted as a table that you can then copy into Word or Google Docs.
Your report should contain any relevant plots and tables, any conclusions you have made, and your reflection on the project (see next section). We are not looking for fancy or specific formatting, but you should put some effort into making sure the report reads well (use section headings, full sentences, spell-check it, etc). There’s no specified length – just say what you need to say to present your analyses to answer the questions.
An example of what part of your report might look like:

Reflection
Your report should also include a section with answers to each of the following questions. Do this after you have finished the coding portion of the project!
- Describe one key insight that each partner gained about programming or data analysis from working on this project and one mistake or misconception that each partner had to work through.
- Based on the data and analysis techniques you had, how confident are you in the quality of your results? What other information or skills could have improved the accuracy and precision of your analysis?
- State one or two followup questions that you have about programming or data analysis after working on this project.
- Imagine you are an urban planner using this dataset to identify dates and times that are more likely than others to have high numbers of commuters. How could only using this dataset make your analysis less accurate? Think about data or populations that are missing from this dataset.
- Imagine that the following attributes were added to the public taxi data set. For each of the following attributes, identify a potential ethical issue that could arise due to its addition. Think about who could use the dataset, how it could be analyzed, or for what purposes the analysis could be used.
  - the address of the start and end point of the ride
  - the individual who took the ride (identified by a unique number, not their name) along with their pickup location
  - any “protected attributes” (defined in the U.S. as gender, race, disability, age, etc.)
For your final handin, submit transit-analysis.arr and transit-report.pdf on Gradescope under Project 1. Nothing is required to print in the interactions window when we run transit-analysis.arr, but your analysis answers should include comments indicating which variable names or expressions yield the data on which you based your answers.
Grading
You will be graded on Functionality, Testing, and Design/Style for this assignment. Key metrics for each of these categories are described below.
Functionality:
- Does your code accurately produce the data you needed for your analyses?
- Are you able to use code to perform the table transformations required for your analyses?
- Is your summary-table function working?
Testing:
- Have you tested your functions well, particularly those that do computations more interesting than extracting cells and comparing them to other values?
- Have you shown that you understand how to set up smaller tables for testing functions before using them on large datasets?
Design/Style:
- Have you chosen suitable charts and statistics for your analysis?
- Have you identified appropriate table formats for your analysis tasks?
- Have you created helper functions as appropriate to enable reuse of computations?
- Have you chosen appropriate functions and operations to perform your computations?
- Have you used docstrings and comments to effectively explain your code to others?
- Have you named intermediate computations appropriately to improve readability of your code? This includes both what you named and whether the names are sufficiently descriptive to convey useful information about your computation.
- Have you followed the other guidelines of the style guide (line length, naming convention, etc.)?
Additional helper functions
taxi-project-support.arr contains a function that might be helpful in manipulating your data. This is not in the original CS0111 Pyret Tables Documentation, but feel free to use it if you’d like:
long-to-wide: Long to wide can be used to bring shared data across rows into columns. For example, long-to-wide(taxi-data-long, "day", "timeframe") will produce a table that deletes the “timeframe” column (which repeats the same four time quarters for each day) and brings them into columns, resulting in only one row of data per day. (Feel free to test it out yourself to visualize what it does!)
Campuswire and Feedback
- Campuswire can be your friend for this Project!
- Have feedback for the class or for this project? Submit your feedback here.
Project 1: Cowboy Conspiracy
Important notes:
Make sure to copy the code we give exactly for loading the spreadsheets
Do not use lists in this project
Due Date Information
Out: Wednesday, Feb 19
Design check sign-up deadline: Thursday, Feb 20 11:59pm
Design check dates: Friday, Feb 21 10am - Sunday Feb 23 11pm
Optional second check-in dates: Tuesday, Feb 25 - Sunday, Mar 1
In: Mar 3
Summary
Howdy, partner!
You and your cowboy posse are in pursuit of an outlaw who has robbed the local saloon. Rumor has it they have escaped to the city and are masquerading as a taxi cab driver. Your sense of justice and revenge compels you to venture beyond the Wild West into the Urban East in order to catch them. Unfortunately, your horses refuse to ride into the city because of their batophobia (fear of tall buildings). Y’all must navigate dealing with rainy weather for the first time and learn how to hail a NYC taxi cab. Use the data available to figure out when they are likely to be driving. Lasso that data to catch the outlaw!
NYC publishes a lot of open data (see this link). You found records of every taxi ride taken in city cabs during 2015 and 2016, and want to analyze the data, with a particular look at how taxi usage varies by time of day and weather conditions.
Visit this webpage on the 2016 taxi data, to get familiar with the columns that NYC provides in these datasets. Please note that this link might take a while to load.
For better or worse, this dataset is HUGE – it has 131 million rows and consumes more than 17GB of space (so don’t download it!). The raw dataset is too big to open easily in Excel or Pyret, so you’re going to work with a summarized version that we have already computed from the raw data.
The Project
For this project, you need to answer the following Analysis Questions about the 2016 taxi data, which will help you catch the outlaw:
To what extent does bad weather affect how many rides people take? There are many ways to interpret bad weather and you can analyze this question through different lenses, such as rain, snow, and temperature.
Do the number of rides and total fares follow similar patterns for each day of the week across the year? In other words, is there a reasonably consistent pattern across all Mondays of a year? What about across Saturdays? And so on.
Are some days of the week more likely than others to have high numbers of rides?
In addition, you need to provide a way to produce a table that summarizes statistics about the numbers of rides at different times of day under different weather conditions. Specifically, given a table and a function to use to summarize the values for a particular weather condition and time of day, you will write a function
summary-table
to produce a (Pyret) table of the following form, where each cell contains some statistic about the number of rides in the given time period on a day with the given weather:where
num
might be the sum of all rides on rainy-day mornings,or the daily average on rainy-day mornings, etc.
The project will be completed in two stages: Design and Analysis. In the design stage, you will plan the data, tables, and functions that you will need to conduct the analysis during week two. You’ll do little to no coding for the analysis until after you meet with a TA to review your plans during the Design Check. At the end of Analysis, you will turn in both a Pyret file with your code and a PDF file describing your findings. Expectations for each phase are described in separate sections below.
Note: We believe the hardest part of this assignment lies in figuring out what analyses you will do and in creating the tables you need for those analyses. Once you have created the tables, the remaining code should be similar to what you have written for homework and lab. Plan enough time to think out your table and analysis designs.
Accessing the Data
The following code will load the summarized 2016 taxi data into Pyret:
Note: that the source code file imported above,
taxi-project-support.arr
, contains functions that might be useful for this project that are not in the standard CS0111 Pyret Documentation. Details on this can be found below.For weather data, we have extracted data from La Guardia airport in New York City in 2016 (from NCDC) and left it in a Google Sheet. You can access it with the following code:
Project Learning Goals
Our hopes for this project are that you
Files to Submit on Gradescope
Design Check:
project-1-design-check.pdf
Final Handin:
transit-analysis.arr
transit-report.pdf
Deadline 1: Design Check
With any large computer science project, a large amount of planning and design often occurs before anyone even begins to code.
So, the first deadline is the design check, a time when you meet your project TA (the TA who will grade your project). It’s a low-stress way to start implementing your project to make sure you are on the right track.
Signing Up For a Design Check
Please read the following information carefully.
All projects in CS0111 are done in pairs, so first you must find a partner for the project. Since the deadline to sign up for a design check is tomorrow at midnight, if you haven’t found a partner yet, please email the HTAs ASAP so we can help you find a partner.
To sign up for a design check slot, please fill out this Google form. You’ll be asked for both students’ CS/Brown banner logins, what time slot you’d lke, and the Gradescope Anonymous ID of the partner who will be submitting your design check handin. Only one student must fill out this form, and it is very important that your login information is correct.
Once you fill out the form, you will get two confirmation emails:
If you get an email saying that the slot you signed up for is no longer available, this slot might either belong to a TA who has blocklisted you, or someone else might have already signed up for the slot while you were filling out the form. This means your response was not recorded on our end, so please edit your response to form and submit again, choosing a different slot.
Setup and Handin Info
Answer the following questions in a PDF file,
project-1-design-check.pdf
, and submit it on Gradescope under Project 1 Design Check by at least 2 hours before your design check. You can create a PDF by writing in your favorite word processor (Word, Google Docs, etc) then saving or exporting to PDF. Ask the TAs if you need help with this. Please put both you and your partner’s cs login (banner username) information at the top of the file.Questions
Look at the summarized 2016 taxi data. Compare the summarized data with the sample of the original table shown on the NYC website. What operations or steps could produce the summarized data from the original? Write a bulleted list of steps (in English, not code) that explain how to produce the summarized form from the original. Make sure you have some ideas of what functions from the Pyret tables documentation you might use.
(The point of this question is to show you that you know almost everything you’d need to do this conversion yourself, had the source data not been so huge – within a couple of weeks you will know how to do all of these steps yourself.)
For each of the three analysis questions listed above at the beginning, describe how you plan to do the analysis. You should try to answer these questions:
Important note: You can use any of the Pyret table, chart, and plot operations as you see fit. Some that you could use (you are neither limited to these nor required to use them) are sort-by, filter-by, stdev, mean, sum, scatter-plot, freq-bar-chart, and histogram. You can read more about these in the Tables Documentation.
Sample Answer: For example, if you were asked to analyze whether municipalities with a population (in 2000) larger than 30,000 have an increase or decrease in population, your answer might be: "I’d start with a table of municipalities that have a population in 2000 of over 30,000, and then make a scatterplot of the population of those cities in 2000 and 2010. I’d add a linear regression line, then check whether there was a pattern in changes between the two population values. I’d obtain a table of municipalities with a population of greater than 30,000 in 2000 by using the filter-by function."
For the summary-table function, you will be filling in the body of the following function (you do not have to implement it for the design check, but you do eventually have to implement it):
Generate a general idea of how you want to implement this function for the design check. For example, this might be called summary-table(mytable, sum) or summary-table(mytable, mean) to summarize the total or average numbers of rides within the dates represented in mytable. You are welcome to create any other helper functions to work with summary-table that you see fit for your analysis.
Give an example of how summary-table will be used. Your answer should include an example of the input table, an input function that takes in a Table and String and returns a Number, and an output Table.
Given these two tables:
Table 1:
Table 2:
Write a bulleted list of steps (in English) to combine these two tables into one that looks like the table below. If a step corresponds to a specific Pyret tables function, make sure to name the function, even if you’re not completely sure how it will be used!
Grading
Your design check grade will be based on whether you had viable ideas for each of the questions and were able to explain them adequately to the TA (for example, we expect you to be able to describe why you picked a particular plot or table format). Your answers do not have to be perfect, but they do need to illustrate that you’ve thought about the questions and what will be required to answer them. The TA will give you feedback to consider as part of your final implementation of the project.
Your design check grade will be worth roughly a third of your overall project grade. Failure to account for key design feedback in your final solution may result in deductions.
Remember that the goal of the design check is to review your project plans and to give you feedback well before the final deadline! Many students make changes to their designs following the check: doing so is common and will not cost you points.
Requirements
Submit your work on Gradescope (Project 1 Design Check) before the Design Check. Bring your work for the design phase to the meeting either on a laptop (files already open and ready to go) or as a printout. Use whichever format you find easier to take notes on.
We expect that both partners have participated in designing the project. The TA may ask either one of you to answer questions about the work you present. Splitting the work such that each of you does 1-2 of the analysis questions is likely to backfire, as you might have inconsistent tables or insufficient understanding of work done by your partner.
Be on time to your design check. If one partner is sick, contact the TA and try to reschedule rather than have only one person do the design check.
Optional Second Check-in
During your design check, you will also have the opportunity to schedule a personal check-in with your project TA where you can ask them any questions you have at that point, or work on a bug you might be having. These are 20-30 minute meetings with your project TA from Feb 25 - Mar 1. Meeting earlier in this time frame allows you to get higher-level help, while meeting later might allow you to get more focused help.
While this meeting is optional, it’s highly recommended you schedule one with your project TA. In addition, although both partners must go to the design check, only one partner needs to go to the check-in (although it is best if both go).
Note: If you schedule a personal check-in but wish to cancel, do so at least 12 hours before the time of the check-in. We want to respect both your time and the staff’s time! Failure to do so may result in point deductions on your final project grade.
Deadline 2: Analysis and Report
Analysis
For the analysis, you will be submitting a Pyret file named transit-analysis.arr that contains the function summary-table, the tests for the function, and all the functions used to generate the report (charts, plots, and statistics).
Note: summary-table function.
Sample Answer: Continuing with comparing exam grades for C students as an example, we’d expect to see something in your Pyret file like the following:
Then, your report may look like this:

Guidelines on the Analysis
In order to do these analyses, you will need to get day-of-the-week information into the tables and combine data from the two tables based on common dates.
Combining data across tables: Both tables store data by dates, which means you should be able to combine information to create a single table. However, these two tables have different date formats (this was intentional on our part). Handle aligning the date formats in Pyret, not in Google Sheets. One of our goals for this project is making sure you know how to use coding to manipulate tables for combining data. Load both tables into Pyret, then figure out how to combine the information. Pyret String documentation might be your friend!
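As a rough illustration of the kind of string manipulation involved, here is a minimal sketch of converting one date format into another. The two formats shown (M/D/YYYY and YYYY-MM-DD) are assumptions made up for this example and may not match the formats in the actual spreadsheets, and the names pad2 and slash-to-dash are hypothetical helpers, not part of the support code. The sketch uses only built-in Pyret string functions (string-index-of, string-substring, string-length) and avoids lists, in keeping with the project rules.

```pyret
# Hypothetical formats for illustration only:
#   one table stores dates as "M/D/YYYY", the other as "YYYY-MM-DD".

fun pad2(s :: String) -> String:
  doc: "add a leading 0 to a one-character string; leave longer strings alone"
  if string-length(s) == 1:
    "0" + s
  else:
    s
  end
where:
  pad2("5") is "05"
  pad2("12") is "12"
end

fun slash-to-dash(date :: String) -> String:
  doc: "convert a date written as M/D/YYYY into YYYY-MM-DD"
  # everything before the first slash is the month
  first-slash = string-index-of(date, "/")
  month = string-substring(date, 0, first-slash)
  # what remains is D/YYYY; split it at its own first slash
  after-month = string-substring(date, first-slash + 1, string-length(date))
  second-slash = string-index-of(after-month, "/")
  day = string-substring(after-month, 0, second-slash)
  year = string-substring(after-month, second-slash + 1, string-length(after-month))
  year + "-" + pad2(month) + "-" + pad2(day)
where:
  slash-to-dash("1/5/2016") is "2016-01-05"
  slash-to-dash("12/25/2016") is "2016-12-25"
end
```

A helper like this could be applied to every row of one table to build a new date column that matches the other table's format, after which dates from the two tables can be compared directly. Keeping the conversion in its own small function also makes it easy to test with a where: block before using it on real data.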
Note: As we saw in the lecture on errors in data tables, small errors and typos can lurk in datasets. While you might be tempted to just combine columns from the tables by relying on them having the same dates in the same order, this would not be a safe option unless you also had code to check this assumption about the dates. For now, your approach should look up each date from one table in the other. We will revisit how to write this check in lecture once we finish teaching you what you need to do that.
Hint: If you feel your code is getting too complicated to test, add helper functions! You will almost certainly have computations that get done multiple times with different data for this problem. Create and test a helper or two to keep the problem manageable. You don’t need helpers for everything, though – it is fine for you to have nested build-column expressions in your solution, for example. Don’t hesitate to reach out to us if you want to review your ideas for breaking down this problem.
Report
For the report, you will be submitting a file named transit-report.pdf. Include in this file the copies of your charts and the written part of your analysis. Your report should address the three analysis questions outlined at the beginning of this assignment.
You should make a report of your findings in a Word or Google Document, which you can then convert to a PDF for submission. Pyret makes it easy to make this kind of report. When you make a plot, there is an option in the top left-hand side of the window to save the chart as a .png file, which you can then copy into your document.
Additionally, whenever you output a table in the interactions window, Pyret gives you the option to copy the table. If you copy the table into a spreadsheet, it will be formatted as a table that you can then copy into Word or Google Docs.
Your report should contain any relevant plots and tables, any conclusions you have made, and your reflection on the project (see next section). We are not looking for fancy or specific formatting, but you should put some effort into making sure the report reads well (use section headings, full sentences, spell-check it, etc). There’s no specified length – just say what you need to say to present your analyses to answer the questions.
An example of what part of your report might look like:

Reflection
Your report should also include a section with answers to each of the following questions. Do this after you have finished the coding portion of the project!
Handin Information
For your final handin, submit transit-analysis.arr and transit-report.pdf on Gradescope under Project 1. Nothing is required to print in the interactions window when we run transit-analysis.arr, but your analysis answers should include comments indicating which variable names or expressions yield the data on which you based your answers.
Grading
You will be graded on Functionality, Testing, and Design/Style for this assignment. Key metrics for each of these categories are described below.
Functionality:
Is the summary-table function working?
Testing:
Design/Style:
Additional helper functions
taxi-project-support.arr contains a function that might be helpful in manipulating your data. This is not in the original CS0111 Pyret Tables Documentation, but feel free to use it if you’d like:
long-to-wide: Long to wide can be used to bring shared data across rows into columns. For example, long-to-wide(taxi-data-long, "day", "timeframe") will produce a table that deletes the “timeframe” column (which repeats the same four time quarters for each day) and brings its values into columns, resulting in only one row of data per day. (Feel free to test it out yourself to visualize what it does!)
Campuswire and Feedback