Due: Monday, July 15 at 6pm (submit through this Google form)
Late Policy: Files can come in until midnight, but we will not answer any questions after 6pm.
To practice working with tables with real data
To practice computing statistics over tables
To practice testing and sanity-checking tables
This assignment combines tables, functions, and lists. Functions are a key part of this assignment. Thus, while you could do this assignment by writing few functions (and creating a lot of named expressions), your goal is to write a collection of functions that capture the computations you need to solve the following problems. We don’t tell you exactly which functions to write (that’s part of what you are thinking about here), but as a general rule you should create functions for computations that you might reuse across problems similar to those here.
Collaboration Policy: Your work on this assignment must be entirely your own. Include a collaboration statement attesting that this was your own work.
Examples (to test functions) are a key part of this assignment. You should be writing examples/tests for all of the functions you write as part of this assignment, unless a question explicitly states otherwise.
Put your answers to these questions in a file named schools.arr.
For this set of problems, you will work with a real dataset about student demographics and test scores from schools across Rhode Island. The data is in a Google Sheet.
Problem Setup: Assume the school board is concerned about two issues: math scores and the impact of charter schools. You’ll be writing a collection of programs to help them analyze their school-performance data.
The school board wants to explore whether schools with strong math scores are more likely to be charter schools. For purposes of this problem, we define "strong math scores" as at least one standard deviation above the mean (explanation follows). Compute the percentage of charter schools that have strong math scores. Name the result of your computation math-charters-percent.
What is Standard Deviation? Imagine that you had a set of numbers and you placed them along a number line. Standard deviation indicates the spread of values along that line: if most values are close to the mean (a.k.a. average), standard deviation is low; as more values are farther from the mean, standard deviation increases. You can use standard deviation to identify values that are farther from the average (according to pre-defined distributions; read up separately on standard deviation if you want more information). Pyret’s statistics library provides operators mean and stdev, each of which takes a list and returns a number. Given a list mathL of math scores, the strong ones would be those that are larger than
mean(mathL) + stdev(mathL)
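As a rough illustration of the cutoff computation, here is a minimal sketch on a small made-up list. The names sample-scores, sample-cutoff, and sample-strong are hypothetical and the numbers are not from the dataset. The sketch also assumes the statistics library is imported under the name S; your course setup may already make mean and stdev available without the prefix.

```pyret
import statistics as S

# Hypothetical sample data, not taken from the assignment's dataset
sample-scores = [list: 60, 70, 80, 90, 100]

# Cutoff: one standard deviation above the mean
sample-cutoff = S.mean(sample-scores) + S.stdev(sample-scores)

# Keep only the scores that are larger than the cutoff
sample-strong = sample-scores.filter(lam(s): s > sample-cutoff end)

check:
  # Only the top score lies more than one standard deviation above the mean
  sample-strong is [list: 100]
end
```

Trying the computation on a tiny list like this, where you can work out the answer by hand, is one way to sanity-check your approach before running it on the full table.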
School officials wonder which schools have both strong math scores and strong English scores. Compute a table named strong-matheng that contains schools that have both strong math scores and strong English scores.
What percentage of schools with strong math and English scores are charter schools? Compute the percentage and name it both-charters-percent.
Now the school board wonders whether strong math scores relate to student poverty levels. Compute a table of schools with poverty levels of at least 50% that also have strong math scores. The poverty level is defined as the percentage of total students who are eligible for either free or reduced-price lunch. Name your table poverty-strong-math.
What percentage of charter schools have student poverty rates above 70%? Name your result charter-poverty-percent.
The school board’s data analyst is often asked to report the results of queries broken out by the levels of schools (elementary, middle, and high school). The school names in the school column end with one of ES, MS, or HS to indicate the school level.
Develop a function called count-by-level that consumes a table and produces a table. The resulting table should have exactly 2 columns named level and count. It should have three rows, whose levels are (in order) "Elementary", "Middle", and "High".
For an example of the format, the following table summarizes how many schools are in the "Providence" district:
providence-count-manual =
  table: level, count
    row: "Elementary", 21
    row: "Middle", 7
    row: "High", 8
  end
(hint: you’ve seen how to create a table by hand. You can also do this inside a function. You don’t need fancy table operations to build a table within a Pyret function.)
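To illustrate the hint, here is a small sketch of a function that builds a table by hand from its arguments. The function pass-fail-summary and its column names are hypothetical and deliberately unrelated to the schools data; the sketch only shows that a table expression can be the body of a function.

```pyret
# Hypothetical helper, unrelated to the schools data: builds a small
# summary table from two numbers supplied by the caller.
fun pass-fail-summary(passed :: Number, failed :: Number) -> Table:
  doc: "Produce a two-row table of outcome counts"
  table: outcome, count
    row: "Pass", passed
    row: "Fail", failed
  end
where:
  pass-fail-summary(12, 3).length() is 2
  pass-fail-summary(12, 3).row-n(0)["count"] is 12
  pass-fail-summary(12, 3).row-n(1)["outcome"] is "Fail"
end
```

The values in each row are ordinary expressions, so they can come from parameters or from other function calls.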
Using your count-by-level function, compute the summary table of how many schools are in Providence (starting from the entire data table). Use a check block to confirm that your computed table is the same as the providence-count-manual table that we gave above (this will help make sure that you are producing tables in the form that our automated grading expects). For example:
check:
  <YOUR COMPUTED TABLE GOES HERE>
    is providence-count-manual
end
Compute the summary table of how many schools at each grade level have strong math scores. Name your table strong-math-levels.
For the school-count summaries to be accurate, (a) every row in the original table must have a level as part of the school name, and (b) your programs should detect a unique level for every school name. This is an example of a sanity check that we should do on the data before we compute with it.
Check whether your method for computing the count-by-level tables counts each school exactly once across the rows in the table. How you perform the check is up to you. Include a descriptive comment alongside your code that explains your approach. If your original approach does not count each row exactly once, modify it so that it does. Explain any modifications you had to make in a comment.
[This does NOT earn additional points, but is just an option for those wishing to push their skills a bit further.]
Have you noticed that when you use operations like filter-by you are passing a function as an argument (instead of just a number, string, etc.)? Many (though not all) languages allow this, and it gives you much more flexibility in creating functions that are similar except for some inner computation (like which rows to keep in a filter-by).
For a challenge, try to add functions as arguments to collapse a couple of your helper functions together.
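As a sketch of the idea, the example below uses plain lists of numbers rather than tables so that it runs without the course library. The functions count-where and count-large are hypothetical names chosen for illustration, not functions the assignment asks for.

```pyret
# Hypothetical helper: counts the list elements for which `keep` returns true.
fun count-where(nums :: List<Number>, keep :: (Number -> Boolean)) -> Number:
  nums.filter(keep).length()
where:
  count-where([list: 1, 5, 8, 12], lam(n): n > 4 end) is 3
  count-where([list: 1, 5, 8, 12], lam(n): n < 0 end) is 0
  count-where(empty, lam(n): n > 4 end) is 0
end

# A specialized counter becomes a one-line call to the general function
fun count-large(nums :: List<Number>) -> Number:
  count-where(nums, lam(n): n >= 10 end)
where:
  count-large([list: 1, 5, 8, 12]) is 1
end
```

The only thing that differs between specialized counters is the function passed as keep, which is the kind of "inner computation" you might factor out of two similar helpers.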
- How do we say that a parameter is a table?
Use Table as the type (as in fun f(t :: Table) ...).
- What is a check block?
We have been writing examples in a where block. check blocks let you write tests outside of a function. Here’s an example:
fun add1(n :: Number):
  n + 1
where:
  add1(5) is 6
end

check:
  add1(10) is 11
end
check blocks are useful when you want to test the value of something without being inside a function. For example, perhaps you computed a table and associated it with a name, then want to check something about the table. Here’s a concrete example using our running gradebook table from lecture:
sorted-table = sort-by(gradebook, "exam1", true)
check:
  sorted-table.row-n(0)["exam1"] is 95
end
Feel free to use check blocks in your code as well as where blocks. I tend to think of where blocks as containing a handful of descriptive examples, while check blocks capture more thorough testing of programs.
In grading, we will look for whether you computed correct answers, tested functions adequately, and structured your code well. Concretely, we will check:
Did you create functions for similar computations across your program? Remember that you can create a helper function and name the result of calling that function, as in:
my-table = add-letter-grades(gradebook)
Did you test your functions appropriately, especially in light of the large dataset you are working with? Good testing would involve at least two calls to the function (on different input data).
Did you use names, comments and newlines appropriately to make your code readable to others?
Remember to include the collaboration statement.
If you want to check whether your file has the same names as our grading scripts will look for, insert the following code at the bottom of your file. This simply looks for the names and types that we stipulated in the assignment. If one of these checks fails, fix your code, not these checks.
check:
  is-number(math-charters-percent) is true
  (strong-matheng.length() >= 0) is true
  is-number(both-charters-percent) is true
  (poverty-strong-math.length() >= 0) is true
  is-number(charter-poverty-percent) is true
  is-function(count-by-level) is true
  strong-math-levels
end
Submit your schools.arr file. Make sure it has this exact file name so we can run our grading program on your work.