Class summary: Introduction to Lists
Copyright (c) 2017 Kathi Fisler
1 Looking up values by keys
We want a function that takes the name of a person and returns the number of tickets they have ordered:
fun tickets-for(t :: Table, who :: String) -> Number:
  doc: "Extract tickcount value for order with given name"
  ...
where:
  tickets-for(event-data-clean, "Alvina") is 3
  tickets-for(event-data-clean, "Ernie") is 0
end
We filled in the body as follows:
fun tickets-for(t :: Table, who :: String) -> Number:
  doc: "Extract tickcount value for order with given name"
  matches = filter-by(t, lam(r): r["name"] == who end)
  matches.row-n(0)["tickcount"]
where:
  tickets-for(event-data-clean, "Alvina") is 3
  tickets-for(event-data-clean, "Ernie") is 0
end
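What happens if we try the following?
tickets-for(event-data-clean, "Kathi")
Our current code assumes that filter-by will return a non-empty table. When no row has the given name, matches is empty, so there is no row 0 to extract and the call fails with an error that says nothing about the name. We should instead check that we got a non-empty table, and raise an error of our own if we did not: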
fun tickets-for(t :: Table, who :: String) -> Number:
  doc: "Extract tickcount value for order with given name"
  matches = filter-by(t, lam(r): r["name"] == who end)
  if matches.length() > 0:
    matches.row-n(0)["tickcount"]
  else:
    raise("Tickets-for: table has no row with name " + who)
  end
where:
  tickets-for(event-data-clean, "Alvina") is 3
  tickets-for(event-data-clean, "Ernie") is 0
  tickets-for(event-data-clean, "Kathi") raises "no row"
end
The where clause shows how to check whether a call to the function results in an error being raised: rather than writing is in the example, we write raises. The string after raises needs to be a substring of the raised error message for the test to pass.
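As a minimal standalone sketch of the same pattern, the hypothetical function safe-divide below (not part of the class code) raises an error on bad input, and its where block tests both a normal result and the error. The second test should pass because "divide by zero" is a substring of the raised message.

```pyret
# Hypothetical example, not from the class code
fun safe-divide(n :: Number, d :: Number) -> Number:
  doc: "Divide n by d, raising an error when d is zero"
  if d == 0:
    raise("safe-divide: cannot divide by zero")
  else:
    n / d
  end
where:
  safe-divide(10, 2) is 5
  # raises checks for a substring of the error message
  safe-divide(1, 0) raises "divide by zero"
end
```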
2 Lists: Two Motivating Problems
Consider the following two questions:
Is every discount in the table from a valid set of discount codes?
What are the names of everyone with the student discount?
We have an idea of how to write the first one – a filter-by with a helper function that uses or to check the code against a collection of options:
fun check-discounts1(t :: Table) -> Table:
  doc: "Keep only the rows whose discount code is not valid"
  fun invalid-code(r :: Row) -> Boolean:
    not(
      (r["discount"] == "STUDENT") or
      (r["discount"] == "BIRTHDAY") or
      (r["discount"] == "") or
      (r["discount"] == "EARLYBIRD"))
  end
  filter-by(t, invalid-code)
end
There’s something unsatisfying about this solution, though: every time the set of codes changes, we have to change the function. It would be much nicer if the codes could be written independently of the function. Then, the sales department could change the codes without having to bother the programmers every time.
So the real question is how can we rewrite this function so that the set of valid codes is written down outside the function?
3 Lists: a new kind of data for sets
A list lets us write the valid codes as a value defined outside the function. L.member then checks whether a given value appears in that list. (The L. prefix refers to Pyret's lists library, assumed here to have been imported with import lists as L.)
valid-discounts = [list: "STUDENT", "BIRTHDAY", "", "EARLYBIRD"]

fun check-discounts(t :: Table) -> Table:
  doc: "Keep only the rows whose discount code is not valid"
  fun invalid-code(r :: Row) -> Boolean:
    not(L.member(valid-discounts, r["discount"]))
  end
  filter-by(t, invalid-code)
where:
  check-discounts(event-data)
    is
  add-row(
    add-row(
      add-row(event-data.empty(), event-data.row-n(3)),
      event-data.row-n(4)),
    event-data.row-n(6))
end
Here is a version written with anonymous functions/lambda.
fun check-discounts2(t :: Table) -> Table:
  doc: "Keep only the rows whose discount code is not valid"
  filter-by(t, lam(r): not(L.member(valid-discounts, r["discount"])) end)
where:
  check-discounts2(event-data)
    is
  add-row(
    add-row(
      add-row(event-data.empty(), event-data.row-n(3)),
      event-data.row-n(4)),
    event-data.row-n(6))
end
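To see L.member on its own, without a table, here is a small standalone sketch. The helper name is-valid-code is made up for illustration. The point is that the set of codes lives in the list, so changing the list changes the answers without editing the function.

```pyret
import lists as L

valid-discounts = [list: "STUDENT", "BIRTHDAY", "", "EARLYBIRD"]

# Hypothetical helper: is the code one of the valid discounts?
fun is-valid-code(code :: String) -> Boolean:
  L.member(valid-discounts, code)
end

check:
  is-valid-code("STUDENT") is true
  is-valid-code("") is true          # the empty code (no discount) is in the list
  is-valid-code("student") is false  # string comparison is case-sensitive
  is-valid-code("SENIOR") is false
end
```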
4 What are Lists?
Lists are one of the key data structures in programming. They feature:
An unbounded number of items
An order on items (first, second, third, ...)
As we will see, there are many built-in operations on lists.
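A short sketch of these two features, using a made-up list named colors: the list can hold any number of items, and the position of each item matters.

```pyret
import lists as L

# Hypothetical data for illustration
colors = [list: "red", "green", "blue"]

check:
  L.length(colors) is 3
  colors.first is "red"     # the first item
  colors.get(2) is "blue"   # positions count from 0
  # Same items in a different order make a different list
  [list: "red", "green"] is-not [list: "green", "red"]
end
```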
5 Extracting Lists from Tables
Turning to the second question, how could we get a list of names of people with the "STUDENT" discount? (Perhaps we want to validate those names against data from a school).
We know how to filter the table down to only those rows that have "STUDENT" in the discount column. How do we get the names from those rows? We use a table operator called get-column that pulls out the values from a column as a list:
filter-by(
  event-data-clean,
  lam(r): r["discount"] == "STUDENT" end).get-column("name")
Alternatively, using an intermediate name for the filtered table:
rows =
  filter-by(
    event-data-clean,
    lam(r): r["discount"] == "STUDENT" end)
rows.get-column("name")
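The following standalone sketch shows get-column on a tiny table written by hand. The table small-orders and its discount values are invented for illustration and are not the class data. Because the result is a list, the list operations apply to it directly.

```pyret
import lists as L

# Hypothetical two-row table, not the class's event-data
small-orders = table: name, discount
  row: "Alvina", "STUDENT"
  row: "Ernie", ""
end

names = small-orders.get-column("name")

check:
  names is [list: "Alvina", "Ernie"]
  L.member(names, "Ernie") is true
end
```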
We’ll do a lot more with lists as we go forward.
6 Operations on Lists
So far, we’ve had an introduction to lists, a way to group together a collection of items (such as a collection of names, grades, dates, images, etc). We saw how to create lists by hand (using [list: ...]) and how to extract a list from the column of a table (using .get-column(colname)).
Next, we cover some of the (many) operations on lists. There’s a full list of the operations in the Pyret lists documentation; we’ll look at just a handful of them today.
We’ll step away from tables and work with lists on their own for now.
7 Categorizing Pizza Toppings
Imagine that you are running a pizzeria and need to track different categories of pizza toppings. Let’s do that by setting up the following lists:
meats = [list: "sausage", "pepperoni", "chicken", "shrimp"]
veggies = [list: "spinach", "peppers", "onion"]
unusual = [list: "egg", "pickle"]
premium = [list: "pickle", "shrimp"]
What do we notice about lists from these examples? Lists can have any number of items. The items within a list are separated by commas. The same item can also appear in more than one list: "pickle" is both unusual and premium.
7.1 Operations: Filter, Member, Distinct
The staff at your office have to vote on which toppings to get as part of the weekly pizza lunch. You have a list of all the votes that people have cast.
topping-votes =
  [list: "peppers", "pepperoni", "onion", "onion", "onion"]
Here are various expressions that show the list operations of distinct, member, and filter. We introduced L.member and L.distinct in the last lecture. L.filter is analogous to the filter-by operation on tables: filter takes a function that determines whether to keep elements from the list in the output list.
# Which different veggies were ordered?
unique-veggies =
  L.distinct(
    L.filter(lam(t): L.member(veggies, t) end, topping-votes))

# What toppings to include on a vegetarian pizza? Leave off the meats
veg-friendly =
  L.filter(lam(t): not(L.member(meats, t)) end,
    topping-votes)
If you weren’t sure how to start on something like "which different veggies were ordered", you can start by writing out the tasks:
Create a function that determines whether a string is in the veggie list
Filter the topping-votes list down to just the veggies
Remove duplicates from the list of veggies
Each of these tasks is a separate expression in the code: the lam(t): L.member ... is the function, L.filter extracts the veggies, and L.distinct removes the duplicates.
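One way to make the three tasks visible is to give each step its own name. This is only a sketch: the names is-veggie, veggie-votes, and veggies-ordered are made up here, and the lists are repeated so the example stands alone.

```pyret
import lists as L

veggies = [list: "spinach", "peppers", "onion"]
topping-votes = [list: "peppers", "pepperoni", "onion", "onion", "onion"]

# Task 1: a function that determines whether a string is in the veggie list
fun is-veggie(t :: String) -> Boolean:
  L.member(veggies, t)
end

# Task 2: keep only the veggie votes
veggie-votes = L.filter(is-veggie, topping-votes)

# Task 3: remove duplicates
veggies-ordered = L.distinct(veggie-votes)

check:
  veggie-votes is [list: "peppers", "onion", "onion", "onion"]
  veggies-ordered is [list: "peppers", "onion"]
end
```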
7.2 Operations Recap
What operations do we have so far?
Operation  | Type                           | Notes
L.member   | List, item -> Boolean          | Indicates whether item is in the list
L.distinct | List -> List                   | Returns the unique values from input list
L.filter   | (elt -> Boolean), List -> List | Returns list of items from input list on which function returns true (in same order as in input list)
7.3 Map
Now let’s try another problem – it’s vegetarian-awareness week, and we want to replace all the meats in the list with tofu.
Let’s think about what the input and output of this computation should be. We are starting with
[list: "peppers", "pepperoni", "onion", "onion", "onion"]
which should become
[list: "peppers", "tofu", "onion", "onion", "onion"]
Note there is exactly one item in the output list for each item in the input list.
Which of our existing list operations can we use for this? We need something that produces a list, and some of the items are different than in the input list. None of the operations we have so far achieve this, so we need something else.
What we need is an operation called L.map, which is similar to transform-column or build-column from tables – L.map produces a list with one item corresponding to each item in the given list, in the same order.
# Make all ingredients vegetarian by replacing meat with tofu
fun replace-if-meat(str :: String) -> String:
  doc: "If string is a meat, return tofu, else return the string"
  if L.member(meats, str):
    "tofu"
  else:
    str
  end
end

vegetarian-delight = L.map(replace-if-meat, topping-votes)
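The same computation can also be sketched with a lambda in place of the named helper. The name no-meat is made up for this standalone example, and the lists are repeated so it runs by itself. The check confirms the expected output from above and that the output has one item per input item.

```pyret
import lists as L

meats = [list: "sausage", "pepperoni", "chicken", "shrimp"]
topping-votes = [list: "peppers", "pepperoni", "onion", "onion", "onion"]

# Same replacement as replace-if-meat, written inline
no-meat =
  L.map(
    lam(t):
      if L.member(meats, t): "tofu" else: t end
    end,
    topping-votes)

check:
  no-meat is [list: "peppers", "tofu", "onion", "onion", "onion"]
  L.length(no-meat) is L.length(topping-votes)
end
```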
7.4 An Aside: Tables versus Lists
It would seem we could have just as well put our topping information in a table rather than all of these lists. For example:
topping   | meat | veggie | unusual | ...
sausage   | X    |        |         |
egg       |      |        | X       |
pepperoni | X    |        |         |
spinach   |      | X      |         |
...       | ...  | ...    | ...     | ...
Stop and discuss – what are the tradeoffs between one table and our multiple-lists approach?
Here are some observations on this:
The table makes it easier to see which items are in multiple categories
The table would let us make plots and charts using the operations we know in Pyret
The lists are easier to write and modify
The tables could become sparse if we add more categories and ingredients
For our L.filter operations, we can use L.member to look for toppings in specific lists. With the tables, we’d have to keep filtering over the rows to find the meats/veggies/etc, extracting the names, and comparing data. This feels much more complicated (and maybe more expensive time-wise) than using L.member.
Whether you use tables or lists depends on the data you have and how you plan to use it. For the programs we’ve written today, the lists were sufficient and lightweight, so they were the better choice. Other programs might have benefitted from the table-shaped data. This is our first real example of starting to consider choices in how we represent information when designing programs.
8 Combining Map and Filter
Here’s one last example.
For tweeting and texting, people want to reduce the number of characters they have to type. For example, instead of writing "are you home?", they might write "R U home?".
This feels like a problem for L.map – we want to convert each string in the original message to a shortened string.
Here’s a function that shortens common strings:
fun shorten(w :: String) -> String:
  string-replace(
    string-replace(
      string-replace(w, "for", "4"),
      "you", "U"),
    "are", "R")
end
How can we shorten all of the words in a message? Let’s assume we have a list of all the words. Then we can use L.map.
msg-words = [list: "unfortunately", "you", "are", "late"]

msg-trim = L.map(shorten, msg-words)
What if we want to find all of the words that are still long after shortening? We combine map and filter:
# msg-words is the list defined above; the result gets its own name
# because Pyret does not allow the same name to be defined twice
long-words =
  L.filter(lam(w): string-length(w) > 4 end,
    L.map(shorten, msg-words))
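To check what these expressions produce, here is a standalone sketch that repeats shorten and tests the results. The name words is made up so the example does not depend on the definitions above.

```pyret
import lists as L

fun shorten(w :: String) -> String:
  string-replace(
    string-replace(
      string-replace(w, "for", "4"),
      "you", "U"),
    "are", "R")
end

words = [list: "unfortunately", "you", "are", "late"]

check:
  # map keeps one output word per input word, in order
  L.map(shorten, words) is [list: "un4tunately", "U", "R", "late"]
  # only words longer than 4 characters after shortening remain
  L.filter(lam(w): string-length(w) > 4 end, L.map(shorten, words))
    is [list: "un4tunately"]
  # shorten replaces substrings, so it also changes the inside of words
  shorten("care") is "cR"
end
```

Note that "late" has exactly 4 characters, so the > 4 test drops it. The last test shows that shorten rewrites any occurrence of "are", even inside a longer word, which may or may not be what a texting shortcut should do.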
9 Summary of List Operations
Let’s extend our table of list operations to include map. We’ll also add L.length, which is useful for getting the size of a list.
In the types, the notation List<type> means a list whose elements are of the named type. When the type isn’t fixed, we use generic names like item and elt to show the relationship between the types of the lists and the types of the functions used to produce them.
Operation  | Type                                     | Notes
L.member   | List, item -> Boolean                    | Indicates whether item is in the list
L.distinct | List -> List                             | Returns the unique values from input list
L.filter   | (elt -> Boolean), List<elt> -> List<elt> | Returns list of items from input list on which function returns true (in same order as in input list)
L.map      | (elt -> item), List<elt> -> List<item>   | Returns result of calling function on each element of given list, in order
L.length   | List -> Number                           | Returns length of the list