Class summary: Processing Lists

In terms of content, this largely follows the Processing Lists chapter, though with additional details and exercises.

1 Creating and Processing Lists

Last class, we had a brief introduction to creating our own lists. Today, we cover that more carefully, and learn how to write our own functions over lists (rather than just use the ones that Pyret has built in).

1.1 Problem Setup: A Word Game

Consider a game that asks players to generate as many words as they can from a given set of letters. For example, given the letters "a", "e", "s", and "t", players might generate the words "at", "sea", "eat", "sat", and so on. Set aside questions about duplicate letters: we’ll come back to those later, but for now assume each letter can be used only once.

We can represent the collection of words that one player generates as a list:

[list: "at", "sea", "eat", "sat"]

To compute scores for this game, what sort of questions might we want to ask about the list for each player?

- How many words did the player generate?
- How many times did the player use a particular letter?
- Are all of the words generated real, in the sense that they appear in an English dictionary?
- How many of the words are uncommon, relative to some other list of common words that we had saved separately?
- How many long words did the player generate?
- And so on

Our goal is to learn how to write programs to answer these questions.

1.2 Processing Lists with Functions

Let’s start with counting how many words the player generated. We’ve already seen the built-in length function on lists, but here we will write it out manually as it is a good first example.
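As a reference point before we write our own version, here is a small sketch of asking a list for its length. It uses the .length() method on lists; the check block is only there to record the expected answer for the sample word list from above.

```pyret
# Count the words in the sample list using the built-in list method.
check:
  [list: "at", "sea", "eat", "sat"].length() is 4
  [list: ].length() is 0
end
```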

When we wrote programs over ranges of numbers (like the Scoville pepper test), we used if statements to make sure we covered each range of numbers. Whenever we are going to write a program to process data that has some structure, we start by articulating that structure and setting up a skeleton of code that matches the structure.

1.2.1 The Structure of Lists

What does this mean in the case of lists? Fundamentally, there are two distinct "shapes" (or structures) of lists: empty lists and non-empty lists. These have different shapes because empty lists have no contents (that we might retrieve while computing), whereas non-empty lists are guaranteed to have at least a first element and the rest of the list beyond the first element.

Putting this differently, think about how one might build up a list incrementally (as you do with your shopping list or your to-do list): You start with an empty list. Then you add an item to it. Then you add another item, and so on. Each time you add an item, you add it onto an existing list (which may or may not be empty). This leads to the following description of the structure of a list:

A list is either

- empty, or

- an item added onto an existing list

This sort of structural description is useful because it breaks out the cases you need to consider when you process the list. Imagine that you wanted to ask whether coffee is on your shopping list. If the list is empty, the answer is an immediate no. If the list isn’t empty, then coffee could be at the top of the list, or it could be in the rest of the list after the top. Computationally, all cases when the list is not empty are handled similarly, but the empty case is handled differently. It turns out this pattern applies across all functions that process lists, not just one that checks for an item on a shopping list.

Actually, we can write the structural description more precisely, using constructs from the programming language. Such a code-based description makes it easier to see how lists are built up:

A list-of-strings is either

- empty, or

- link(string, list-of-strings)

empty is a built-in constant in Pyret for the empty list. link is a built-in operator that takes an item and a list and returns a new list that has the item at the front/top, followed by the elements on the original list. For example:

> empty

[list: ]

> link("milk", empty)

[list: "milk"]

> link("tea", link("milk", empty))

[list: "tea", "milk"]

> link("cookies", link("tea", link("milk", empty)))

[list: "cookies", "tea", "milk"]

As the outputs show, the list construct we saw last week is a shorthand for building a list incrementally with multiple link operations.
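One way to convince yourself of this is to state the equivalence as tests. The sketch below compares the shorthand against the corresponding chain of link operations, using the same shopping items as the interactions above.

```pyret
# The bracketed list shorthand and a chain of link calls build equal lists.
check:
  [list: ] is empty
  [list: "milk"] is link("milk", empty)
  [list: "tea", "milk"] is link("tea", link("milk", empty))
  [list: "cookies", "tea", "milk"] is link("cookies", link("tea", link("milk", empty)))
end
```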

Non-empty lists have components—the first item on the list and the rest of the list—so Pyret also gives us operations for taking lists apart. Here are some examples:

> shopping = link("tea", link("milk", empty))

> shopping.first

"tea"

> shopping.rest

[list: "milk"]

> shopping.rest.first

"milk"

In general you can use a period to access a component of a piece of data with structure. We’ll see many more uses of this pattern in the next two weeks.
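Because .rest produces another list, the accessors can be chained to walk further down a list. Here is a small sketch on a three-item list; the name groceries is ours, chosen only for this illustration.

```pyret
# groceries is a hypothetical name used only for this example.
groceries = link("cookies", link("tea", link("milk", empty)))

check:
  groceries.first is "cookies"
  groceries.rest is [list: "tea", "milk"]
  # .rest gives a list, so we can ask for its first or rest in turn
  groceries.rest.first is "tea"
  groceries.rest.rest.first is "milk"
  groceries.rest.rest.rest is empty
end
```

Note the order: .rest comes before .first, because only lists have a rest.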

1.2.2 The Structure of List-Processing Code

Now that we see the structure of lists, let’s go back to writing code to process a list. Intuitively, we want a function that processes a list to break out the structure with something like:

if <the-list-is-empty>:

else if <the-list-is-a-link>:

end

We’re going to use a slightly different pattern though, one that will scale to additional kinds of data that we will begin defining later this week. Here’s the general structure for a function that processes a list (we’ll fill in the ellipses shortly):

fun list-function(some-list :: List):
  cases (List) some-list:
    | empty => ...
    | link(f, r) => ...
  end
end

The cases construct is designed for data that can have different shapes. Each shape gets its own case, with a separate computation for handling data of that shape.
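To see cases at work before we add anything else, here is a minimal sketch of a function that needs a different answer for each shape. The function name first-or-fallback and its fallback parameter are invented for this illustration; they are not part of Pyret or of the word game.

```pyret
# Hypothetical helper: produce the first word, or a fallback for an empty list.
fun first-or-fallback(word-list :: List<String>, fallback :: String) -> String:
  cases (List) word-list:
    # an empty list has no first element, so use the fallback
    | empty => fallback
    # a link names its two components: f is the first item, r is the rest
    | link(f, r) => f
  end
where:
  first-or-fallback(empty, "none") is "none"
  first-or-fallback(link("tea", empty), "none") is "tea"
  first-or-fallback(link("milk", link("tea", empty)), "none") is "milk"
end
```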

Back to our goal of counting the number of words in a list. Let’s adapt the general pattern above to that problem:

fun count-words(word-list :: List) -> Number:
  cases (List) word-list:
    | empty => ...
    | link(f, r) => ...
  end
end

(Here, we show how to indicate that the function will return a number.) How do we fill in the answers for the empty and link cases? Before you try coding, remember our design steps – we should write examples to guide us:

fun count-words(word-list :: List) -> Number:
  cases (List) word-list:
    | empty => ...
    | link(f, r) => ...
  end
where:
  count-words(empty) is 0
  count-words(link("tea", empty)) is 1
  count-words(link("milk", link("tea", empty))) is 2
end

(The examples use link rather than the list shorthand to match the cases in the code.)

Now that we have the examples, let’s think about the code. Our examples tell us that the answer in the empty case should be 0. What about the link case? That case has already given us names for the first element of the list (f) and the rest of the list (r). If we look at the second and third examples, the count of words in the third example seems to be one more than the count of words in the second example (which is the rest of the list from the third example). This suggests the following code:

fun count-words(word-list :: List) -> Number:
  cases (List) word-list:
    | empty => 0
    | link(f, r) => 1 + count-words(r)
  end
where:
  count-words(empty) is 0
  count-words(link("tea", empty)) is 1
  count-words(link("milk", link("tea", empty))) is 2
end

If we look at the structural definition of a list of strings, the rest part of each link is itself a list of strings. Thus the easiest way to process such a list is to run the same function over the rest of the list, then write an expression to account for the first item in the list.
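It can help to spell out how a call unfolds one link at a time. The sketch below repeats the count-words definition so that it stands on its own, then records each step of the computation on a two-word list as a test.

```pyret
fun count-words(word-list :: List) -> Number:
  cases (List) word-list:
    | empty => 0
    | link(f, r) => 1 + count-words(r)
  end
end

check "each link adds one to the count of its rest":
  # the link case: one for "milk", plus the count of the rest
  count-words(link("milk", link("tea", empty))) is 1 + count-words(link("tea", empty))
  # the link case again: one for "tea", plus the count of the rest
  count-words(link("tea", empty)) is 1 + count-words(empty)
  # the empty case ends the computation
  count-words(empty) is 0
  # putting the three steps together
  count-words(link("milk", link("tea", empty))) is 1 + (1 + 0)
end
```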

2 Templates (for Lists)

This pattern is so common that we give it a name: the template. A template is a skeleton of code that processes a particular data structure. For lists, the template is as follows:

fun list-function(some-list :: List):
  cases (List) some-list:
    | empty => ...
    | link(f, r) => ... f ... list-function(r)
  end
end

Whenever you are asked to write a function that processes a list:

- Write examples of the function
- Copy down the list template
- Replace the two uses of the generic list-function name with the name of your function
- Replace some-list with a parameter name that is more relevant to your function
- Use the examples to fill in the ellipses to complete the function
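As one possible illustration of these steps, here is a sketch that answers the earlier question about long words. The function name count-long-words and the choice that "long" means more than 3 characters are our assumptions for this example; the game itself has not fixed a threshold.

```pyret
# Assumption for illustration: a word is "long" if it has more than 3 characters.
fun count-long-words(word-list :: List<String>) -> Number:
  cases (List) word-list:
    # no words means no long words
    | empty => 0
    # check the first word, then let the recursive call handle the rest
    | link(f, r) =>
      if string-length(f) > 3:
        1 + count-long-words(r)
      else:
        count-long-words(r)
      end
  end
where:
  count-long-words(empty) is 0
  count-long-words(link("at", empty)) is 0
  count-long-words(link("seat", link("at", empty))) is 1
  count-long-words([list: "east", "sea", "teas"]) is 2
end
```

The shape is the template's: only the ellipses changed. The empty case became 0, and the link case became an expression that uses f and the result of the recursive call on r.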

For the rest of the class, we are going to practice this pattern on various problems related to our word-game context.