On this page:
6.1 What Is This Assignment’s Purpose?
6.2 Reading
6.3 Program
6.4 Task
6.5 What to Turn In
6 The Shell🔗
6.1 What Is This Assignment’s Purpose?🔗

The Unix shell is one of the most widely-used programming languages, and has a powerful notion of laziness central to its evaluation model. In this assignment we want you to get the experience of “retrofitting” a type system onto a dynamic language, by finding invariants in the way data are transformed.

6.2 Reading🔗

Read this paper, at least to the end of section 2.1.Local interest stories: first-author Greenberg is a Brown CS alum, and third author Vasilakis is now a professor here.

The paper references this famous exchange in the journal Communications of the ACM, where Donald Knuth presented a “literate” program for a task and then Doug McIlroy wrote a response in which he showed the same program written in terms of Unix shell commands. McIlroy’s review begins on page 478 and his solution is on page 480. You are not expected to read this document in whole! It’s here as a historical reference (and a fascinating contrast between different styles of programming, and also a statement of the power of programming languages). But it is very useful to read the critique starting on page 480 starting from “Knuth’s purpose was” to the end of page 481.Here’s a cartoon about it. And here’s a classic video about the Unix command philosophy from 1982.

If you have not seen the shell before, these slides do a very nice job of introducing it (especially through slide 41). Note that slide 4 of that deck echoes the Knuth-McIlroy exchange.

6.3 Program🔗

McIlroy’s program, with the last line truncated, is as follows:

tr -cs A-Za-z '

' |

tr A-Z a-z |

sort |

uniq -c |

sort -rn

You can run it interactively as follows:

cat <filename> |

tr -cs A-Za-z '

' |

tr A-Z a-z |

sort |

uniq -c |

sort -rn |

less

which will process <filename> and show you the output a screen at a time.

Note that it is important that you have a newline inside the quote: see McIlroy’s explanation of the program.

6.4 Task🔗

Let us now think about this from the perspective of retrofitted type systems.

One of the reasons the Unix shell is so powerful is because its commands compose very easily: to a first approximation, everything consumes a stream of strings and returns a stream of strings.If you pay really close attention below, you may notice that perhaps some things make more sense on finite lists than on infinite streams. But you can ignore this distinction unless you really want to. However, this perspective is useless from the perspective of debugging programs: it means there is only one type, and everything consumes and produces values of that type. That means “composition” is automatic, but we cannot tell with any more refinement what kind of answer a program would produce.

But obviously, that is not what is going on here! Each line is doing something interesting and producing a much richer kind of stream-of-strings—refining either the stream, the strings, or perhaps even both—which is how we’re able to get to the desired result. So what we’re going to do is retrofit “types” to this program (here it is again with line numbers in McIlroy’s style):

(1) tr -cs A-Za-z '

    ' |

(2) tr A-Z a-z |

(3) sort |

(4) uniq -c |

(5) sort -rn

Each line clearly consumes and produces strings. However, it doesn’t necessarily consume or produce strings without any structure at all. What new structure is gained at each point? How does this accumulation of structure help us determine that this pipeline actually produces the value we desire?

For instance, suppose we run the command grep "CSCI 1730":
  1. From the point of view of the structure of the lines, assuming there was no structure to the input (beyond being a stream of strings), we know that in the output, the strings all contain "CSCI 1730".

  2. From the perspective of grep:

    1. It strengthens whatever per-line structure the input had with the constraint that every line contains the argument string.

    2. If the input is of finite length, the output has no more lines, and may well have fewer lines, than the input.

We ask you to produce a table of the following form, for which we’ve given a sample entry:

Operation

 

Input Type

 

Output Type

 

Output Length

 

Output Order

 

 

 

(relative to input)

 

(relative to input)

grep "CSCI 1730"

 

stream of strings

 

stream of strings that contain "CSCI 1730"

 

<=

 

same

In writing your answer:

  1. If you find it necessary or useful to add more columns to more thoroughly capture the behavior of these operations, feel free to do so!

  2. The operations above are very general. You will find it easier to treat special cases: e.g., instead of uniq in general, treat it as if it were a specialized command, uniq-c; instead of sort in general, specialize to sort-rn; and so on.

  3. Please be as concise and precise as possible.

6.5 What to Turn In🔗

Please produce the table described above. By default, we suggest that it should have five rows, corresponding to the five lines of McIlroy’s program. If, however, you have come up with a different way to deal with this, it’s fine to have more or fewer rows; just be clear on what you’re doing.

The example above said that it strengthens “whatever per-line structure the input had”. Why is the phrase “per-line” in there?