6 The Shell
6.1 What Is This Assignment’s Purpose?
The Unix shell is one of the most widely-used programming languages, and has a powerful notion of laziness central to its evaluation model. In this assignment we want you to get the experience of “retrofitting” a type system onto a dynamic language, by finding invariants in the way data are transformed.
6.2 Reading
Read this paper, at least to the end of section 2.1.Local interest stories: first-author Greenberg is a Brown CS alum, and third author Vasilakis is now a professor here.
The paper references this famous exchange in the journal Communications of the ACM, where Donald Knuth presented a “literate” program for a task and then Doug McIlroy wrote a response in which he showed the same program written in terms of Unix shell commands. McIlroy’s review begins on page 478 and his solution is on page 480. You are not expected to read this document in whole! It’s here as a historical reference (and a fascinating contrast between different styles of programming, and also a statement of the power of programming languages). But it is very useful to read the critique starting on page 480 starting from “Knuth’s purpose was” to the end of page 481.Here’s a cartoon about it. And here’s a classic video about the Unix command philosophy from 1982.
If you have not seen the shell before, these slides do a very nice job of introducing it (especially through slide 41). Note that slide 4 of that deck echoes the Knuth-McIlroy exchange.
6.3 Program
tr -cs A-Za-z ' |
' | |
tr A-Z a-z | |
sort | |
uniq -c | |
sort -rn |
cat <filename> | |
tr -cs A-Za-z ' |
' | |
tr A-Z a-z | |
sort | |
uniq -c | |
sort -rn | |
less |
Note that it is important that you have a newline inside the quote: see McIlroy’s explanation of the program.
6.4 Task
Let us now think about this from the perspective of retrofitted type systems.
One of the reasons the Unix shell is so powerful is because its commands compose very easily: to a first approximation, everything consumes a stream of strings and returns a stream of strings.If you pay really close attention below, you may notice that perhaps some things make more sense on finite lists than on infinite streams. But you can ignore this distinction unless you really want to. However, this perspective is useless from the perspective of debugging programs: it means there is only one type, and everything consumes and produces values of that type. That means “composition” is automatic, but we cannot tell with any more refinement what kind of answer a program would produce.
But obviously, that is not what is going on here! Each line is doing
something interesting and producing a much richer kind of
stream-of-strings—
(1) tr -cs A-Za-z ' |
' | |
(2) tr A-Z a-z | |
(3) sort | |
(4) uniq -c | |
(5) sort -rn |
Each line clearly consumes and produces strings. However, it doesn’t necessarily consume or produce strings without any structure at all. What new structure is gained at each point? How does this accumulation of structure help us determine that this pipeline actually produces the value we desire?
From the point of view of the structure of the lines, assuming there was no structure to the input (beyond being a stream of strings), we know that in the output, the strings all contain "CSCI 1730".
From the perspective of grep:
It strengthens whatever per-line structure the input had with the constraint that every line contains the argument string.
If the input is of finite length, the output has no more lines, and may well have fewer lines, than the input.
We ask you to produce a table of the following form, for which we’ve given a sample entry:
Operation |
| Input Type |
| Output Type |
| Output Length |
| Output Order |
|
|
| (relative to input) |
| (relative to input) | |||
grep "CSCI 1730" |
| stream of strings |
| stream of strings that contain "CSCI 1730" |
| <= |
| same |
In writing your answer:
If you find it necessary or useful to add more columns to more thoroughly capture the behavior of these operations, feel free to do so!
The operations above are very general. You will find it easier to treat special cases: e.g., instead of uniq in general, treat it as if it were a specialized command, uniq-c; instead of sort in general, specialize to sort-rn; and so on.
Please be as concise and precise as possible.
6.5 What to Turn In
Please produce the table described above. By default, we suggest that it should have five rows, corresponding to the five lines of McIlroy’s program. If, however, you have come up with a different way to deal with this, it’s fine to have more or fewer rows; just be clear on what you’re doing.
The example above said that it strengthens “whatever per-line structure the input had”. Why is the phrase “per-line” in there?