The Shell Game

Sample Lecture Material

Getting Started

The best way to learn about shells and the commands available for use in shells is to sit next to someone who knows a lot more about shells than you do and just happens to be in an expansive mood. If you can't find someone of that sort, you can follow along with me as I demonstrate various shell commands and provide a tour of a set of files and directories that we've created for illustration purposes. You can download these files and directories by clicking here. If you're just starting out and need some help setting up your computer or learning the basics of invoking shells, I recommend that you visit the course home page and work through the introductory material you'll find there. I'll be using the C shell (actually an enhanced version of the C shell with file-name completion and command-line editing, but you won't notice me using these capabilities in this static web page). The material in this lecture is the sort that you'll find in an introductory tutorial on using shells and shell tools.

Depending on what browser you're using, upon clicking on the here text, a file named mkdirs.tar may appear on your desktop, you may be asked where you want to store the file mkdirs.tar, or a folder (directory) named talk may suddenly appear somewhere on your desktop. Using a browser to download files is convenient in some ways (for example, it only requires a single click) but awkward in other ways (for example, you have to figure out where the browser put the downloaded files, possibly use an extraction utility if you don't have your browser set up to do so automatically, and then move the files where you want them). I recommend that you don't bother using a browser at all. If you're reasonably experienced using a shell, you can download files by executing commands in your shell. Since this is an introduction to using shells, we might as well jump in and start using shell commands.

In the next exchange, I use pwd to determine my current working directory, curl (short for ``client URL'') to download the file mkdirs.tar, and tar (for the somewhat dated ``tape archive'') to extract the contents of mkdirs.tar. I'll explain each of the commands shown below in detail in class. /u/tld/ is my home directory and I'm using a percent sign (%) followed by a space as my prompt (in the next lecture I'll tell you how to change your prompt):

% pwd
/u/tld
% curl www.cs.brown.edu/~tld/talk/downloads/mkdirs.tar > mkdirs.tar
  % Total    % Received % Xferd  Average Speed          Time             Curr.
                                 Dload  Upload Total    Current  Left    Speed
100 20480  100 20480    0     0  59362      0  0:00:00  0:00:00  0:00:00  188k
% ls mkdirs.tar
mkdirs.tar
% tar -xvf mkdirs.tar
talk/
talk/data/
talk/data/web/
talk/data/web/519/
talk/data/web/544/
talk/data/web/544/d4.lidx
talk/data/web/544/d4.lmat
talk/data/web/544/d4.widx
talk/data/web/544/d4.wmat
talk/data/web/README
talk/email/
talk/email/archive/
talk/email/archive/00/
talk/email/archive/01/
talk/email/archive/02/
talk/email/archive/03/
talk/email/archive/99/
talk/email/INBOX
talk/email/OUTBOX
talk/music/
talk/papers/
talk/papers/archive/
talk/papers/journals/
talk/papers/proposals/

I used the verbose option (the v in -xvf) to tell tar to list the contents of the archive as it's extracting the files and directories; the x tells tar to extract and the f says that the next argument names the archive file. Now let's move around in those directories and examine their contents. In the next exchange, I use chdir (or equivalently cd in most shells) to change my working directory, ls to list the contents of one or more directories, the default being to list the contents of the current working directory, and cat to display the contents of files. As mentioned in class, ., .. and ~ are abbreviations for, respectively, your current working directory, the parent directory of your current working directory and your home directory.

% chdir talk
% ls
data   email  music  papers
% chdir email
% ls
INBOX   OUTBOX  archive
% cat INBOX
Mail-from: From good_fellow@work.org  Mon Jul 14 10:41:17 2003
Return-Path: <good_fellow@work.org>
Delivered-To: tld@cs.brown.edu
Received: from salt.cs.brown.edu (salt-dmz [128.148.32.122])
        by null.cs.brown.edu (Postfix) with ESMTP
        id 53B963F3A; Mon, 14 Jul 2003 10:41:17 -0400 (EDT)
Received: from work.org (mail.work.org [198.94.212.147])
        by salt.cs.brown.edu (Postfix) with ESMTP
        id 03B94BE3A; Mon, 14 Jul 2003 10:40:16 -0400 (EDT)
Mime-Version: 1.0
Message-Id: <a052106a2bb386a5cf73d@[198.94.212.146]>
Date: Mon, 14 Jul 2003 07:34:10 -0700
To: discussion@work.org
From: good_fellow@work.org
Subject: Discussion
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
X-Spam-Status: No, hits=0.0 required=5.0 tests= version=2.20

Dear ...

Some of you may remember ... In order to participate in this
discussion, however, you will need to subscribe to ...  Look
forward to talking with you.

Sincerely,

Good Fellow

% cd ..
% pwd
/u/tld/talk 
% ls
data   email  music  papers
% cd papers
% pwd
/u/tld/talk/papers 
% ls
archive   journals  proposals
% cd ..
% ls
data   email  music  papers
% cd data
% pwd
/u/tld/talk/data 
% ls
web
% cd web
% ls
519    544    README
% cat README
WT10g TREC data files with extensions:

.lidx = link index
.lmat = link matrix
.widx = dictionary
.wmat = term counts

% cd 544
% pwd 
/u/tld/talk/data/web/544 
% ls
d4.lidx d4.lmat d4.widx d4.wmat
% cat d4.widx
         0   11376535 stop_word
         1       4089 biology
         2     101970 news
         3       1825 bureau
         4      54203 cancer
         5       6400 release
         6      24078 high
         7       8060 bone
         8       4387 regular
         9       1125 density
        10       2898 double

Finally, we use mkdir to create a directory, rmdir to remove a directory, and rm to remove a file. In class and in the help sessions, we'll alert you to the dangers involved in using rm and rmdir and how to avoid being burned (as Unix system administrators like to say, ``rm doesn't delete files, people do'').

% cd ../../../papers
% pwd
/u/tld/talk/papers 
% ls
archive   journals  proposals
% cd /u/tld
% pwd
/u/tld
% mkdir junk
% ls talk > junk/talk
% cat junk/talk
data
email
music
papers
% rm junk/talk
% ls junk
% rmdir junk
% rm mkdirs.tar

The directory structure in the above examples is pretty simple and uncluttered. My own directory structure resides on several machines and is much larger and more complicated; in some parts, it's pretty organized but in the parts where I'm working it can get pretty messy. The directories arrayed under your home directory are where you keep all your stuff. You'll have to figure out a strategy for organizing your stuff that fits your working habits.

I remember a movie in which one of the characters was explaining why he liked frozen dinners. He really liked the fact that there were little compartments for the meat, mashed potatoes, carrots and peas. It was important to him that the gravy on the mashed potatoes didn't run over onto the vegetables. Some computer users are like this fellow; it's important to them that everything has its place. Other users keep a ``messy desktop'' with lots of obscurely named files all jumbled together. There is no perfect way of organizing your stuff, but the hierarchical structure of nested directories certainly suggests some simple and effective organizing principles. Think about how you might organize such information as music files, medical records, email archives and calendar data. And, when you do lose track of things, there are shell commands that will help you find them; we'll look at some of these commands in Chapter 3.

Memory and Variables

When you hear the word ``variable'' in the context of thinking about computers and programming, think of memory and remembering things. Variables identify pieces of memory and allow you to store things, e.g., numbers, strings, etc., in those pieces of memory and later retrieve them. Before we launch into our discussion of variables, I want you to realize that you already know about another method of storing and retrieving things in memory. Files are a very useful mechanism for storing information. Here's a somewhat far-fetched example for keeping track of a single number, say, as a counter (I've reprogrammed the shell to print out my current working directory as part of my prompt to help you and me keep track of where we are in the directory structure):

/u/tld % echo 1 > num
/u/tld % cat num
1
/u/tld % rm -f num ; echo 2 > num
/u/tld % cat num
2

The -f (for ``force'') option tells rm to go ahead without asking for confirmation, which it would otherwise do if, for example, rm has been set up to prompt before every removal or the file is write-protected; it also keeps rm from complaining if the file doesn't exist. In the C shell, the semicolon allows you to specify a sequence of commands, so that command 1 ; command 2 ; ... ; command n executes each command in turn as if the commands were typed on separate lines.
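
To make the counter idea concrete, here's a minimal sketch that reads the number back out of the file, adds one to it and stores the result. It relies on two things I haven't introduced yet: backquotes, which replace a command with its output, and expr, a standard utility that does integer arithmetic on its arguments. The file name num.new is just a scratch name made up for this illustration. The lines beginning with # are comments for when the commands are saved in a script; leave them out if you type the commands at the prompt.

```shell
# start the counter at 1, removing any old copy of the file first
rm -f num ; echo 1 > num
# read the stored value, add one, and write the result to a scratch file
expr `cat num` + 1 > num.new
# replace the old counter file with the new one and display it
mv num.new num ; cat num
```

The last command prints 2. The scratch file is there because redirecting straight back into num risks emptying the file before cat has had a chance to read it.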

Here's a somewhat more complicated example showing how simple text files can be used to keep track of more structured information over time. Imagine that it's Monday and in the process of doing whatever you're doing you create some new directories and files (the touch command creates one or more empty files with the name or names of the files specified as arguments):

/u/tld % mkdir talk
/u/tld % cd talk
/u/tld/talk % touch one.txt two.txt
/u/tld/talk % mkdir sub
/u/tld/talk % cd sub
/u/tld/talk/sub % touch three.txt

At the end of the day, you want to take a look at all the files and directories that you've created during the day. You can use the ls command with the -R option (for ``recursive'', a word we'll encounter often in this class) to list all the files in talk, in its subdirectories, in their subdirectories, and so on:

/u/tld/talk/sub % cd
/u/tld % ls -R talk
talk:
one.txt  sub  two.txt

talk/sub:
three.txt

How might we set that information aside in memory as a snapshot of the work you did on Monday? We'll simply redirect the output of ls into a file:

/u/tld % ls -R talk > monday

Now, suppose it's Tuesday and in the midst of working you create a new file:

/u/tld % cd talk/sub
/u/tld/talk/sub % touch four.txt

Again, at the end of the day, you create a snapshot of the talk directory and its contents:

/u/tld/talk/sub % cd
/u/tld % ls -R talk > tuesday

Now you can compare Monday's and Tuesday's snapshots using a nifty little program called diff that's used to compare files:

/u/tld % diff monday tuesday
6a7
> four.txt

Note that diff prints out the lines where the two files disagree -- a very useful command for all sorts of text handling jobs. The line 6a7 says that tuesday can be obtained from monday by adding, after line 6 of monday, what is now line 7 of tuesday; the line beginning with > shows the added text. The point of this little exercise is that files are a convenient form of memory for all sorts of applications. In the following lecture as we talk about variables, think about why files would or wouldn't serve just as well in the applications we consider.

Programs are said to assign values to variables and to reference the value of variables. Roughly speaking, a variable (or variable name) in the syntax of the C-shell is a string of alphanumeric characters beginning with a letter. You refer to the value of a variable in a program by putting the dollar sign character in front of the variable's name; here we reference two of several variables that are defined in the C-shell:

/u/tld % echo $shell
/usr/local/bin/tcsh
/u/tld % echo $prompt
/u/tld %
/u/tld %

The example involving the prompt variable is potentially confusing given that the value of the variable looks exactly the same as the actual prompt that's displayed by the shell. Some variables, like prompt, are used by the shell to record preferences. By modifying these variables, we can adjust our preferences:

/u/tld % set prompt = ":= "
:= echo "my prompt looks like: $prompt"
my prompt looks like: :=  
:= set prompt = "% "
% echo "my prompt looks like: $prompt"
my prompt looks like: % 

You typically assign a value to a variable by ``setting the variable's value'' or ``setting the variable equal to a particular value.'' This last way of putting it fits the syntax of the C-shell very nicely. Here are some examples using set to set variables and echo to inspect the results:

% set myshell = "conch"
% echo $myshell
conch

Note that we place a dollar sign in front of the name of a variable if we want to refer to its value, but not when setting the variable. Here are some more examples involving strings and lists of strings. If variable is the name of a variable and $variable refers to a list of items, then $variable[i] is the way that you would refer to the ith item in the list (i is called the index or subscript of the item).

% set word = one
% echo $word
one
% echo "the first word is $word"
the first word is one
% set words = (one two three)
% echo $words
one two three
% echo $words[2]
two
% set words[1] = uno
% set words[3] = tres
% echo $words
uno two tres
% set i = 2
% set words[$i] = dos
% echo $words
uno dos tres

Conditionals and Iteration

The C shell supports C-like syntax for conditional statements and conditional expressions. Note that the characters > and <, which elsewhere redirect output and input, have a different meaning inside the parentheses of a conditional statement. There, the operators > (greater than), < (less than), <= (less than or equal to), and >= (greater than or equal to) are comparisons that the shell itself evaluates to determine the truth or falsity of the corresponding mathematical relationships.

% set i = 3
% if ( $i > 5 ) echo "$i is greater than 5"
% if ( $i <= 5 ) echo "$i is less than or equal to 5"
3 is less than or equal to 5

There's also a somewhat more complicated syntax for conditionals with else clauses. Such statements have to be spread over several lines, and the shell facilitates entering them by printing a special prompt to solicit additional input if required. Here's an example (the prompt printed by the shell is if?):

% if ( 1 > 2 ) then
if? echo 1
if? else echo 2
2

Consider the following exchange and observe that the shell doesn't make use of the special prompt:

% if ( 1 < 2 ) then
% echo 1
1

Why do you think the shell dispensed with the prompt in this case?

Conditional statements are useful for manipulating files. The expression -e name returns true if the file name exists and returns false otherwise. Note the complaint when no file matches the file specification; the message names ls, but it's actually the shell that gives up, since the shell expands *.txt before ls ever runs.

% touch junk.txt
% ls *.txt
junk.txt
% if ( -e junk.txt ) rm junk.txt 
% ls *.txt
ls: No match.
% if ( -e junk.txt ) ls *.txt
%

The C shell also supports various iterative constructs. Here's a simple loop as it would appear in a file implementing a csh shell script. The @ command supports simple numeric calculations involving integers; for example, @ sum = 1 + 4 results in 5 being assigned to the variable sum.

set number = 0
set words = (one two three)
foreach word ( $words )
    @ number ++ 
    echo "$number = $word"
    end

Analogous to the case involving multi-line conditionals, when you're typing a foreach command to the shell line by line, the shell issues a special prompt:

% set number = 0
% set words = (one two three)
% foreach word ( $words )
foreach? @ number ++ 
foreach? echo "$number = $word"
foreach? end
1 = one
2 = two
3 = three

There's also a while loop, shown here as it would appear in a file containing a shell script:

set i = 0
set result = 0
while ( $i < 100 )
    @ i ++
    @ result += $i
    end
echo $result

Here again, the shell prompts you while you're in the process of typing in a while loop:

% set i = 0
% set result = 0
% while ( $i < 100 )
while? @ i ++
while? @ result += $i
while? end
% echo $result
5050

Here's an alternative version of the program described in Chapter 1 for changing file extensions (I'll explain the syntax involving the backquote (`) and the basename command in class):

% touch one.html two.html
% ls
one.html two.html
% set from = "html"
% set to = "htm"
% set files = ( `ls *.$from` )
% echo $files
one.html two.html
% foreach file ( $files )
foreach? set base = `basename $file .$from`
foreach? mv $base.$from $base.$to
foreach? end
% ls
one.htm two.htm
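
Until we get to it in class, here's a minimal sketch of the two pieces that loop relies on. Given a second argument, basename strips that suffix from a file name, and backquotes replace a command with whatever it prints, so the result can be used as part of another command. The file name one.html is just the example from above; nothing needs to exist on disk for this to run. As before, the lines beginning with # are comments meant for a script file.

```shell
# print the name with the .html suffix stripped
basename one.html .html
# use the output of basename inside another command
echo "the base name is `basename one.html .html`"
```

The first command prints one and the second prints the base name is one.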

It's somewhat unusual to write multi-line C-shell scripts interactively as I did above. It's much more common to use an editor to write such a script, save it in a file and then invoke the script in a shell. You'll learn how to do this in the exercise on writing shell scripts.

Now I'm going to throw you a curve; so get ready for a puzzler. The next code fragment corresponds to a shell script that performs a service similar to ls -R. I've saved this script as ~/bin/myls and I can invoke it as csh ~/bin/myls (recall that ~ is an abbreviation for my home directory). I want you to tell me why (and how) you think csh ~/bin/myls and ls -R perform similar services.

set start = `pwd`
foreach entry ( `ls` )
  if ( -f $entry ) echo $entry
  if ( -d $entry ) then
    cd $entry
    csh ~/bin/myls
    cd $start
  endif
end

Most of the shell commands that we've encountered allow you to supply or ``pass'' arguments to the command. Some commands take a specific number of arguments, e.g., diff expects two arguments corresponding to the names of the two files you want to compare. Other commands accept any number of arguments, e.g., ls and echo. You can also write shell scripts that accept arguments using so-called positional parameters. Within a shell script, you can refer to the first argument appearing on the command line using the variable $1, the second argument using the variable $2, and so on. The variable $0 refers to the name of the command itself and $# refers to the number of arguments. This next script uses positional parameters to provide a more elegant and useful variant of myls. I want you to tell me why you think it's more useful. Could I have simplified the script further using a single if-then-else conditional?

foreach entry ( `ls $1` )
  if ( -f $1/$entry ) echo $1/$entry
  if ( -d $1/$entry ) csh ~/bin/myls $1/$entry
end

There's a lot more you can find out about shell variables, conditionals and various iterative constructs by using man tcsh or info tcsh, but this brief introduction will more than suffice for everything we cover in the book.

Manipulating Audio Files

Computers have revolutionized just about every aspect of the music industry. Digital audio formats provide incredible fidelity and allow for the noise-free reproduction and transfer of audio data. Operations and special effects that once required expensive analog electronics can now be carried out on relatively cheap personal computers using open-source software libraries that you can download for free. In this section, I'll introduce you to some of the libraries and commands that are available on the department Linux computers and that you can download and install on your own computer. I should mention however that while the computers in the various labs of the computer science department have the hardware necessary to convert digital audio data into analog signals of the sort used to produce sound, they don't have any speakers capable of actually producing sound (imagine the cacophony that would ensue if everyone was blasting out their favorite music) and so you'll have to bring headphones if you want to experiment with manipulating audio files.

The mpg123 command handles audio files stored in MPEG 1.0/2.0 (layers 1, 2 and 3) format. You're probably familiar with MPEG layer 3 files: the ubiquitous MP3 format. MP3 files (and MPEG files more generally) have to be decoded before they can be played; mpg123 enables you to convert an MPEG audio file to a format that can be more readily handled by other command-line tools. The -s option sends the decoded audio to the standard output as raw (headerless) linear PCM audio data, 16 bit, stereo, host byte order. The -w option converts the file to WAV format (used by Microsoft) and writes it to a specified file or sends it to the standard output (use - instead of providing a file name). This command also supports some pretty interesting effects, e.g., you can speed up the sound (-d n) by only playing every nth sample or slow it down (-h n) by playing each sample n times. You can learn more about mpg123 on its web site.

If you want to get more serious about manipulating sound files, learn about sox (for Sound Exchange), a command line tool that provides lots of audio filters, special effects, mixers and synthesizers to play with. You can use sox to reverse the order of the samples in an audio file so you can play it backward (particularly useful when searching for satanic verses hidden in popular songs), add an echo, pan the direction of the sound from one channel of stereo to the other, shift the pitch and add a Fender Vibro-Champ sound effect (whatever that is) along with a host of other effects. Another useful sound library, called Enlightened Sound Daemon, provides tools for playing sounds.

Here are some examples using these audio libraries. I already grabbed a couple of MP3 files to experiment with:

% ls *.mp3
bo_diddley.mp3 james_brown.mp3

I'll decode one of the MP3 files and save it in WAV format:

% mpg123 --wav bo_diddley.wav bo_diddley.mp3

I can slow the sound down by repeating each sample some number of times in the generated WAV file. The resulting audio file sounds really slooooow:

% mpg123 --wav bo_slowly.wav --halfspeed 8 bo_diddley.mp3 

Here are a couple of gratuitous conditional statements:

% if ( -e bo_diddley.mp3 ) echo "Hey Bo Diddley!"
Hey Bo Diddley!
% if ( ! -e wayne_newton.mp3 ) echo "Yo! No Wayne."
Yo! No Wayne.

Next, I'll create some short sound clips using the sox command from the Sound Exchange library. This next invocation creates a five-second sound clip ignoring the first second of the WAV file that we created above:

sox bo_diddley.wav bo_forward.wav trim 0:00:01 0:00:05

Now we create another clip by reversing the last clip:

sox bo_forward.wav bo_backward.wav reverse

And finally we create a slow version of the first clip:

sox bo_forward.wav bo_slowly.wav speed 0.5

We'll put all three clips in a new directory:

% mkdir bo_tracks
% mv bo_forward.wav bo_backward.wav bo_slowly.wav bo_tracks
% cd bo_tracks

Next I start up the Enlightened Sound Daemon making it run in the background; then I use esdplay to play one of the clips:

% esd &
% if ( -e bo_forward.wav ) esdplay bo_forward.wav

Here's a somewhat frivolous example using foreach to play each of the three clips in rapid succession:

% ls
bo_backward.wav   bo_forward.wav   bo_slowly.wav
% foreach file ( `ls` )
foreach? echo $file
foreach? esdplay $file
foreach? end
bo_backward.wav
bo_forward.wav
bo_slowly.wav

Well, clearly you didn't hear what I just did, but I can tell you that the three short clips executed one after another and produced a pleasingly strange juxtaposition of sounds. Use the web or the CD drive in your computer to download or extract one of your favorite songs and save it as an MP3 file. Then see what you can do to slice and dice it beyond all recognition; create a completely different piece of music by altering and rearranging the original to suit your own peculiar taste.

Example Exercises

Writing Shell Scripts

In this exercise, you'll use an editor to write and modify shell scripts. I suggest that you use Emacs as your editor. Emacs has extensive online help available and you'd be well advised to spend some time working through the Emacs tutorial, at least to the point where you know how to open a buffer on an existing file, create a new buffer, save a buffer to file, and move back and forth between buffers. To start the tutorial, fire up Emacs by typing emacs or xemacs to the shell. Emacs should take over your terminal window and display a greeting, or, alternatively, a new window will open running Emacs. Hit the escape key (usually in the upper left hand corner of the keyboard and typically labeled esc) then the x key. An M-x prompt should appear at the bottom left of the window with the cursor positioned awaiting your input. Type help-with-tutorial, hit the return key and follow the instructions. One advantage of learning Emacs is that several other tools, e.g., info and tcsh, accept a subset of Emacs commands for navigating in documentation pages and command-line editing.

Multi-line shell scripts consist of shell commands appearing in a file almost exactly as you would type them to the shell. You can set and reference shell variables, change your working directory, invoke complex commands with options and arguments, write loops and conditional statements, and generally include anything that you would type when interacting directly with the shell. You invoke a shell script in a file by submitting the file to a shell using one of several approaches. I put together a couple of scripts for this exercise. Download the files, create a new directory (I called mine scripts), move the files to that directory and then use chdir to change your working directory to be your newly created directory. We'll begin by having you write your first shell script. Here's what it would look like if you actually typed the script to the shell line by line:

% chdir scripts
% ls
average sumorial 
% foreach file ( `ls` )
foreach? echo $file
foreach? end
average
sumorial

  1. Now use Emacs to create a file called myls in which you type the lines corresponding to the foreach loop just as I did above. Make sure that you type a return after the last line (the end statement). Don't include the prompts % or foreach?. Save the file in your current working directory (the one that you created for this exercise) and you should be able to observe the following:

    % cat myls
    foreach file ( `ls` )
      echo $file
    end
    % csh myls
    average
    myls
    sumorial
    

    The command csh myls invokes a new shell telling it to execute the commands in myls line by line. You can dispense with explicitly invoking a shell each time you want to execute a shell script by writing #!/bin/csh as the first line of your file thereby telling the shell to use /bin/csh to execute the commands in the file. Go ahead and add the line now. There's one more step to make this work however: you have to make the file executable using chmod.

    % ls -l myls
    -rw-r--r--  1 tld  fac  0 Sep 18 07:15 myls
    % chmod +x myls
    % ls -l myls
    -rwxr-xr-x  1 tld  fac  0 Sep 18 07:16 myls
    % cat myls
    #!/bin/csh
    foreach file ( `ls` )
      echo $file
    end
    % myls
    average
    myls
    sumorial
    

  2. Now let's experiment with one of the scripts that you downloaded. The sumorial script is supposed to sum the numbers 1 through n inclusive where n is an integer supplied as an argument to the script. This script demonstrates how to refer to arguments specified on the command line. $argv[i] references the ith argument appearing on the command line to the right of the command being invoked (you can also use $i which is somewhat more concise). Let's see how sumorial works.

    % sumorial 3
    7
    

    That's not right! Our script doesn't appear to work correctly. Take a look at the file containing the script, figure out what's wrong and then fix the bug. Your corrected version should work as follows:

    % sumorial 3
    6
    % sumorial 100
    5050 
    

  3. Now I want you to use sumorial as the basis for a new script factorial which computes the product of 1 through n inclusive. Don't forget to use chmod +x factorial to make factorial executable. Your script should work as follows:

    % factorial 6
    720
    % factorial 10
    3628800
    

  4. If you try factorial 100 you'll get an unexpected result. Explain what's going on.

  5. Now I want you to use sumorial as the basis for yet another script sumorial2 which takes two integer arguments n and m and computes the sum of the integers from n to m inclusive. Your script should work as follows:

    % sumorial2 2 6
    20
    % sumorial2 50 100
    3825
    

  6. What does your script return when n is greater than m? What do you think it should return? If necessary modify your script so that sumorial2 returns 0 if n is greater than m.

  7. Modify sumorial so it will accept either one or two integer arguments. If only one argument is supplied it should behave exactly as before, but if two arguments are supplied it should behave like sumorial2. If var is a shell variable, then $#var is 1 if var is set to a simple string value and n if var is set to a list of length n. Use $#argv == 2 in a conditional statement (if-then expression) to implement your multi-functional version of sumorial.

  8. The shell script average which you downloaded computes the average (rounded down to the nearest integer) of the integers specified on the command line.

    % average 8 9 10 11 12
    10
    

    Take a look at the file average and notice that it contains comments, which are lines beginning with #. The shell ignores these lines when executing a shell script. Comments are used to explain what's going on in a program and are useful not only to others reading the code, but also to the original author returning at some later point to modify the code or recall what it actually does. Add comments to your version of sumorial which takes either one or two arguments. Share your commented code with a friend and see if he or she can understand what your code does by reading the comments.

  9. Write a new script median which computes the median (rounded down to the nearest integer) of a list of integers specified on the command line. The median of an odd number of integers is the middle integer if the integers are listed in numeric order, e.g., the median of (2 3 5 9 11) is 5. The median of an even number of integers is the average of the two middle numbers, e.g., the median of (2 3 9 17) is 6. You can assume that the numbers on the command line are listed in ascending numeric order. You might find it convenient to make use of the % operator; n % m is the remainder after dividing n by m.

    % @ r = 9 % 2
    % echo $r
    1
    

    Make sure that you comment your code liberally. Also make sure that your script does something appropriate when no arguments are supplied.

  10. Shell scripts saved to files can be used just like any other commands. They can even be used in other shell scripts. You can always specify the absolute path name of the script in order to invoke it on the command line or in another script. Typically, however, programmers put frequently-used shell scripts in the bin directory in their home directory; so, for example, I keep various programs and shell scripts in ~tld/bin/ which I can refer to simply as ~/bin/. Create a bin directory to store your shell scripts. You can also cause the shell to automatically look in your bin directory by adding ~/bin/ to your path variable. Use echo $path to find out what your shell path is and see if you can figure out how to modify your .cshrc file (look in ~/.cshrc) so that ~/bin/ is in your path.

Ciphers and Secrets

Many of the shell commands were designed to operate on text files. This makes perfect sense given that they were developed by programmers to manipulate programs, keep track of files, processes and users, and otherwise deal with items that have a textual representation. In this exercise, we'll use some of these commands to explore applications in cryptography (communicating with or deciphering secret codes) and information retrieval (organizing and searching in information repositories).

One method to protect your personal information from prying eyes is to encrypt it using a secret code or cipher. And one of the simplest encryption methods involves the use of a substitution code. With this method, each letter is replaced by another, thereby scrambling the message. For example, given the code in which each letter is mapped to the previous letter in the sequence of the alphabet except A which is mapped to Z, the word IBM would be encoded as HAL.

Samuel Pepys used a variant of this method to encrypt his journals which he kept for nearly a decade starting on January 1, 1660. The full text of Samuel Pepys's journals is available from Project Gutenberg. There are also web sites devoted to the discussion of Pepys, his journals and the times in which he lived; one such site makes use of an interesting publishing format called Movable Type that allows visitors to annotate the site's web pages with their comments and observations.

For your first exercise, use info or man to learn about the transliteration command tr. Use this command to implement a method of encoding and decoding messages using a particular substitution code. Pick a substitution code that you think will be particularly difficult to break. To make this a little simpler, assume that the text to be encoded consists of lower-case alphabetic characters and spaces but no punctuation or upper-case letters.

% echo "abracadabra" | tr 'abc' 'cab' | tr 'cab' 'abc'
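
As a further illustration, the code described earlier, in which each letter maps to the previous letter and a wraps around to z, can be written compactly with character ranges. This is only a sketch restricted to lower-case input, as the exercise assumes, and it is certainly not a code that is difficult to break.

```shell
# Encode: shift each lower-case letter back by one (a wraps around to z).
echo "ibm" | tr 'a-z' 'za-y'

# Decode: swap the two tr arguments to undo the mapping.
echo "ibm" | tr 'a-z' 'za-y' | tr 'za-y' 'a-z'
```

The first pipeline prints hal and the second prints ibm again.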

If you read about substitution codes, you'll learn that they are subject to being broken by so-called frequency analysis. One tool that often comes in handy in frequency analysis is the character histogram. Here's how to implement a very simple character histogram tool using the commands sort and uniq. As an extension, you might find it convenient to sort the output of this script so that the characters are listed ordered by frequency.

% cat characters.txt
a
a
b
c
c
c
z
% cat characters.txt | sort | uniq -c 
   2 a
   1 b
   3 c
   1 z

This assumes that each character comprising the words in a document is on a separate line. Figure out how to use sed and tr to convert a document to this format so you can produce a character histogram for the entire document.
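
To watch the histogram pipeline work on running text without giving away the sed and tr conversion, one option is the fold utility. Using fold is an assumption on my part, since the exercise does not mention it; with -w 1 it breaks its input into one-character lines. The final sort -rn is one way to list the most frequent characters first.

```shell
# fold -w 1 puts each character on its own line (a stand-in for the
# sed/tr conversion the exercise asks for); sort -rn orders by count.
echo "abracadabra" | fold -w 1 | sort | uniq -c | sort -rn
```

The first line of output reports the five occurrences of a; the exact column spacing depends on your version of uniq.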

The trick in frequency analysis is to work out which letter each symbol stands for by matching frequency signatures against those found in similar texts. For example, the frequency of the letter A should be approximately the same in all documents, assuming that the documents are in the same language and the authors use similar vocabulary.

Grab an entry from Pepys's diary and encrypt it using your substitution code from above. Exchange your encrypted entry with a friend but don't reveal the date of the entry. Now produce the character histogram for another entry and then compare it with the histogram for the encrypted entry. See if you can figure out enough of the substitutions to make the encrypted text readable. It will probably help if you build a library of simple shell scripts to use in analyzing codes and encoding and decoding texts. In class, we'll tell you how to build such a library.

Here are some additional exercises that deal with text and information retrieval. You might want to try these exercises first before starting on the cryptography exercise.

  1. How would you create a tac command that works like cat but lists the contents of files in reverse order?

  2. In spell checking HTML documents, spell checkers ignore HTML tags. Write an html-to-text converter using grep and sed that strips off HTML tags leaving just the text that appears when the file is opened in a browser.

  3. What does spell checking have to do with the way humans speak or type? How would you design a spelling checker? Search the Web to find out about the metaphone algorithm and provide a short description of how it works.

  4. Check out the open-source aspell spell checker at SourceForge and summarize how it works by scanning the documentation.

  5. Check out the dictionaries available on your system. On Unix systems, you'll often find one or more dictionaries in /usr/share/dict/. Here's how you would list all the words in Webster's 2nd Edition that contain ``happy''.

    % grep "happy" /usr/share/dict/web2
    

    Figure out how to use grep to print out the line in the dictionary for ``happy'' if it exists and do nothing otherwise. Your solution shouldn't print out the line for ``unhappy.''

  6. The comm command is used to compare files; here's an excerpt from the man page:

       The 'comm' utility reads file 1 and file 2 (these files should 
       be sorted lexically), and produces three text columns as output:
       lines only in file 1; lines only in file 2; lines in both files.
    
       The specification "-" indicates the standard input.
    
       The following options are available:
    
       -1  Suppress printing of column 1.
    
       -2  Suppress printing of column 2.
    
       -3  Suppress printing of column 3.
    

    What does the following script do?

    % comm -12 file_1 file_2
    

    How about this next script?

    % comm -23 file_1 file_2 
    

  7. How would you list all the words in a document that are in Webster's 2nd Edition? Think about converting the document to a file with one word per line and then using the comm command to compare this file with the dictionary file.

    Note that comm relies on its inputs being sorted; you can use the sort command to handle a wide range of sorting tasks.

  8. Use info sed to figure out what the following script does (see footnote 10):

    % echo "ward" | sed 's/\([a-z]\)/\1\@/g' | tr '\@' '\n'
    

  9. The next script finds words whose reversed letters also comprise a word. Explain how it does this.

    % echo "ward" | rev | comm -12 - /usr/share/dict/words
    

    % rev /usr/share/dict/words | sort | comm -12 - /usr/share/dict/words
    

  10. Outline the design of a script that finds palindromes (see footnote 11).

  11. What does the following script do?

    % echo "word" | sed 's/\([a-z]\)\([a-z]\)/\2\1/g' 
    

  12. Many misspelled words result from accidentally typing one letter before another; sometimes your fingers get ahead of each other. Describe the design of a spell corrector that looks for pairs of reversed letters.

  13. The directory names I used for five out of the first six exercises on this web site (treasure, wordplay, findtime, password and database) just happen to be eight characters long each. Do you think this was a coincidence? How would you calculate how likely such an event is assuming that I chose randomly?
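
The comm excerpt in exercise 6 and the sorting note in exercise 7 can be tried out on two tiny word lists. The sketch below is illustrative only: the file names list_1 and list_2 and their contents are made up, and it uses the -13 combination, which the exercises do not ask about.

```shell
# Two small word lists; the first is sorted, the second is not.
printf 'apple\nbanana\ncherry\n' > list_1
printf 'date\ncherry\nbanana\n' > list_2

# Sort the second list on the fly and pass it to comm as standard input (-).
# Suppressing columns 1 and 3 leaves the lines found only in the second list.
sort list_2 | comm -13 list_1 -

# Remove the scratch files.
rm list_1 list_2
```

The only line printed is date, the one word that appears in the second list but not in the first.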

Files, Dates and Times

These exercises focus on searching for files in directories and dealing with time. We'll start by introducing the find command. The documentation for find (run info find) states that find ``recursively descends directory trees'' executing a user-supplied program at each file (and directory) in the directory tree. You should already understand the idea of a file system defining a tree structure from reading Chapter 1. The first exercise will test your understanding.
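
Before turning to the exercises, here is a small sketch of what descending a directory tree looks like in practice. The directory and file names (demo, sub, top.txt, leaf.txt) are made up for illustration, and the tree is deliberately smaller than the one the first exercise calls for.

```shell
# Build a tiny, hypothetical tree: demo/ holds one file and one
# subdirectory, and the subdirectory holds one more file.
mkdir -p demo/sub
touch demo/top.txt demo/sub/leaf.txt

# Visit every file and directory under demo, printing each path.
# The output is sorted here only to make the listing repeatable.
find demo -print | sort

# Remove the scratch tree.
rm -r demo
```

The listing contains four paths: the two directories and the two files. Drop the sort to see the order in which find itself visits them.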

  1. Use the find command to enumerate all the files in a subtree of the file system of your choosing. The subtree must be at least two directories deep, the immediate subdirectory of the root of your chosen subtree must contain at least two directories and two files and each of those must contain at least two subdirectories and two files. You might just find it easiest to create such a subtree using mkdir and touch. Assuming that the root of your subtree is named root, run the following command:

    % find root -print 
    

    Describe the order in which find searches the tree. There is an option to find that alters the order in which directories are searched. Read about this option in the online documentation and indicate specific differences between the default behavior of find and the optional behavior on the directory subtree that you chose (or created).

  2. Use the find command to find and delete all files that match *.bak and have an access time earlier than Jan 1, 2000. Use info to learn about the differences among the different kinds of time used to describe files: last access (the last time the file's contents were read), last status change (the last time someone changed one of the file's attributes) and last modification (the last time someone modified the file's contents). As a hint, the following invocations show how to use touch and ls to create one file with a Jan 1, 2000 access time, a second file with a Jan 1, 1999 access time, and then how to list both files displaying their access times. You may want to look into how to negate a find expression.

    Create a file named 2000 with access time Jan 1, 2000 00:00

    % touch -a -t 200001010000 2000
    

    Create a test file test.bak with access time Jan 1, 1999 00:00

    % touch -a -t 199901010000 test.bak
    

    List with -u (use access time) and -l (list long) options

    % ls -u -l 2000 test.bak
    -rw-r--r--  1 tld  staff  0 Jan  1  2000 2000
    -rw-r--r--  1 tld  staff  0 Jan  1  1999 test.bak
    

  3. Use find and grep together to find all files corresponding to LaTeX documents (LaTeX files have the extension tex) that include the word ``cryptanalyst''. Print only the names of the files. If you want a more difficult challenge, print out the names of only those LaTeX files that include the word ``cryptanalyst'' within the first ten lines of the beginning of the file.

  4. Use touch and date to create a new file with the name MmmDD.tar and then mkdir to create a new directory YY in which to store the file, where Mmm is a three-letter month abbreviation, DD is the two-digit day of the month (01-31) and YY is the last two (rightmost) digits of the year. For example, if the current date is August 30, 2004, the new directory should have the name 04 and the file Aug30.tar. The following examples show off some of what date can do using the + formatting option:

    % set today="`date`"       # Mon Aug 30 17:52:07 EDT 2004
    % set month=`date +%h`     # Aug
    % set day=`date +%d`       # 30
    % set hour=`date +%H`      # 17
    % set minute=`date +%M`    # 52
    % set year=`date +%Y`      # 2004
    % set iso=`date +%Y-%m-%d` # 2004-08-30 - ISO-8601 date format
    

  5. Use touch and date to write a one-line script that creates a file past.txt with an access time approximately one year earlier than the current date. Check out info csh to learn about how to perform numeric calculations when assigning values to variables. Be explicit about what sort of approximation your solution provides.

  6. On BSD-derived systems, including macOS, the date command with the -r option followed by an integer specifying seconds prints out the date that is the specified number of seconds from the epoch (GNU date uses -r for a different purpose). What is the epoch? Use the date command to find the actual epoch date and search the web to find out its significance.

  7. There are other shell commands besides date that are used to produce formatted output. The printf command derives its name from a function in the C programming language that is used to format (hence the f) output. For example, printf allows you to control how numbers are displayed, e.g., the number of digits to the right of the decimal point (called the precision), and to arrange multicolumn output by controlling the width of the fields used to display each column. And, while echo normally ends its output with a newline, with printf you control where newlines go by incorporating newline characters directly in the format string, e.g., printf "\n\n$var\n\n" prints the value of $var preceded and followed by a blank line. printf will come in handy for more complicated shell scripts. If you're not interested in fancy formatting, you're better off using echo to produce output. Here are some examples showing what you can do with printf:

    % printf "%10s%10s%10s\n" "Median" "Minimum" "Maximum"
        Median   Minimum   Maximum
    % printf "%10.2f%10.2f%10.2f\n" 17.15637 15.983 21.7849
         17.16     15.98     21.78
    % foreach row ( 2.718281828 3.14159265 17 )
    foreach? printf "%10.2f\n" $row
    foreach? end
          2.72
          3.14
         17.00
    

    Use printf to produce the following output where you supply the format string:

    % printf "format string" 2.718281828 3.14159265 
                 2.718                  3.14
    % printf "format string" "E" 2.718281828 "Pi" 3.14159265
             E = 2.7183            Pi = 3.142     
    % printf "format string" "E" 2.718281828 "Pi" 3.14159265
             E = 00002.7183        Pi = 000003.142
    

  8. There are functions in the standard C library (libc) called time2posix and posix2time that convert seconds since the Epoch to and from a standard date and time format. How might you use these functions with the date command to specify an offset from the current time?
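
Returning to the printf formatting discussed in exercise 7, two features are worth trying before you tackle the format strings there: the - flag, which left-justifies a value within its field, and the 0 flag, which pads a number with leading zeros. This is a minimal sketch with made-up values and field widths.

```shell
# Left-justify a string in a 10-character field; the | marks the field's end.
printf "%-10s|\n" "Pi"

# Pad a number with leading zeros: total width 8, three digits of precision.
printf "%08.3f\n" 3.14159265
```

The first command prints Pi followed by eight spaces and a vertical bar; the second prints 0003.142.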

Code Fragments

Download all the code fragments in this chapter as a file in zip format.


9 The esd command starts a daemon. Typically you start the daemon running in the background by typing esd & (the ampersand causes the process to run in the background) and then use esdplay to play audio files. The esdplay command can handle several different audio formats including WAV files.

10 The sed utility is used to edit lines of text. It's a pretty complex tool, but here's a simple and very useful way in which it's often used: sed 's/pattern/replacement/g' which replaces (the ``s'' denotes substitution) each (the ``g'' denotes ``global'') occurrence of the pattern (a regular expression) with the replacement string.
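
A quick way to see what the ``g'' adds is to run the same substitution with and without it. The input word and pattern below are made up for illustration.

```shell
# Without the g flag, only the first match on each line is replaced.
echo "banana" | sed 's/an/AN/'

# With the g flag, every non-overlapping match is replaced.
echo "banana" | sed 's/an/AN/g'
```

The first command prints bANana and the second prints bANANa.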

11 A palindrome is a word, phrase, verse, or sentence that reads the same backward or forward. For example: A man, a plan, a canal, Panama.