0. Table of Contents
1. Basic Commands
2. Basic Operators
3. Regular Expressions
4. The 'awk' Utility
5. The 'grep' Utility
6. The 'sed' Utility
7. Additional Commands
1. Basic Commands
'chdir' change your current working directory
'chgrp' change the group of files or directories
'chmod' change the permissions of files or directories
'chown' change the ownership of files or directories
'cp' copy a file
'curl' a simple HTTP client that fetches files
'date' print the current date in various formats
'echo' print expanded arguments ending with a newline
'expand' convert tabs to spaces (unexpand - vice versa)
'ls' list directory contents
'mkdir' create directories
'mv' move or rename files and directories
'printf' print formatted output; doesn't automatically add newline
'pwd' print the current working directory
'sleep' sleep for a specified number of seconds
'touch' create files or change modification time of existing ones
2. Basic Operators
'|' pipe output of one command to input of another
% ls | wc -l
'>' redirect the standard output to a new file
% ls > file.txt
'<' direct contents of a file to the standard input
% grep html < file.html
'>>' append to the end of an existing file
% echo 'one more line' >> lines.txt
'&' run a command in the background
% find ~ -name "*.jpg" -print > file.txt &
';' perform commands in sequence
% sleep 5 ; echo 1 ; ls
Note that the following three commands produce the same output:
% grep html talking.html
% grep html < talking.html
% cat talking.html | grep html
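The operators above can be combined in a short script. What follows is a minimal sketch written for a Bourne-style shell (sh or bash) rather than for the interactive prompt used in the examples; the scratch directory and the variable names are made up for illustration.

```shell
# Work in a throwaway directory so that no real files are touched.
tmp=$(mktemp -d)

# '>' creates (or truncates) the file, '>>' appends to it.
echo 'first line' > "$tmp/lines.txt"
echo 'one more line' >> "$tmp/lines.txt"

# '<' reads standard input from a file; '|' feeds one command to the next.
# The 'tr' strips the padding that some versions of 'wc' print.
count=$(wc -l < "$tmp/lines.txt" | tr -d ' ')
echo "$count"

# ';' runs commands in sequence; both forms of 'grep' see the same text.
from_arg=$(grep -c line "$tmp/lines.txt") ; from_stdin=$(grep -c line < "$tmp/lines.txt")
echo "$from_arg $from_stdin"

rm -r "$tmp"
```

Both 'grep' invocations report the same count because the only difference is whether 'grep' or the shell opens the file.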
3. Regular Expressions
Here is a summary of some of the most common regular expression
syntax available for use in 'sed' and 'grep' commands:
The meta characters '.', '^', '$', '[', and ']' are treated
specially in regular expressions.
. matches any character
^ anchors a match at the beginning of a line
$ anchors a match at the end of a line
The escape character backslash '\' causes any meta characters
to be treated literally.
\. matches a period
\$ matches a dollar sign
\^ matches a caret
A list of characters enclosed by [ and ] matches any single
character in that list; if the first character of the list is
the caret ^ then it matches any character not in the list.
Most meta characters lose their special meaning when enclosed
in a character list.
[.] matches a period
[a-zA-Z0-9] matches one alphanumeric character
[^a-zA-Z] matches one non alphabetic character
In the following, let RE correspond to any regular expression.
RE* matches the regular expression zero or more times
RE\{n\} matches the regular expression exactly n times
RE\{n,\} matches the regular expression at least n times
RE\{n,m\} matches the regular expression at least n times
and no more than m times
\(RE\) matches the regular expression and saves the
string of matched characters in the replacement
variables \1, \2, etc. so that the first \( \)
pairing is saved in the variable \1, the second
pairing in \2 and so on.
Note that different commands, e.g., 'sed' and 'grep', employ
slightly different regular expression syntax. This is the
source of much confusion, and so, whenever possible, I try to
use generic syntax that works with any command.
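As a rough illustration of the syntax summarized above, the following sketch applies the same basic notation with both 'grep' and 'sed'. It assumes a Bourne-style shell (sh or bash), and the sample words and variable names are invented for the example.

```shell
# Sample input: one made-up word per line.
words=$(printf '%s\n' 'ab' 'aab' 'aaab' 'b.c' 'bxc')

# a\{2,\} selects lines containing at least two consecutive 'a' characters.
two_or_more=$(printf '%s\n' "$words" | grep 'a\{2,\}')

# \. matches only a literal period, so 'bxc' is not selected.
literal_dot=$(printf '%s\n' "$words" | grep 'b\.c')

# \(RE\) saves each matched piece; \2 and \1 swap the two saved pieces.
swapped=$(echo 'key=value' | sed 's/\([a-z]*\)=\([a-z]*\)/\2=\1/')

printf '%s\n' "$two_or_more" "$literal_dot" "$swapped"
```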
Entering a tab character in a 'sed' regular expression is a
little tricky. Under bash or tcsh use CTRL-v CTRL-i or CTRL-q
CTRL-i. For additional tips on using 'sed' see the document
'Handy one-line sed programs' in ./oneliners.sed.
4. The 'awk' Utility
This is the grandfather of computational Swiss army knives. For
many purposes, other commands have usurped 'awk's dominant
role, but there are certain idiomatic usages that come in handy.
The 'awk' utility, like 'sort', 'cut' and 'join', allows you to
specify fields in the strings corresponding to the lines in the
input. Fields are separated by runs of spaces or tabs (the default)
or an alternative delimiter specified by the -F option. Here are
some of the ways I've made use of 'awk' in various scripts:
% cat file
1;Fred;Felicity
7;Alice;Allegory
4;Sally;Salacious
% cat file | awk 'BEGIN { srand() } { print rand(), $0 }'
0.041960 1;Fred;Felicity
0.223255 7;Alice;Allegory
0.252055 4;Sally;Salacious
Seed the random number generator from the time of day (that's
the 'BEGIN { srand() }' part) and then print each line ($0) in the
standard input preceded by a random number between 0 and 1.
Writing the calls with parentheses keeps the program portable
across 'awk' implementations.
% cat file | awk 'BEGIN { s = 0} { s += $1 } END { print s }'
12
Initialize 's' to 0 (this isn't strictly necessary), add up the
values in the first field and print the resulting sum. With the
default delimiter the first field is the entire line, but 'awk'
converts a string to a number using its leading numeric prefix,
so it isn't necessary to specify an alternative delimiter.
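To make the role of the delimiter concrete, here is a small sketch that computes the sum both with and without the -F option. It assumes a Bourne-style shell (sh or bash); the temporary file and the variable names are illustrative only.

```shell
# Recreate the sample file from the text in a scratch location.
data=$(mktemp)
printf '%s\n' '1;Fred;Felicity' '7;Alice;Allegory' '4;Sally;Salacious' > "$data"

# Default delimiter: $1 is the whole line, which awk converts to a
# number by reading its leading digits.
sum_default=$(awk '{ s += $1 } END { print s }' "$data")

# Explicit delimiter: $1 is exactly the first semicolon-separated field.
sum_explicit=$(awk -F ';' '{ s += $1 } END { print s }' "$data")

echo "$sum_default $sum_explicit"
rm "$data"
```

Both sums agree here only because the number comes first on every line; summing any other column would need the explicit delimiter.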
% cat file | awk -F ";" '{ print $3 ", " $2 }'
Felicity, Fred
Allegory, Alice
Salacious, Sally
Using the semicolon as a delimiter, print out the third field, a
comma, space and then the second field for each line of the input.
5. The 'grep' Utility
The 'grep' utility is useful for searching the content of files
(or the standard input). 'egrep' ('extended' grep, equivalent to
'grep -E') has a more powerful regular expression language (check
out 'info grep') and requires fewer escape ('\') characters in
regular expressions.
% ls
sara.jpg
artemis.txt
michelle.jpg
vanessia.gif
damien.txt
% ls | egrep "[a-zA-Z]*[.][gG][iI][fF]|[a-zA-Z]*[.][jJ][pP][gG]"
sara.jpg
michelle.jpg
vanessia.gif
Match file names with an alphabetic base and an extension
corresponding to a GIF or JPEG image file. Basic regular
expression syntax is presented elsewhere, but note that a period
appearing inside a character list is treated literally (i.e.,
it only matches a period) whereas a period appearing elsewhere
(unless preceded by an escape) matches any character. Note
that the disjunctive operator (if 'RE1' and 'RE2' are regular
expressions then 'RE1|RE2' matches 'RE1' or 'RE2') is available
in 'egrep' but not in the basic syntax used by 'sed'.
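The same match can be written more compactly. The sketch below is one possible variant, assuming a Bourne-style shell (sh or bash) and a 'grep' that accepts the standard -E and -i options; the variable names are made up, and the file names are fed in with 'printf' instead of 'ls' so that the example is self-contained.

```shell
# The file names from the listing above, one per line.
names=$(printf '%s\n' sara.jpg artemis.txt michelle.jpg vanessia.gif damien.txt)

# -E selects the extended syntax, so '|' and the grouping parentheses
# need no backslashes; -i ignores case, which replaces the [gG][iI][fF]
# style lists. '^' and '$' force the whole name to match.
images=$(printf '%s\n' "$names" | grep -E -i '^[a-z]*[.](gif|jpg)$')

printf '%s\n' "$images"
```

Anchoring is a design choice: without '^' and '$' a name such as 'photo1.jpg' would still be selected, because the pattern could match just the '.jpg' part.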
6. The 'sed' Utility
The 'sed' (for stream editor) utility, like 'awk', has its own
powerful scripting language, but the substitution command
accounts for the lion's share of the use of 'sed' in writing shell
scripts. The 'sed' command uses the syntax of so-called 'basic
regular expressions' (see 'info regex'), which are similar to, but
not exactly the same as, 'grep' regular expressions. Here's the
prototypical use of the 'sed' utility:
sed 's/PATTERN/REPLACEMENT/g'
The 'sed' utility can also be used with the -f (for 'file') option
to specify a 'sed' program consisting of individual 'sed' commands
with one such command to a line as in:
% sed -f file
where 'file' is a file of 'sed' commands. For example,
% cat file
s/Fred/Mary/g
s/Sally/Bill/g
...
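A complete run of the -f option might look like the following sketch. It assumes a Bourne-style shell (sh or bash); the temporary script file, the two input lines and the variable names are invented for illustration.

```shell
# Write a two-command sed program to a scratch file, one command per line.
script=$(mktemp)
printf '%s\n' 's/Fred/Mary/g' 's/Sally/Bill/g' > "$script"

# Every command in the file is applied, in order, to each input line.
renamed=$(printf '%s\n' 'Fred met Sally' 'Sally met Fred' | sed -f "$script")

printf '%s\n' "$renamed"
rm "$script"
```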
Here are some examples illustrating the 'sed' substitution command:
% set input = "Nathan Sequitur"
% echo $input | sed 's/\([a-zA-Z]*\)[ ]*\([a-zA-Z]*\)/\2, \1/g'
Sequitur, Nathan
Note that the following doesn't work as one might expect:
% set string = '<tr align="right">$2,359</tr>'
% echo $string | sed 's/<tr.*>/ @BEGIN_ROW@ /g'
@BEGIN_ROW@
This is because 'sed' matches 'greedily', that is to say a
regular expression like '.*' gobbles up as many characters
as it possibly can and still succeed in finding a match.
Here's a fix using the 'complement' operator '^':
% echo $string | sed 's/<tr[^>]*>/ @BEGIN_ROW@ /g'
@BEGIN_ROW@ $2,359</tr>
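For readers working in sh or bash rather than csh, the greedy and bounded substitutions can be compared side by side as in the sketch below. The variable names are illustrative, and 'printf' is used in place of 'echo' so that the text is passed through unchanged.

```shell
# Single quotes keep the shell from expanding '$2' inside the string.
string='<tr align="right">$2,359</tr>'

# '.*' runs to the LAST '>' on the line, swallowing the whole row.
greedy=$(printf '%s\n' "$string" | sed 's/<tr.*>/ @BEGIN_ROW@ /g')

# '[^>]*' cannot cross a '>', so the match stops at the end of the first tag.
bounded=$(printf '%s\n' "$string" | sed 's/<tr[^>]*>/ @BEGIN_ROW@ /g')

# Brackets make the leading and trailing spaces visible.
printf '[%s]\n' "$greedy" "$bounded"
```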
7. Additional Commands
The 'comm' command: compare two files
% comm -23 one.txt two.txt
List the lines that are in one.txt but not in two.txt. The
'comm' command relies on the two files being sorted.
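Because of the sorting requirement, a typical use sorts both inputs first. The following is a minimal sketch for a Bourne-style shell (sh or bash); the file contents, the '.sorted' file names and the variable name are made up for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' pear apple fig > "$dir/one.txt"
printf '%s\n' fig banana apple > "$dir/two.txt"

# 'comm' expects sorted input, so sort each file before comparing.
sort "$dir/one.txt" > "$dir/one.sorted"
sort "$dir/two.txt" > "$dir/two.sorted"

# -2 hides lines found only in the second file and -3 hides lines common
# to both, leaving the lines found only in the first file.
only_in_one=$(comm -23 "$dir/one.sorted" "$dir/two.sorted")

echo "$only_in_one"
rm -r "$dir"
```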
The 'cut' command: extract specified fields in a file
% cat artist
1;Bill;Frisell;1951-03-18;Baltimore, Maryland
2;Bonnie;Raitt;1949-11-08;Burbank, California
3;Melvin;Taylor;1959-03-13;Jackson, Mississippi
4;Robert;Cray;1953-08-01;Columbus, Georgia
5;Keith;Jarrett;1945-05-08;Allentown, Pennsylvania
6;Sue;Foley;1968-03-29;Ottawa, Canada
% cat artist | cut -f 3,4 -d ";"
Frisell;1951-03-18
Raitt;1949-11-08
Taylor;1959-03-13
Cray;1953-08-01
Jarrett;1945-05-08
Foley;1968-03-29
Extract fields 3 and 4 using the semicolon as a field delimiter.
The 'find' command: search the file system for specified files
% find ~ -name "*[a-z]*.???" -print
Find and print any file in the directory tree rooted in my home
directory whose name ends in a period plus exactly three
characters and contains at least one lowercase letter before
that period.
% find . -name "*.jpg" -size +8 -exec /bin/rm {} \;
Find and delete every file in the directory tree rooted in my
current working directory with the extension 'jpg' whose size
exceeds (the '+') 8 * 512 bytes.
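Before deleting anything it can be worth checking what the size test selects. The sketch below does this in a scratch directory, assuming a Bourne-style shell (sh or bash) and a system that provides /dev/zero; the file names are hypothetical, and '-print' stands in for the '-exec /bin/rm' action.

```shell
dir=$(mktemp -d)

# Create files of known size: 9 blocks and 1 block of 512 bytes each.
dd if=/dev/zero of="$dir/big.jpg" bs=512 count=9 2>/dev/null
dd if=/dev/zero of="$dir/small.jpg" bs=512 count=1 2>/dev/null
dd if=/dev/zero of="$dir/big.txt" bs=512 count=9 2>/dev/null

# Only names ending in .jpg that occupy more than 8 blocks are listed.
matches=$(cd "$dir" && find . -name '*.jpg' -size +8 -print)

echo "$matches"
rm -r "$dir"
```

Replacing '-print' with the '-exec' clause from the example above would then delete exactly the files that were listed.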
The 'join' command: join two files using a specified field
See the exercise 'Working With Databases' on the book web page.
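Independently of that exercise, a minimal sketch of 'join' is given here. It assumes a Bourne-style shell (sh or bash); the two files, their contents and the variable name are invented for illustration and are not taken from the exercise.

```shell
dir=$(mktemp -d)

# Both files are sorted on their first field, which 'join' requires.
printf '%s\n' '1;Bill' '2;Bonnie' '3;Melvin' > "$dir/names"
printf '%s\n' '1;Baltimore' '3;Jackson' > "$dir/cities"

# -t sets the field delimiter; by default the join field is field 1 of
# each file, and only keys present in both files produce output.
joined=$(join -t ';' "$dir/names" "$dir/cities")

printf '%s\n' "$joined"
rm -r "$dir"
```

Key 2 has no partner in the second file, so it is dropped from the result.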
The 'paste' command: combine the lines in two files side-by-side
% cat letters
a
b
c
% cat numbers
1
2
3
% paste numbers letters
1 a
2 b
3 c
Paste the two files side-by-side using the default delimiter tab.
% paste -d ";" numbers letters
1;a
2;b
3;c
Paste the two files side-by-side using the semicolon as delimiter.
The 'repeat' command: repeat n times the specified (simple) command
% repeat 3 echo 1
1
1
1
This turns out to be very useful for all sorts of scripting
tricks. For example, suppose that you want to initialize
an array (list) of a specified length to contain all zeros.
% set n = 16
% set array = ( `repeat $n echo 0` )
% echo $#array
16
% echo $array[7]
0
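The examples above rely on the csh and tcsh shells. In sh or bash a similar effect can be sketched with a small function; the name 'repeat_n' is hypothetical, and because plain sh has no arrays the positional parameters stand in for the list.

```shell
# repeat_n COUNT COMMAND [ARGS...]: run COMMAND COUNT times.
# (Hypothetical helper, not a standard command.)
repeat_n() {
    n=$1
    shift
    while [ "$n" -gt 0 ]; do
        "$@"
        n=$((n - 1))
    done
}

ones=$(repeat_n 3 echo 1)

# Fill the positional parameters with 16 zeros; $# is the list length
# and $7 is the seventh element.
set -- $(repeat_n 16 echo 0)
length=$#
seventh=$7

echo "$length $seventh"
```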
The 'sort' command: sort by lines or fields
% cat file | sort -rn
Sort the file in reverse numeric order. The
default is to sort in lexicographic order.
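Sorting by field uses the -t and -k options. The sketch below is one way to do it, assuming a Bourne-style shell (sh or bash); the sample rows are borrowed from the 'awk' section, the variable names are made up, and LC_ALL=C is set only to make the ordering predictable.

```shell
rows=$(printf '%s\n' '1;Fred;Felicity' '7;Alice;Allegory' '4;Sally;Salacious')

# -t sets the field delimiter; -k 2,2 restricts the sort key to field 2.
by_name=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -k 2,2)

# The 'n' and 'r' modifiers make the key on field 1 numeric and reversed.
by_number_desc=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -k 1,1nr)

printf '%s\n' "$by_name" "$by_number_desc"
```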
The 'tr' command: translate characters
% cat file | tr "A-Z" "a-z"
Convert all uppercase letters to lowercase.
% cat file | tr -dc "a-z \n"
Delete all characters other than spaces, line feeds
and lowercase alphabetic characters.
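To see what the two 'tr' commands do to a concrete string, consider the following sketch. It assumes a Bourne-style shell (sh or bash); the sample text and variable names are invented, and LC_ALL=C is set so that the ranges cover only the ASCII letters.

```shell
text='Hello, World 42!'

# Map each uppercase letter to its lowercase counterpart.
lower=$(echo "$text" | LC_ALL=C tr 'A-Z' 'a-z')

# -c complements the set and -d deletes, so everything except lowercase
# letters, spaces and newlines is removed.
kept=$(echo "$text" | LC_ALL=C tr -dc 'a-z \n')

# Brackets make the trailing space in the second result visible.
printf '[%s]\n' "$lower" "$kept"
```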
The 'uniq' command: count or remove consecutive duplicate lines
% sort file | uniq -c
Count how many times each distinct line appears in a file.
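Since 'uniq' only compares adjacent lines, the 'sort' in front of it matters. The sketch below shows the difference and builds a simple frequency table; it assumes a Bourne-style shell (sh or bash), the input letters and variable names are made up, and the final 'awk' merely strips the padding that 'uniq -c' puts before each count.

```shell
# Without sorting, the two 'a' lines are not adjacent, so nothing is merged.
unsorted=$(printf '%s\n' a b a | uniq | wc -l | tr -d ' ')

# Sort, count each distinct line, then order by count, largest first.
counts=$(printf '%s\n' b a b c b a | sort | uniq -c | sort -rn | awk '{ print $1, $2 }')

echo "$unsorted"
printf '%s\n' "$counts"
```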