0. Table of Contents

   1. Basic Commands
   2. Basic Operators
   3. Regular Expressions
   4. The 'awk' Utility
   5. The 'grep' Utility
   6. The 'sed' Utility
   7. Additional Commands

1. Basic Commands

   'chdir'    change your current working directory
   'chgrp'    change the group of files or directories
   'chmod'    change the permissions of files or directories
   'chown'    change the ownership of files or directories
   'cp'       copy a file
   'curl'     a simple HTTP client that fetches files
   'date'     print the current date in various formats
   'echo'     print expanded arguments ending with a newline
   'expand'   convert tabs to spaces ('unexpand' does the reverse)
   'ls'       list directory contents
   'mkdir'    create directories
   'mv'       move or rename files and directories
   'printf'   print formatted output; doesn't automatically add a newline
   'pwd'      print the current working directory
   'sleep'    sleep for a specified number of seconds
   'touch'    create files or change the modification time of existing ones

2. Basic Operators

   '|'    pipe the output of one command to the input of another

          % ls | wc -l

   '>'    redirect the standard output to a new file

          % ls > file.txt

   '<'    direct the contents of a file to the standard input

          % grep html < file.html

   '>>'   append to the end of an existing file

          % echo 'one more line' >> lines.txt

   '&'    run a command in the background

          % find ~ -name "*.jpg" -print > file.txt &

   ';'    perform commands in sequence

          % sleep 5 ; echo 1 ; ls

   Note that the following three commands do exactly the same thing:

          % grep html talking.html
          % grep html < talking.html
          % cat talking.html | grep html

3. Regular Expressions

   Here is a summary of some of the most common regular expression syntax
   available for use in 'sed' and 'grep' commands:

   The meta characters '.', '^', '$', '[', and ']' are treated specially
   in regular expressions.

      .     matches any character
      ^     matches the beginning of a line (an anchor)
      $     matches the end of a line (an anchor)

   The escape character backslash '\' causes any meta character to be
   treated literally.

      \.    matches a period
      \$    matches a dollar sign
      \^    matches a caret

   A list of characters enclosed by [ and ] matches any single character
   in that list; if the first character of the list is the caret ^ then
   it matches any character not in the list. Most meta characters lose
   their special meaning when enclosed in a character list.

      [.]           matches a period
      [a-zA-Z0-9]   matches one alphanumeric character
      [^a-zA-Z]     matches one non-alphabetic character

   In the following, let RE correspond to any regular expression.

      RE*          matches the regular expression zero or more times
      RE\{n\}      matches the regular expression exactly n times
      RE\{n,\}     matches the regular expression at least n times
      RE\{n,m\}    matches the regular expression at least n times and
                   no more than m times
      \(RE\)       matches the regular expression and saves the string of
                   matched characters in the replacement variables \1, \2,
                   etc., so that the first \( \) pairing is saved in the
                   variable \1, the second pairing in \2 and so on.

   Note that different commands, e.g., 'sed' and 'grep', employ slightly
   different regular expression syntax. This is the source of much
   confusion, and so, whenever possible, I try to use generic syntax that
   works with any command.

   Entering a tab character in a 'sed' regular expression is a little
   tricky. Under bash or tcsh use CTRL-v CTRL-i or CTRL-q CTRL-i.

   For additional tips on using 'sed' see the document 'Handy one-line
   sed programs' in ./oneliners.sed.

4. The 'awk' Utility

   This is the grandfather of computational Swiss army knives. For many
   purposes, other commands have usurped 'awk's dominant role, but there
   are certain idiomatic usages that come in handy. The 'awk' utility,
   like 'sort', 'cut' and 'join', allows you to specify fields in the
   strings corresponding to the lines in the input. In 'awk', fields are
   separated by whitespace (the default) or by an alternative delimiter
   specified by the -F option.
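
   To make the field-splitting rule concrete, here is a minimal sketch
   that contrasts the default whitespace splitting with a delimiter given
   by -F. The sample lines and the variable names are made up for
   illustration, and the block assumes a POSIX shell (sh or bash) rather
   than tcsh.

```shell
# Default splitting: fields are separated by runs of spaces and tabs,
# so the second field is "beta" even with mixed spacing.
default_split=$(printf 'alpha  beta\tgamma\n' | awk '{ print $2 }')
echo "$default_split"

# With -F the named character becomes the field delimiter instead.
custom_split=$(printf 'alpha;beta;gamma\n' | awk -F ';' '{ print $2 }')
echo "$custom_split"
```

   Both commands print 'beta'; only the rule used to find the second
   field differs.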

   Here are some of the ways I've made use of 'awk' in various scripts:

      % cat file
      1;Fred;Felicity
      7;Alice;Allegory
      4;Sally;Salacious

      % cat file | awk 'BEGIN { srand() } { print rand(), $0 }'
      0.041960 1;Fred;Felicity
      0.223255 7;Alice;Allegory
      0.252055 4;Sally;Salacious

   Seed the random number generator from the time of day (that's the
   'BEGIN { srand() }' part) and then print each line ($0) in the
   standard input preceded by a random number between 0 and 1.

      % cat file | awk 'BEGIN { s = 0 } { s += $1 } END { print s }'
      12

   Initialize 's' to 0 (this isn't strictly necessary), add up the values
   in the first field and print out the resulting sum. Without -F each
   whole line is the first field, but 'awk' uses only the leading number
   when it converts the field to a numeric value, so it isn't necessary
   to specify an alternative delimiter.

      % cat file | awk -F ";" '{ print $3 ", " $2 }'
      Felicity, Fred
      Allegory, Alice
      Salacious, Sally

   Using the semicolon as a delimiter, print out the third field, a
   comma, a space and then the second field for each line of the input.

5. The 'grep' Utility

   The 'grep' utility is useful for searching the content of files (or
   the standard input). 'egrep' ('extended' grep) has a more powerful
   regular expression language (check out 'info grep') and requires fewer
   escape ('\') characters in regular expressions.

      % ls
      sara.jpg artemis.txt michelle.jpg vanessia.gif damien.txt

      % ls | egrep "[a-zA-Z]*[.][gG][iI][fF]|[a-zA-Z]*[.][jJ][pP][gG]"
      sara.jpg
      michelle.jpg
      vanessia.gif

   Match file names with an alphabetic base and an extension
   corresponding to a GIF or JPEG image file. Basic regular expression
   syntax is presented in Section 3, but note that a period appearing
   inside a character list is treated literally (i.e., it only matches a
   period) whereas a period appearing elsewhere (unless preceded by an
   escape) matches any character. Note that the disjunctive operator (if
   'RE1' and 'RE2' are regular expressions then 'RE1|RE2' matches 'RE1'
   or 'RE2') is available in 'egrep' but not in the basic regular
   expressions used by 'sed'.

6. The 'sed' Utility

   The 'sed' (for stream editor) utility, like 'awk', has its own
   powerful scripting language, but the substitution command accounts for
   the lion's share of the use of 'sed' in writing shell scripts. The
   'sed' command uses the syntax of so-called 'basic regular expressions'
   (see 'info regex'), which are similar to, but not exactly the same as,
   'grep' regular expressions. Here's the prototypical use of the 'sed'
   utility:

      sed 's/PATTERN/REPLACEMENT/g'

   The 'sed' utility can also be used with the -f (for 'file') option to
   specify a 'sed' program consisting of individual 'sed' commands, one
   command to a line, as in:

      % sed -f file

   where 'file' is a file of 'sed' commands. For example,

      % cat file
      s/Fred/Mary/g
      s/Sally/Bill/g
      ...

   Here are some examples illustrating the 'sed' substitution command:

      % set input = "Nathan Sequitur"
      % echo $input | sed 's/\([a-zA-Z]*\)[ ]*\([a-zA-Z]*\)/\2, \1/g'
      Sequitur, Nathan

   Note that the following doesn't work as one might expect:

      % set string = '<tr align="right">$2,359</tr>'
      % echo $string | sed 's/<tr.*>/ @BEGIN_ROW@ /g'
       @BEGIN_ROW@

   This is because 'sed' matches 'greedily', that is to say, a regular
   expression like '.*' gobbles up as many characters as it possibly can
   while still succeeding in finding a match. Here's a fix using the
   'complement' operator '^':

      % echo $string | sed 's/<tr[^>]*>/ @BEGIN_ROW@ /g'
       @BEGIN_ROW@ $2,359</tr>

7. Additional Commands

   The 'comm' command: compare two files

      % comm -23 one.txt two.txt

   List the lines that are in one.txt but not in two.txt. The 'comm'
   command relies on the two files being sorted.
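
   Because 'comm' expects sorted input, a common pattern is to sort both
   files first. The following is only a sketch under stated assumptions:
   the file contents are invented sample data, the temporary directory
   and variable names are hypothetical, and the block is written for a
   POSIX shell (sh or bash) rather than tcsh.

```shell
# Hypothetical sample data, deliberately left unsorted.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/one.txt"
printf 'fig\nkiwi\napple\n' > "$workdir/two.txt"

# Sort each file first, since comm expects sorted input.
sort "$workdir/one.txt" > "$workdir/one.sorted"
sort "$workdir/two.txt" > "$workdir/two.sorted"

# -2 hides lines found only in the second file and -3 hides lines
# common to both, leaving the lines unique to the first file.
only_in_one=$(comm -23 "$workdir/one.sorted" "$workdir/two.sorted")
echo "$only_in_one"

# Clean up the temporary files.
rm -r "$workdir"
```

   With this sample data the only line unique to the first file is
   'pear'.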

   The 'cut' command: extract specified fields in a file

      % cat artist
      1;Bill;Frisell;1951-03-18;Baltimore, Maryland
      2;Bonnie;Raitt;1949-11-08;Burbank, California
      3;Melvin;Taylor;1959-03-13;Jackson, Mississippi
      4;Robert;Cray;1953-08-01;Columbus, Georgia
      5;Keith;Jarrett;1945-05-08;Allentown, Pennsylvania
      6;Sue;Foley;1968-03-29;Ottawa, Canada

      % cat artist | cut -f 3,4 -d ";"
      Frisell;1951-03-18
      Raitt;1949-11-08
      Taylor;1959-03-13
      Cray;1953-08-01
      Jarrett;1945-05-08
      Foley;1968-03-29

   Extract fields 3 and 4 using the semicolon as a field delimiter.

   The 'find' command: search the file system for specified files

      % find ~ -name "*[a-z]*.???" -print

   Find and print any file in the directory tree rooted in my home
   directory whose name contains at least one lowercase letter and ends
   with a period followed by exactly three characters.

      % find . -name "*.jpg" -size +8 -exec /bin/rm {} \;

   Find and delete every file in the directory tree rooted in my current
   working directory with the extension 'jpg' whose size exceeds (the
   '+') 8 * 512 bytes.

   The 'join' command: join two files using a specified field

   See the exercise 'Working With Databases' on the book web page.

   The 'paste' command: combine the lines in two files side-by-side

      % cat letters
      a
      b
      c

      % cat numbers
      1
      2
      3

      % paste numbers letters
      1       a
      2       b
      3       c

   Paste the two files side-by-side using the default delimiter, a tab.

      % paste -d ";" numbers letters
      1;a
      2;b
      3;c

   Paste the two files side-by-side using the semicolon as the delimiter.

   The 'repeat' command: repeat n times the specified (simple) command

      % repeat 3 echo 1
      1
      1
      1

   This turns out to be very useful for all sorts of scripting tricks.
   For example, suppose that you want to initialize an array (list) of a
   specified length to contain all zeros.

      % set n = 16
      % set array = ( `repeat $n echo 0` )
      % echo $#array
      16
      % echo $array[7]
      0

   The 'sort' command: sort by lines or fields

      % cat file | sort -rn

   Sort the file in reverse numeric order. The default is to sort in
   lexicographic order.
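
   The difference between lexicographic and numeric order is easiest to
   see on numbers of different lengths. This is a small sketch using
   invented input and hypothetical variable names, written for a POSIX
   shell (sh or bash) rather than tcsh.

```shell
# Default (lexicographic) order compares character by character,
# so "10" and "100" both sort before "9".
lexical=$(printf '10\n9\n100\n' | sort)
echo "$lexical"

# -n compares the lines as numbers.
numeric=$(printf '10\n9\n100\n' | sort -n)
echo "$numeric"

# -rn compares as numbers and reverses the result.
reverse_numeric=$(printf '10\n9\n100\n' | sort -rn)
echo "$reverse_numeric"
```

   The three orders are 10, 100, 9 for the default sort, 9, 10, 100 with
   -n, and 100, 10, 9 with -rn.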

   The 'tr' command: translate characters

      % cat file | tr "A-Z" "a-z"

   Convert all uppercase letters to lowercase.

      % cat file | tr -dc "a-z \n"

   Delete all characters other than spaces, line feeds and lowercase
   alphabetic characters.

   The 'uniq' command: count or remove consecutive duplicate lines

      % sort file | uniq -c

   Count how many times each distinct line appears in a file. Sorting
   first makes duplicate lines consecutive so that 'uniq' can find them.
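
   Since 'uniq' only compares consecutive lines, the effect of sorting
   first can be sketched as follows. The input lines and variable names
   are made up for illustration, the block assumes a POSIX shell (sh or
   bash) rather than tcsh, and 'awk' is used only to normalize the
   padding that 'uniq -c' places before each count.

```shell
# Without sorting, the two 'a' lines are not adjacent, so uniq keeps
# all three lines (tr strips any padding that wc adds to the count).
unsorted_lines=$(printf 'a\nb\na\n' | uniq | wc -l | tr -d ' ')
echo "$unsorted_lines"

# After sorting, the duplicates are adjacent and uniq -c can count them.
counts=$(printf 'a\nb\na\n' | sort | uniq -c | awk '{ print $1, $2 }')
echo "$counts"
```

   The first command reports 3 lines because nothing was collapsed; the
   second reports '2 a' and '1 b'.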