Friday, July 5, 2002

Talking to Computers

I'm finding my self-imposed discipline of writing a journal entry and then not returning to edit or substantially add to it a bit frustrating, but I'll try to resist the temptation to return to a previous day's entry. Last night my head was full of things I want to say today, but there's no way that I can write it all down in a single day. I resolved, therefore, to just have at it, and the best way, I thought, to start a book about programming and computing is to do some programming.

Hardly a day goes by when I don't write at least one line of code. I think of it as simply talking with my computer. This morning, for instance, I wondered how much I had written during the past two days in my first two journal entries. I have a laptop computer on the breakfast table that is connected to a wide area network (think of it as the World Wide Web or the Internet if you're familiar with those terms) through the wireless local area network in my house and a broadband connection supported by a cable company. I leave the laptop on the breakfast table so I can read the on-line version of the New York Times in the morning or check the weather while I eat breakfast.

I use a program on my laptop (it's called ssh for "secure shell") to tunnel through the firewall protecting the computers in the CS department at Brown (I love these metaphors: tunneling and firewalls) and open a shell (a special program that allows me to interact more or less directly with the operating system, a variant of Unix in this case) on the machine that sits in my office. That machine's name, by the way, is "klee" for the artist Paul Klee (check out Klee's "Twittering Machine" for some idea of why I named my computer after him), and its symbolic address on the internet is "klee.cs.brown.edu". I could have opened a shell on any of several hundred machines that reside within the firewall, but I chose to do it on my "own" machine rather than slow down a machine being used by someone else ("steal cycles" from it). (Think about how I could run a program on a machine that someone else is using and how you might interpret the phrase "stealing cycles" in the context of slowing down a computer.)

The program "ssh" allows me to work remotely on computers that "trust me" in such a way that the information that is sent back and forth between my laptop and klee can't be deciphered by someone with access to the wires on which the information is transmitted and doesn't allow a malicious hacker to break into either my laptop or klee.

I'll get back to why I opened a shell on klee shortly, but first I want to reflect on how the internet has blurred the distinction between individual computers. I'm almost always connected to the internet, but I don't really think about what computer I'm talking with. When I'm in the department at Brown and not in my office, I walk around with my laptop connected to the department wireless network, which connects to a wide area network and then to the internet. Right this minute I'm working at my laptop, typing into a shell that's running on the computer in my home office a few feet away, but in another window I'm connected to klee. And for all I know the data flowing between these computers could be circling the globe, zipping through cables under the ocean and bouncing off satellites along the way. Indeed, I could pretty easily force the data to go through Zurich, Seattle or Tokyo. In a moment, I'll go into my home office, where I have a larger screen and a more comfortable chair (actually it's less comfortable than this couch, but it's supposed to be ergonomically designed so that I won't be in pain after a day of typing at a keyboard and looking at a screen), but I'll continue to keep programs running in several places.

Given the current state of the art, I do have to think a little bit about where I am, or rather where the program that's currently interpreting my keystrokes is running. I have to be cognizant of which computer I'm working on because different machines have different software and offer different services. I'm a little more confident that I won't lose data stored on the machines in the department, because I trust the folks who maintain those machines and perform the backups on the filesystem there. I have to do the backups on my laptop and the machine in my home office myself, and I know I'm not very careful about doing this.

Eventually, with the exception of some very specialized programs and services, I won't have to worry about what computers are running the programs that I work with. This is already true to a certain extent if you confine your computing to what you can do within a web browser, and, yes, you can do a lot of useful computing within a web browser. Some folks, I've noticed, don't even distinguish between their web browser and their computer; they do everything (email, news, shopping, entertainment and education) from within their web browser. For the last twenty years, I've been using programs like "telnet" and "ftp" to work on computers that are thousands of miles away from where I'm sitting. In the early 90s, it seemed like a miracle to be sitting in a hotel room in Paris making the computer in my office in Providence dance to my bidding, or directing a computer at Stanford University to transfer files to the portable computer on my bed in the hotel room. Now most of you "netizens" take this for granted, and though you may not know the magic incantations that direct these processes, you can command their power with the click of a mouse.

That was a long diversion, and I want to get back to explaining why I opened a shell on klee. The best way to learn to program, and thereby gain even greater control over computers and coax them to perform even more astounding feats of magic, is just to start programming. So in the remainder of today's journal entry, I want to give you some examples of everyday programming: not fancy stuff, just examples of talking with computers and getting them to do useful (and some not-so-useful but fun) stuff. I don't expect you to manage this right off the bat, but in what follows try not to sweat the details; as a computer scientist looking at programs involving multiple programming languages written by multiple programmers, you just have to learn to "squint" at the syntax and follow your intuitions. More about squinting later. Here's a snippet of my exchange with klee:

/u/tld/email/book % wc -l ./journal/2002/07/*/*.txt 
     161 ./journal/2002/07/03/day.txt 
     281 ./journal/2002/07/04/day.txt 
     442 total

Pretty inscrutable to the uninitiated, so the above requires a bit of explanation. The /u/tld/email/book % part was printed by the shell. It's referred to as the "prompt", and when I'm in the shell window (the portion of my computer screen dedicated to the shell) the cursor is positioned at the end of the prompt, waiting for me to type something. (This won't be clear to someone who doesn't have much experience with a computer, and no amount of prose is likely to make it much clearer. This really cries out for a little multimedia razzle-dazzle: a short digital movie showing someone typing instructions and seeing results printed to a shell window.) I'll make a half-hearted attempt to explain the /u/tld/email/book part in just a minute. In the little snippet above, I typed wc -l ./journal/2002/07/*/*.txt followed by a carriage return, i.e., I hit the key marked "return" on my keyboard. (Any idea what the "carriage" in "carriage return" refers to?) With no perceptible pause, the shell printed out the three lines that follow the command, which you can think of as the answer to my question or the result of the computation.

The wc -l ./journal/2002/07/*/*.txt can be thought of as a command to perform a specific operation, a short program to execute, or my side of a conversation with the shell and so, indirectly, with the operating system running on my computer. A computer operating system is just another program, really a collection of many programs written (and rewritten) by many different people over a long span of time, often decades or more. You can think of it as the accumulated wisdom of a host of very clever programmers who packed the operating system with everything they felt was fundamentally useful to and commonly needed by a wide range of more specific programs.

Additional programs, such as web browsers and word processors, run "on top of", or under the control of, the operating system. The operating system sees all and controls all; it's only through the operating system that programs I write can get information from the outside world through a local network or the World Wide Web, send files to printers, or grab data stored on disks or CDs. If this seems mysterious to you, don't worry; it really is complicated. The good news is that for the most part you don't have to understand the details, since the operating system hides a lot of the complexity of computer hardware and the world beyond your computer from the programmer. This ability of programs to hide complexity is essential to the development of large, complicated programs.

The specific command I invoked, wc -l ./journal/2002/07/*/*.txt, directed the shell to run the program wc (for "word count") with the option -l, which tells it to count lines rather than words, in the files specified by the pattern ./journal/2002/07/*/*.txt, where * is a "wildcard" that matches any string of characters (other than /). It turns out that this pattern matched two files corresponding to the journal entries for July 3 and July 4. (In the case of July 3, the first * matched 03 and the second matched day, and in the case of July 4, the first * matched 04 and the second matched day.)
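
For comparison, here is a rough Python sketch of what that command asks for. The function names count_lines and report are hypothetical, chosen for this example. Python's standard glob module expands the pattern much as the shell does; one difference is that the shell hands wc the file names in its own sorted order (which can depend on locale settings), while this sketch simply sorts them.

```python
import glob
import os


def count_lines(pattern):
    """Return (path, line_count) pairs for the files matching a shell-style pattern.

    As in the shell, '*' matches any run of characters within one path
    component (it never matches '/'). Like wc -l, this counts newline
    characters, so a final line without a trailing newline is not counted.
    """
    results = []
    for path in sorted(p for p in glob.glob(pattern) if os.path.isfile(p)):
        with open(path, "rb") as f:
            results.append((path, f.read().count(b"\n")))
    return results


def report(pattern):
    """Print the counts in roughly the layout wc -l uses, with a total line."""
    counts = count_lines(pattern)
    for path, n in counts:
        print(f"{n:8d} {path}")
    if len(counts) > 1:
        print(f"{sum(n for _, n in counts):8d} total")
    return counts


if __name__ == "__main__":
    report("./journal/2002/07/*/*.txt")
```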

A few words about filesystems and the strange strings of characters containing slashes (/s) are probably in order, but if you've never been exposed to files or directories then this is going to be incomprehensible. A / with no preceding text indicates the "root" directory; as far as we're concerned, everything is stored under the root of the filesystem. The u in /u/ is a symbolic link to the /users/ directory on the Brown filesystem, where the directories and files of computer "users" (such as myself) are stored. /u/tld/ designates my home directory, where all my stuff is stored. My login name is tld, for the initials in my name, Thomas Linus Dean.

As an aside, I'm fond of the "tld" handle for the following arcane reason: ~ is Unix-speak for a user's home directory. If I cd (change directory) to ~, then I'm connecting to my home directory /u/tld/, i.e., I am causing the operating system to interpret commands and access files (at least those specified by relative pathnames) in my home directory. I like to think of tld as the shortened form of "tilde" (remove the vowels), which is how you spell the name of the symbol "~". Someone else on our system wanting to connect to my directory would connect to ~tld, assuming that I allow such connections by setting my permissions appropriately, which I do. If you don't know anything about filesystems then this is probably all impenetrable to you; we'll have more to say about filesystems, directory structures and permissions later. (By the way, any idea why I referred to tld as a "handle"? It makes sense for more than one reason.) Even on my home computer, where I'm the only registered user, all of my working files are stored in ~. Now back to the main thread.
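
Python's standard library exposes the same tilde convention. In the sketch below, the helper resolve is hypothetical; it uses posixpath.expanduser, which replaces ~ with the value of the HOME environment variable (and looks up names like ~tld in the system's account database where one is available), and then interprets any remaining relative path against a given working directory.

```python
import posixpath


def resolve(path, cwd):
    """Turn a shell-style path into an absolute, normalized Unix path.

    '~' and '~/...' are expanded from the HOME environment variable, a
    relative path is taken relative to cwd, and '.' and '..' components
    are collapsed. An absolute path is only normalized.
    """
    path = posixpath.expanduser(path)
    if not posixpath.isabs(path):
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)


if __name__ == "__main__":
    for p in ["~", "~/email/book", "journal/2002", "/usr/local/bin"]:
        print(p, "->", resolve(p, "/u/tld/email/book"))
```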

The directory /u/tld/email/ is where I generally store files and keep directories related to my daily activities; this naming and use convention wasn't carefully planned out; it just evolved, and every once in a while I try to rationalize it, but I typically revert to habit. /u/tld/email/book/ is the temporary directory that I created to keep files related to working on this book. On Wednesday, when I had my epiphany and decided to create a journal-based book, I created the subdirectories /u/tld/email/book/journal/, /u/tld/email/book/journal/2002/, /u/tld/email/book/journal/2002/07/, and /u/tld/email/book/journal/2002/07/03/, and the file /u/tld/email/book/journal/2002/07/03/day.txt. Then yesterday I created /u/tld/email/book/journal/2002/07/04/ and the file /u/tld/email/book/journal/2002/07/04/day.txt. All of this was very much on the spur of the moment, and eventually I'll have to think it out more clearly if I'm to keep things organized.
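
Creating that chain of nested directories one level at a time is tedious; the Unix command mkdir -p creates all the missing levels at once. Here is a Python sketch of the same step using pathlib. The function make_day_file is hypothetical, and the layout journal/YYYY/MM/DD/day.txt simply follows the paths above.

```python
from pathlib import Path


def make_day_file(root, year, month, day):
    """Create root/journal/YYYY/MM/DD/day.txt, making any missing directories.

    parents=True creates the intermediate directories in one call (much like
    mkdir -p), and exist_ok=True makes running it a second time harmless.
    """
    day_dir = Path(root) / "journal" / f"{year:04d}" / f"{month:02d}" / f"{day:02d}"
    day_dir.mkdir(parents=True, exist_ok=True)
    day_file = day_dir / "day.txt"
    day_file.touch(exist_ok=True)  # create an empty file if it isn't there yet
    return day_file


if __name__ == "__main__":
    print(make_day_file(".", 2002, 7, 5))
```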

Again, that was a long diversion to explain a simple program, but it does take some context to understand what's going on, and I haven't yet had the time to establish much of a context. It's important to note, however, that wc -l ./journal/2002/07/*/*.txt really is a program of sorts, albeit a short and rather cryptic one. The fact that this short program calls another program, wc, should come as no surprise. Most programming languages provide access to all sorts of specialized programs. Even + in a language that allows 1 + 2 is a program and, as it will turn out, not a simple one.
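
To give a taste of why + isn't simple, here is a Python sketch that adds two non-negative integers using only bitwise operations, loosely the way an adder circuit combines bits and carries. It illustrates the idea only; it is not how any particular machine or language implements +, and the name add is hypothetical.

```python
def add(a, b):
    """Add two non-negative integers using only XOR, AND and shifts.

    XOR adds each pair of bits while ignoring carries; AND finds the bit
    positions that generate a carry, and shifting left moves each carry into
    the next column. Repeating until no carries remain gives the sum.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative integers")
    while b:
        carry = (a & b) << 1
        a = a ^ b
        b = carry
    return a


if __name__ == "__main__":
    print(add(1, 2), add(3, 5))
```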

Shells and other methods of interacting with operating systems (sometimes called "command line interfaces") generally offer a wide range of powerful programs that can be orchestrated to perform tasks. For example (I switched back to the window connected to klee, typed a few lines and then cut and pasted the exchange back into this window), the following program, called a shell script, renames files with the extension html in the directory in which it is executed so that they have the extension htm. In the following, /u/tld/ % is the prompt, since I moved to another directory. I've used the program ls to list all files with either extension, first before executing the program to rename the files and then again afterward to show you that it made the changes.

/u/tld/ % ls *.html *.htm
home.html      syllabus.html 
/u/tld/ % ls *.html | sed -e "s/html//g" | awk '{print "mv " $1 "html " $1 "htm"}' | sh 
/u/tld/ % ls *.html *.htm
home.htm      syllabus.htm

It would take a while to explain this program in detail, but let me give you a quick high-level overview. The program starts by listing the set of files that have the extension html. The |s create what are called "pipes" in the Unix world; they turn the output of one program, ls *.html in this case, into the input of another program. The sed -e "s/html//g" part of the program takes each file name in turn and rips off the html part. The output of sed -e "s/html//g" is two truncated file names, home. and syllabus., and, thanks to the next |, these names are piped into the program fragment awk '{print "mv " $1 "html " $1 "htm"}', which essentially writes two little programs that are themselves shell scripts: mv home.html home.htm and mv syllabus.html syllabus.htm. (mv is the "move" or "rename" command and requires you to specify both the original and the new names of the file that you wish to rename.) The output of awk '{print "mv " $1 "html " $1 "htm"}' is fed, via the last |, into the program sh, which is indeed another shell (remember, we're already typing to one shell).
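
To see the "program that writes programs" idea in a more familiar-looking language, here is a Python sketch (the function rename_commands is hypothetical) that produces the same mv commands as text. Two small departures from the pipeline are deliberate. sed's s/html//g deletes every occurrence of html in a name, so a file called htmlnotes.html would become notes. and the generated mv would fail; this sketch strips only the trailing extension. And awk's $1 stops at the first space, so this sketch quotes names with shlex.quote instead.

```python
import shlex


def rename_commands(filenames, old="html", new="htm"):
    """Generate one mv command per file whose name ends in .<old>.

    A rough analogue of the three pipeline stages:
      ls *.html          -> keep only names ending in ".html"
      sed -e "s/html//g" -> strip the extension, leaving e.g. "home."
      awk '{print ...}'  -> print "mv home.html home.htm"
    The returned strings are themselves tiny shell programs.
    """
    commands = []
    for name in filenames:
        if not name.endswith("." + old):
            continue
        stem = name[: -len(old)]  # keeps the trailing dot, like sed's "home."
        # shlex.quote protects names containing spaces or other shell metacharacters
        commands.append(f"mv {shlex.quote(stem + old)} {shlex.quote(stem + new)}")
    return commands


if __name__ == "__main__":
    # Print the generated script; piping it to sh would actually do the renaming.
    print("\n".join(rename_commands(["home.html", "syllabus.html"])))
```

Joining the strings with newlines and handing them to sh, for instance with subprocess.run(["sh"], input=script, text=True), would complete the analogy.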

If the above program isn't a little mind-boggling, then you weren't paying attention or you don't have a mind left to boggle. The little program above actually wrote a couple of little programs, started up a shell and then submitted those programs to the new shell to run, resulting in the desired outcome. As we'll see later on, programs that write and run other programs, and even programs that replicate or improve upon themselves (computer viruses, for example), are relatively common.

Think about it! I call up a shell - a software demon to do my bidding - and I write a short program - an arcane, inscrutable little incantation small enough to fit on one line - which itself is capable of writing programs and calling up shells. When I submit my program to the shell, I conjure up a spirit in the form of my program running some existing programs, writing its own programs, invoking its own demon shell and conjuring up its own spirits. Have you ever seen Disney's "Fantasia" and, in particular, the animated accompaniment to Paul Dukas's "The Sorcerer's Apprentice"? Mickey and his multiplying broomsticks and dancing water pails can't hold a candle to the magic I can conjure from the little laptop sitting on my knees as I type these words.

The above program, with a few simple modifications, could change the names of thousands of files stored in any number of directories on any number of computers. With just a little more effort, I could write a program that would go inside each of these files and change any reference in the text to a file with extension html to have the extension htm. If you were maintaining a web presence with thousands of web pages, you might find yourself writing and running similar programs frequently.
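
Here is one way the "little more effort" might look, as a Python sketch rather than a shell script. The function rewrite_site is hypothetical; it assumes the pages are UTF-8 text files under a single directory tree, renames every .html file to .htm, and rewrites .html references inside those files. A real site would also need to handle absolute URLs, other kinds of files that link to the pages, and links coming from outside the tree.

```python
import re
from pathlib import Path

# ".html" followed by a word boundary, so names like "page.html5" are left alone
HTML_EXT = re.compile(r"\.html\b")


def rewrite_site(root):
    """Rename every *.html file under root to *.htm and update .html references inside them.

    Returns the list of new paths. Existing .htm files with the same name
    would be overwritten, which is acceptable only for a sketch.
    """
    renamed = []
    # sorted() collects the full list before any file is renamed
    for old_path in sorted(p for p in Path(root).rglob("*.html") if p.is_file()):
        text = old_path.read_text(encoding="utf-8")
        new_path = old_path.with_suffix(".htm")
        new_path.write_text(HTML_EXT.sub(".htm", text), encoding="utf-8")
        old_path.unlink()
        renamed.append(new_path)
    return renamed


if __name__ == "__main__":
    for path in rewrite_site("."):
        print(path)
```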

Speaking of writing programs to modify lots of files and working on web pages, many webmasters use a language called Perl both to help maintain web pages and to perform computations for visitors to the web pages they maintain. Suppose that you have a bunch of text files that refer to dates in the format mm/dd/yy and you want to change them so that the months are referenced explicitly, perhaps to avoid confusion with the alternative format dd/mm/yy. You might also want to try to clear up the ambiguity of what century is implied when only two digits are used to indicate the year. I just created a short little text file of the sort I'd like to be able to modify, and I'll get the shell to print it out (cat, short for "concatenate", is a Unix command for concatenating and displaying files, and with the shell's help it can also create small ones).

/u/tld/email % cat dates.txt
The date 1/1/00 should be changed 
as should 12/31/1999 and 1/1/2002, 
but not the file /usr/local/bin/.

I've written a little Perl program in a file called program.pl that will perform the conversion, and again I get the shell to print it out.

/u/tld/email % cat program.pl
@month = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
while ($_ = <ARGV>) {
    # m/d/yy or m/d/19yy: assume the 1900s; /g converts every date on the line
    s|\b(\d\d?)/(\d\d?)/(?:19)?(\d\d)\b|$month[$1-1] $2, 19$3|g;
    # m/d/20yy
    s|\b(\d\d?)/(\d\d?)/20(\d\d)\b|$month[$1-1] $2, 20$3|g;
    print "$_";
}

Finally, I invoke the program by calling Perl to interpret it, and I tell the shell to store the program's output (the result of executing the print statement in program.pl once for each line of input) in a file called dates.new.txt.

/u/tld/email % perl program.pl dates.txt > dates.new.txt

To show that it worked I tell the shell to print out dates.new.txt.

/u/tld/email % cat dates.new.txt
The date Jan 1, 1900 should be changed 
as should Dec 31, 1999 and Jan 1, 2002, 
but not the file /usr/local/bin/.

Not too exciting, and I'll bet that program.pl will resolve the year ambiguity incorrectly as often as it gets it right (it assumes every two-digit year belongs to the 1900s, which is how 1/1/00 became Jan 1, 1900), but you get the general idea. It's not important that you understand the exact syntax of the Perl program except to note that it uses a loop (the while statement) to read each line of the file, makes substitutions where appropriate and then prints out the line with any substitutions made.
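
For comparison, here is a Python sketch of the same conversion using the standard re module. The name spell_out_dates and its default_century parameter are hypothetical additions: default_century makes the guess about two-digit years explicit (the Perl program always assumes the 1900s), and a "month" outside 1 to 12 leaves the text alone.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# month/day/year with a one- or two-digit month and day, and a year that is
# either four digits starting with 19 or 20, or just two digits
DATE = re.compile(r"\b(\d\d?)/(\d\d?)/(19|20)?(\d\d)\b")


def spell_out_dates(line, default_century="19"):
    """Rewrite every m/d/yy or m/d/yyyy date in line as, e.g., 'Jan 1, 1900'."""
    def replace(match):
        month, day, century, yy = match.groups()
        if not 1 <= int(month) <= 12:
            return match.group(0)  # not a plausible month: leave the text alone
        return f"{MONTHS[int(month) - 1]} {day}, {century or default_century}{yy}"
    return DATE.sub(replace, line)


if __name__ == "__main__":
    for text in ["The date 1/1/00 should be changed",
                 "as should 12/31/1999 and 1/1/2002,",
                 "but not the file /usr/local/bin/."]:
        print(spell_out_dates(text))
```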

Often enough when you fill out a form or submit a query on the web you'll cause a Perl program to run on the machine that "hosts" the web site and the results of that program will determine the information that is returned to you and displayed in your web browser.

The Perl program was long enough that I couldn't type it out on a single line, or at least not conveniently, so I put it in a file and submitted the file to the Perl interpreter. The Perl interpreter is a computer program that understands the Perl programming language, and interpreting Perl programs can be pretty complex in general. You may have heard that computer programs have to be "compiled" into some other form before they can be "executed", and, while that is often true, it's largely beside the point. The exact manner in which a piece of syntax such as while ($_ = <ARGV>) { ... } gets converted into a form that can be carried out by the primitive hardware of a particular machine is very complicated, but you don't really need to know this in order to be an effective programmer. It might help in some cases, but it can also hinder in others where, for example, you use your knowledge of a particular machine and thereby render your code unusable on any other machine.

I like programming environments (basically programs that support development in a given programming language) that allow me to interact easily with my programs even as I develop them. Some programmers associate this sort of interaction with the so-called interpreted languages, such as Lisp and Prolog, that I mentioned yesterday, but this is misleading. I could easily create a programming environment for Lisp that requires every bit of code to be compiled into a form immediately executable on a given machine (a "binary", as they like to say). Conversely, I could create an interactive environment with an interpreter for a language such as C that traditionally supports only the compile-first-then-debug model of interaction.

There is a dialect of C (or, perhaps more accurately, a language with C-like syntax) called, not surprisingly, Interactive C, that can be used to control robots via a tiny single-board computer called the Handy Board. With a serial line connecting my computer to the Handy Board, I can type commands in my terminal window that are sent to and then interpreted by a program running on the Handy Board. As long as the serial line is connected, I can command motors to turn and sensors to report their values, and perform additional calculations to control the robot. There are similar languages and interpreters for the popular Lego Mindstorms line of programmable robots. Here are some other examples of interactive environments that I use frequently.

I use Mathematica for all sorts of programming that involves complex mathematics. Most of the time I use the fancy graphical front end that allows me to create two- and three-dimensional graphics. It's great for visualizing functions and analyzing data. But I also occasionally call up Mathematica in a shell and use it as a fancy calculator. Mathematica can solve algebraic equations much better than I can with paper and pencil, and it can do symbolic differentiation and integration in a snap. It doesn't replace a good mathematics education so much as augment it.

/u/tld/email % mathematica
Copyright 1988-2001 Wolfram Research, Inc.
 -- Terminal graphics initialized -- 

In[1]:= Solve[ x^2 - 4 == 0, x ]

Out[1]= {{x -> -2}, {x -> 2}}

In[2]:= Solve[ x^2 == 7 - 2 x, x ]

Out[2]= {{x -> -1 - 2 Sqrt[2]}, {x -> -1 + 2 Sqrt[2]}}

In[3]:= D[ x^3, x ]

Out[3]= 3 x²

In[4]:= Integrate[ArcTan[x], x ]

                      Log[1 + x² ]
Out[4]= x ArcTan[x] - -----------
                           2
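
Two of these answers are easy to check by hand. The quadratic formula recovers Out[2], and differentiating Out[4] gives back the integrand (Mathematica's Log is the natural logarithm):

```latex
\begin{aligned}
x^2 = 7 - 2x &\iff x^2 + 2x - 7 = 0
  \iff x = \frac{-2 \pm \sqrt{2^2 - 4 \cdot 1 \cdot (-7)}}{2}
  = \frac{-2 \pm \sqrt{32}}{2} = -1 \pm 2\sqrt{2}, \\[4pt]
\frac{d}{dx}\left( x \arctan x - \frac{\log(1 + x^2)}{2} \right)
  &= \arctan x + \frac{x}{1 + x^2} - \frac{x}{1 + x^2} = \arctan x .
\end{aligned}
```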

The above doesn't begin to show off what Mathematica is capable of, and the standard graphical user interface that most people use to interact with Mathematica is amazingly powerful and useful. I don't want to leave you with the impression that Mathematica is the only product that provides this sort of capability; there are others, like Maple and Matlab, that provide similar functionality. The neat thing about such programs is that some very smart programmers and mathematicians have developed powerful tools that allow us to take advantage of their knowledge and skill in mathematics. The programs they have written embody their expertise in a form that I can take advantage of. I'm going to cheat now and borrow three paragraphs from a short article that I wrote a couple of months ago for the Brown University Faculty Bulletin:

Many of us in computer science today found our first encounter with a computer a profound, life-altering experience. Typically we sat at a terminal of some sort, typing statements; the computer interpreted our statements as commands and carried out those commands, printing an occasionally scrutable but more often cryptic and incomprehensible response.

With time, we learned to decipher those cryptic responses and even program the computer to carry out other commands with incomprehensible responses of our own design. We learned that interaction with a computer could be subtle, applicable to a wide range of tasks, and infinitely extensible. We quickly realized that by using loops, conditionals, and Boolean logic you could persuade computers to carry out complex tasks without intervention; you could embed your knowledge of the world in programs that would repeat routine tasks as many times as you like and do it infinitely faster and (if you're careful in writing the program) without error.

The speed and precision of computers let you cheat time and avoid (some forms of) fatigue and boredom. Not only could you embed your knowledge in a program, but you could share that knowledge with someone else in the same way that you could tell someone how to repair a bicycle or prepare a recipe. You could also use the knowledge that others embedded in their programs by combining it with your programs to solve new problems you couldn't handle otherwise.

I am continually amazed by the quality and quantity of software that is available to anyone with a computer and a connection to the internet. Computer science is unlike other disciplines in that many of its most useful and exciting products are available for free to anyone willing to learn to use them. Not only are the programs available free, but so is the documentation and even access to expert advice in the form of on-line lists of frequently asked questions and forums where programmers exchange ideas. The open-source movement is creating software products of great value and quality. But we're getting ahead of ourselves; here are some more interactive programs.

I mentioned Prolog yesterday, so I installed a version and I'll give it a quick spin. In the following, terms that begin with lowercase letters are constants and those that begin with uppercase letters are variables. If you don't know the difference between constants and variables, think back to when you learned algebra, where constants were typically numbers and variables were often denoted by letters like x or y. In the following interaction with Prolog, we create a Prolog database by first asserting three simple facts: fred is one of anne's parents, anne is one of lucy's parents and lucy is one of bill's parents. Next we specify two general rules indicating that your parents are your ancestors, as are your parents' parents. Then we query the database to see if Prolog gets it right.

In the following exchange, the part corresponding to the assertion of the three facts and two rules should be clear: I typed the strings beginning with assert and ending with a period, and Prolog responded to each assertion with "yes". The line ancestor(X, Y). corresponds to a query requesting Prolog to find assignments to the variables X and Y that would instantiate the expression ancestor(X, Y) as a true statement. In response to my typing this line, Prolog prints out an assignment to the variables, e.g., X = fred and Y = anne, followed by a ? asking if this assignment is the one I was looking for. In response to each ?, I typed no, thereby forcing Prolog to try to find another assignment.

 
/u/tld % prolog 
| ?- assert( parent(fred, anne) ). 
yes 
| ?- assert( parent(anne, lucy) ). 
yes 
| ?- assert( parent(lucy, bill) ). 
yes 
| ?- assert( (ancestor(X, Y) :- parent(X, Y)) ).
yes 
| ?- assert( (ancestor(X, Y) :- parent(X, Z), parent(Z, Y)) ). 
yes 
| ?- ancestor(X, Y). 
X = fred, 
Y = anne ? no
X = anne, 
Y = lucy ? no
X = lucy, 
Y = bill ? no
X = fred, 
Y = lucy  
| ?-

Almost, but our second rule wasn't as general as it could have been. We expect fred to be one of bill's ancestors, but our rules don't cover this case.

| ?- ancestor(fred, bill). 
no

We could add a rule of the form

ancestor(X, Y) :- parent(X, Z), parent(Z, W), parent(W, Y).

but that would only be a stop-gap measure. What's needed is a rule that captures the fact that X is an ancestor of Y if X is a parent of some Z and Z is an ancestor of Y. This is a recursive definition, which means that in defining what it means for someone to be the ancestor of someone else we refer to the very relation we're defining. When we add this rule we get the answer we're expecting.

| ?- assert( (ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y)) ). 
yes
| ?- ancestor(fred, bill). 
yes
| ?-

Prolog is handy for specifying relationships between people and things, especially when it makes sense to define those relationships implicitly through recursive rules such as the one above. It would be a real pain if you had to write down all the facts of the form ancestor(fred, lucy), ancestor(anne, lucy), etc. Prolog is great for writing programs that depend on complex rules or policies. If I were writing a program that computed medical benefits for employees or depended on the tax laws, I'd definitely think about writing it in Prolog.
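
The recursive rule can be mimicked outside Prolog, too. The Python sketch below (the function ancestors is hypothetical) computes the same ancestor relation bottom-up, by applying the recursive rule over and over until no new pairs appear, rather than by Prolog's top-down search. Either way, nobody has to write the ancestor facts down by hand.

```python
def ancestors(parent_facts):
    """Return every (ancestor, descendant) pair implied by (parent, child) facts.

    Computes the same relation as the Prolog clauses
        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    but bottom-up: start with the parent pairs, then keep adding (X, Y)
    whenever (X, Z) is a parent pair and (Z, Y) is already known to be an
    ancestor pair, stopping when a pass adds nothing new.
    """
    parents = set(parent_facts)
    known = set(parents)
    while True:
        derived = {(x, y) for (x, z) in parents for (z2, y) in known if z == z2}
        if derived <= known:
            return known
        known |= derived


if __name__ == "__main__":
    facts = [("fred", "anne"), ("anne", "lucy"), ("lucy", "bill")]
    for pair in sorted(ancestors(facts)):
        print(pair)
```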

And last (for today anyway) but certainly not least in my pantheon of useful programming languages, there's Lisp and its dialects. It so happens that large parts of the editor in which I'm composing this journal entry (GNU Emacs) are written in Lisp, and I can easily define Lisp functions and assign them to key sequences so that when I type those sequences the assigned function is called. You can't see any evidence of it as you read this paragraph, but as I ended the last sentence I typed Meta-q (that is, the Escape key followed by the lowercase letter q), which called my special paragraph-formatting function and adjusted the way this paragraph appeared in my editor window. I've got lots of functions defined to do special tasks while, for example, I'm writing and debugging programs or responding to email, all of which I do in Emacs.

My favorite dialect of Lisp is called Scheme, and it's the dialect used in Abelson and Sussman's "Structure and Interpretation of Computer Programs". I just opened a shell on klee and looked around a bit for implementations of Lisp. I found a couple of versions of Common Lisp and more than half a dozen implementations of Scheme. I'll fire up one called SCM, whose copyright is held by the Free Software Foundation.

/u/tld/email % scm 
> (define (factorial n) (if (= n 1) 1 (* n (factorial (- n 1))))) 
> (factorial 3) 
6 
> (factorial 4) 
24 
> (factorial 5) 
120
> (factorial 6) 
720

Sorry for choosing such a nerdy academic example, but I'm getting tired and I'm not feeling very creative. The factorial function, often written n! and spoken "n factorial", takes a positive integer n as its only argument and returns the product of the numbers 1, 2, 3, ... up to n, so that 4! = 1 * 2 * 3 * 4 = 24. Note that I've defined factorial recursively: n! is 1 when n is 1, and n * (n - 1)! otherwise. There are all sorts of peculiarities about Lisp (and Scheme), though by now what appears odd to you is likely to seem quite natural to me. For example, mathematical expressions in Lisp are written in Polish (prefix) notation, with the operator appearing first. Instead of writing 3 + (3 * 3), where the parentheses happen to be optional because multiplication takes precedence over addition (but what about 3 * (3 + 3)?), you would write (+ 3 (* 3 3)), and in Lisp the parentheses are not optional. (By the way, some calculators, HP (Hewlett-Packard) calculators in particular, use reverse Polish notation, in which the operator appears last.)
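
To make the two notations concrete, here is a small Python sketch (not how Scheme is actually implemented) of an evaluator for prefix expressions like (+ 3 (* 3 3)), along with a stack-based evaluator for the reverse Polish form 3 3 3 * +. The function names tokenize, eval_prefix and eval_rpn are hypothetical, and only integers with +, - and * are handled.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def tokenize(text):
    """Split '(+ 3 (* 3 3))' into ['(', '+', '3', '(', '*', '3', '3', ')', ')']."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def eval_prefix(tokens):
    """Evaluate a fully parenthesized prefix (Polish) expression, Lisp style.

    Consumes tokens from the front of the list. Each parenthesized form must
    name an operator first and have at least one operand.
    """
    token = tokens.pop(0)
    if token != "(":
        return int(token)             # a bare number evaluates to itself
    name = tokens.pop(0)              # the operator comes first
    args = []
    while tokens[0] != ")":
        args.append(eval_prefix(tokens))
    tokens.pop(0)                     # discard the closing ')'
    if name == "-" and len(args) == 1:
        return -args[0]               # as in Scheme, (- x) negates x
    result = args[0]
    for arg in args[1:]:
        result = OPS[name](result, arg)
    return result


def eval_rpn(text):
    """Evaluate a reverse Polish expression such as '3 3 3 * +' using a stack."""
    stack = []
    for token in text.split():
        if token in OPS:
            right = stack.pop()       # operands come off the stack in reverse order
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(int(token))
    (result,) = stack                 # a well-formed expression leaves exactly one value
    return result


if __name__ == "__main__":
    print(eval_prefix(tokenize("(+ 3 (* 3 3))")))
    print(eval_rpn("3 3 3 * +"))
```

Notice that the prefix evaluator never needs precedence rules: the parentheses say exactly which operator applies to which operands. The reverse Polish evaluator needs no parentheses at all, because the position of each operator in the sequence settles the grouping.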

We'll talk more about Scheme in subsequent journal entries, but there's one thing I want to point out: you don't have to write code to compute primitive math functions to get into computing. Computers can amplify your abilities. They make it possible for you to make use of the knowledge and skill of mathematicians, linguists, cartographers, and scholars, engineers and scientists of every stripe. My tired brain is groping for some inspirational fodder, so I'll borrow again from that somewhat stilted Faculty Bulletin piece that I quoted from earlier.

The way modern software is produced is influencing how people work in multidisciplinary studies where computers play a central role. In software production houses, programmers work in teams with one or a few persons responsible for each component of a large piece of software. Seldom nowadays is a solitary programmer responsible for a large software project. Indeed, some of the contributors to such a project are often not programmers at all but rather computer-savvy experts in disciplines from human factors and psychology to business and law. Key to a successful product are clear communication among the team members and the ability to decompose large, complex problems into tractable components encapsulating the different sources of expertise. Such projects force us to define appropriate vocabularies at the right level of abstraction to support discourse at the boundaries between components and their encapsulated expertise. The technical and organizational skills that produce successful software projects are also crucial to multidisciplinary efforts, especially those in which computers are used to integrate different sources of knowledge.

Computational molecular biology, combining among other fields molecular biology, organic chemistry and computer science, is a good example of a discipline that lends itself to collaborative research with computers and computer scientists playing myriad supporting roles. Computational pharmacology, sequencing and related genomic studies, evolutionary molecular biology, and protein-folding analyses depend on scientists from several fields working together, often physically in the same space but sometimes only virtually, using computers to facilitate almost every aspect of their communication, data gathering and analysis. Similar revolutions are afoot in economics, brain science, and a host of other disciplines, and we're just beginning to understand the benefits of visualizing large data sets from financial, health care and government census sources.

Heady stuff and it just gets more interesting every year. I wanted to show what it's like to interact with an industrial strength database but I think I've run out of steam for today. Next time we'll get back to basics and talk about what, if any, models or abstract methods of thinking good programmers have in their heads.

In the spirit of acknowledging feedback as we go, I want to thank Erika, Kate, Leah, Susannah and Jennifer for several fascinating discussions that are still reverberating in my head and that have already influenced, and will continue to influence, what I write in this journal. And thanks to John Bazik, one of the wizards on our technical staff, who is always patient with my questions and invariably gives me something interesting to think about.