Lecture 20: Pipes, Multiprocessing, Threads #
Inter-Process Communication #
We want processes to be isolated; that is why we introduced virtual memory, after all! But sometimes, they need to convey information to each other. For example, you may want to accelerate a large task by dividing the work between multiple processes (which can run in parallel on multiple processors in your computer) and have the child processes send the results back to the parent process. With the super-strict isolation that we've created, this is not possible!
Fortunately, there is a range of mechanisms for processes to communicate with each other in safe ways, typically mediated by the kernel. These ways of communicating, and the abstractions that implement them, are called inter-process communication (IPC).
Waiting for a process to exit #
The simplest form of IPC is the exit status of a process, which gives it a one-off opportunity to send a single integer to the parent process. Processes on Linux often use this integer to indicate whether they exited successfully (a zero exit status) or whether an error occurred (a non-zero exit status, such as 1).
But how does the parent process get access to the return code of a child? This is where the `wait` family of system calls comes in. We will specifically look at one variant, the `waitpid()` system call.

`waitpid()` serves two purposes. First, it allows a parent process to wait for a child process to exit. This is useful, for example, when a shell starts a process and wants to wait for it to exit before printing the prompt again. You may have observed that the output from `./forkmyecho` is often mixed with the shell prompt; this indicates that one of the processes involved does not wait correctly! Second, `waitpid()` allows the parent process to read the child's exit status.

Let's look at the example in `waitdemo.cc`. This program does the following:
- It creates a child process.
- The child process sleeps for half a second, prints out a message, and exits.
- The parent waits for the child to finish, and prints out a message based on the child's exit status.
```cpp
int main() {
    fprintf(stderr, "Hello from parent pid %d\n", getpid());

    // Start a child
    pid_t p1 = fork();
    assert(p1 >= 0);
    if (p1 == 0) {
        usleep(500000);
        fprintf(stderr, "Goodbye from child pid %d\n", getpid());
        exit(0);
    }

    double start_time = tstamp();

    // Wait for the child and print its status
    int status;
    pid_t exited_pid = waitpid(p1, &status, 0);
    assert(exited_pid == p1);
    if (WIFEXITED(status)) {
        fprintf(stderr, "Child exited with status %d after %g sec\n",
                WEXITSTATUS(status), tstamp() - start_time);
    } else {
        fprintf(stderr, "Child exited abnormally [%x]\n", status);
    }
}
```
The interesting line in the program is the call to `waitpid()` in the parent. `waitpid()` takes as its first argument the PID of the process to wait for. This must be a direct child of the current process! A process cannot wait for a PID that isn't its child; `waitpid()` will return an error if you try. The second argument is a pointer to an integer in which the kernel will deposit the child's exit status.

Note the last argument to `waitpid()`, 0, which tells the system call to block until the child exits. This tells the kernel to make the parent runnable again only once the child has exited.
Blocking vs. polling
Blocking, as opposed to polling, can be a more efficient way to programmatically "wait for things to happen". It is a paradigm we will see again and again in the course, and it relies on cooperation between user-space and the kernel. Specifically, many system calls will "block" until a specific condition is true, and it's the kernel's job to refrain from scheduling a process that is blocked.
The effect of the `waitpid()` system call is that the parent will not print out the "Child exited..." message until after the child exits. The two processes are effectively synchronized in this way.
Zombie processes #
When a process forks a child, the child eventually exits and will have an exit status. That exit status needs to be stored somewhere in memory, but the child process's address space has already been destroyed! The responsibility of tracking the exit status falls to the kernel, which means that it needs to keep information about the exited child around until the parent calls `waitpid()` (or itself exits).

Consequently, the child process enters a "zombie state" after exiting: the process no longer runs, but the kernel still keeps around its PID and its exit status, waiting for `waitpid()` to be called on the process. Zombie processes consume kernel resources and we should avoid having zombies lying around whenever possible!

The trick for avoiding zombie processes is to call `waitpid()` at least once for each child. Invoking `waitpid()` with `-1` as the pid argument waits for an arbitrary child and retrieves its exit status.
Limitations of IPC via exit status #
Exit detection communicates very little information between processes. It essentially only communicates the exit status of the program exiting. Moreover, the fact that it can only deliver the information after one program has already exited further restricts the types of actions the listening process can take after hearing from the child. Clearly, we would like a richer communication mechanism between processes. Ideally, we would like some sort of channel between two processes that allows them to exchange arbitrary data while they're still running!
Pipes: More Powerful IPC #
Processes, though operating in isolated virtual address spaces, often need to communicate with each other in controlled ways. This is what Inter-Process Communication (IPC) is about. We already saw simple examples of IPC in the form of process exit statuses last time. In addition, Linux provides other mechanisms for processes to communicate: they can read and write files on disk, or they can rely on a more efficient, in-memory streaming transport in the form of pipes.
The notion of multiple processes working together to achieve a goal shows up in many ways in your everyday computing. Two key examples are:

- Combining multiple shell commands via the `|` ("pipe") character: you may have seen chains of commands such as `wc -l *.cc | sort -n`, which counts the lines in all C++ source files in the current directory (`wc -l *.cc`) and then sorts the result by the number of lines using another command (`sort -n`). The pipe symbol here tells the shell to set up the processes with a pipe between them and for the first command to send its output to the pipe, from where the second command reads it.
- Speeding up operations that interact with a large data set or many requests by splitting the work across processes that communicate results back to a parent process. This works especially well if you're using a computer with multiple processors, where in the best case N processes achieve N× the speed or throughput of a single process.
Pipe Setup #
A pipe provides a unidirectional, in-order transport mechanism. A process can use a pipe to send data to itself, but a pipe is most powerful when combined with `fork()`, which allows it to be used for communication across parent and child processes.

Pipes are created using the `pipe()` system call. Each pipe has two user-facing file descriptors, corresponding to the read end and the write end of the pipe. File descriptors are identifiers that the kernel uses to keep track of open resources (such as files) used by user-space processes. User-space processes refer to these resources using integer file descriptor (FD) numbers; in the kernel, the FD numbers index into an FD table maintained for each process.

The signature of the `pipe()` system call looks like this:

```cpp
int pipe(int pfd[2]);
```
It is the responsibility of the user-space process to allocate memory for the `pfd` array. The memory can be in a global variable, on the stack, or on the heap. This follows a general principle with Linux system calls: the memory that the kernel writes data or information to must be allocated by user-space and a valid pointer passed into the kernel.

A successful call to `pipe()` creates 2 file descriptors, placed in array `pfd`:
- `pfd[0]`: read end of the pipe
- `pfd[1]`: write end of the pipe
How do I remember which end is which?
The default file descriptors available to each process (`stdin`, `stdout`, and `stderr`) provide a useful mnemonic to remember which end of a pipe is the read end:

- 0 is the FD number for `stdin`, 1 is the FD number of `stdout`
- Programs read from stdin and write to stdout
- `pfd[0]` is the read end (input end), `pfd[1]` is the write end (output end)

Data written to `pfd[1]` can be read from `pfd[0]`.
Can a pipe be used bi-directionally?
The read end of the pipe can't be written, and the write end of the pipe can't be read. Attempting to read/write to the wrong end of the pipe will result in a system call error (the `read()` or `write()` call will return -1).
Let's look at a concrete example in `selfpipe.cc`:
```cpp
int main() {
    int pfd[2];
    int r = pipe(pfd);
    assert(r == 0);

    char wbuf[BUFSIZ];
    sprintf(wbuf, "Hello from pid %d\n", getpid());
    ssize_t n = write(pfd[1], wbuf, strlen(wbuf));
    assert(n == (ssize_t) strlen(wbuf));

    char rbuf[BUFSIZ];
    n = read(pfd[0], rbuf, BUFSIZ);
    assert(n >= 0);
    rbuf[n] = 0;

    assert(strcmp(wbuf, rbuf) == 0);
    printf("Wrote %s", wbuf);
    printf("Read %s", rbuf);
}
```
This code (which doesn't contain `fork()`) creates a pipe, writes to it, and reads from it within a single process. Finally, we assert that the string we get out of the pipe is the same string we wrote into the pipe.

You might wonder where the data goes after the `write()` system call completes, but before the `read()` from the pipe is invoked. The data doesn't live in the process's address space! It actually goes into a memory buffer located in the kernel address space.
The `read()` system call blocks when reading from a stream file descriptor that doesn't have any data to be read. Pipe file descriptors are stream file descriptors, so reading from an empty pipe will block (meaning the kernel won't schedule the reading process until there is data available in the pipe). `write()` calls to a pipe block when the pipe's buffer is full (because the reader is not consuming data quickly enough). A `read()` from a pipe returns EOF once all write ends of the pipe are closed.
So far we've only seen a pipe functioning within the same process. Since the pipe lives in the kernel, it can also be used to pass data between processes. We can combine the `fork()` system call (which copies the parent process's open file descriptors to the child process) with pipes to establish a communication channel across processes.
Pipes across processes #
Let's take a look at `childpipe.cc` as an example:
```cpp
int main() {
    int pipefd[2];
    int r = pipe(pipefd);
    assert(r == 0);

    pid_t p1 = fork();
    assert(p1 >= 0);
    if (p1 == 0) {
        // child process
        const char* message = "Hello, mama!";
        printf("[child: %d] sending message\n", getpid());
        ssize_t nw = write(pipefd[1], message, strlen(message));
        assert(nw == (ssize_t) strlen(message));
        exit(0);
    }

    char buf[BUFSIZ];
    ssize_t n = read(pipefd[0], buf, BUFSIZ);
    if (n > 0) {
        buf[n] = 0;  // null-terminate before printing
        printf("[parent: %d] I got a message! It was “%s”\n", getpid(), buf);
    }
    close(pipefd[0]);
    close(pipefd[1]);
}
```
Here, we use `fork()` to create a child process, but note that we created the pipe before forking! The `fork()` duplicates the two pipe file descriptors in the child, but the pipe itself is not duplicated (because the pipe doesn't live in the process's address space). The child then writes a message to the pipe, and the same message can be read from the parent. Interprocess communication!
Note that in the scenario above we have four file descriptors associated with the pipe, because `fork()` duplicates the file descriptors corresponding to the two ends of the pipe. The pipe in this case has two read ends and two write ends. The animated picture below (courtesy of Eddie Kohler) illustrates the situation in terms of the FD table maintained for both processes:
Note that there is still an open write end of the pipe in the parent, and an open read end in the child, that never get closed! This can lead to weird hangs when using pipes, so a common idiom is to close the pipe ends that a process does not need immediately after forking. In the example, we need to close the write end in the parent and the read end in the child:
```cpp
...
pid_t p1 = fork();
assert(p1 >= 0);
if (p1 == 0) {
    close(pipefd[0]);  // close the read end in the child
    ... // child code
}
close(pipefd[1]);      // close the write end in the parent

char buf[BUFSIZ];
if (read(pipefd[0], buf, BUFSIZ) > 0) {
    ...
```
Are pipes always 1:1 between a parent and a child process?
Pipes are quite powerful! You can also create a pipe between two child processes (think about how you would use `fork()`, `pipe()`, and `close()` to achieve this!), and a pipe can actually have multiple active read ends and write ends. A use case for a pipe with multiple active write ends is a multiple-producer, single-consumer (MPSC) setup; this is useful, e.g., when a process listens for messages from a set of other processes. Likewise, a pipe with multiple active read ends supports a single-producer, multiple-consumer (SPMC) use case, where multiple readers compete for messages (e.g., work units) sent by a writer. Multiple-producer, multiple-consumer (MPMC) is also possible, but rarely used in practice because it becomes difficult to coordinate the readers and writers.
Summary #
Today, we started breaking down the absolute isolation of processes by introducing mechanisms for them to communicate via inter-process communication abstractions. One simple, one-shot communication mechanism is the exit status, which a parent process reads by calling `waitpid()` to wait for a child process to exit. Because that exit status needs to be available even after the child process has exited, the OS kernel keeps exited processes around as "zombie processes" until the parent has called `waitpid()` on them and retrieved (or ignored) the exit status.
We then dove deeper into how processes can interact via pipes, which are in-memory buffers managed by the kernel that processes access via file descriptors. We saw that combining pipes with forking allows processes to establish communication channels with (and even between) their children. This is a very powerful abstraction that allows us to chain shell commands, and to achieve parallelism using multiple processes on multi-processor computers.