Lecture 20: Pipes, Multiprocessing, Threads

🎥 Lecture video (Brown ID required)
💻 Lecture code
❓ Post-Lecture Quiz (due 11:59pm, Monday, April 15).

Inter-Process Communication

We want processes to be isolated – that is why we introduced virtual memory after all! But sometimes, they need to convey information to each other. For example, you may want to accelerate a large task by dividing the work between multiple processes (which can run in parallel on multiple processors in your computer) and have the child processes send the results back to the parent process. With the super-strict isolation that we've created, this is not possible!

Fortunately, there is a range of mechanisms for processes to communicate with each other in safe ways, typically mediated by the kernel. These ways of communicating, and the abstractions that implement them, are called inter-process communication (IPC).

Waiting for a process to exit

The simplest form of IPC is the exit status of a process, which gives it a one-off opportunity to send a single integer to the parent process. Processes on Linux often use this integer to indicate whether they exited successfully (a zero exit status) or whether an error occurred (a non-zero exit status). Only the low 8 bits of the integer reach the parent, so the values it observes range from 0 to 255.

But how does the parent process get access to the return code of a child? This is where the wait family of system calls comes in. We will specifically look at one variant, the waitpid() system call.

waitpid() serves two purposes. First, it allows a parent process to wait for a child process to exit. This is useful, for example, when a shell starts a process and wants to wait for it to exit before printing the prompt again. You may have observed that the output from ./forkmyecho is often mixed with the shell prompt – this indicates that one of the processes involved does not wait correctly! Second, waitpid() allows the parent process to read the child's exit status.

Let's look at the example in waitdemo.cc. This program does the following:

int main() {
    fprintf(stderr, "Hello from parent pid %d\n", getpid());

    // Start a child
    pid_t p1 = fork();
    assert(p1 >= 0);
    if (p1 == 0) {
        usleep(500000);
        fprintf(stderr, "Goodbye from child pid %d\n", getpid());
        exit(0);
    }
    double start_time = tstamp();

    // Wait for the child and print its status
    int status;
    pid_t exited_pid = waitpid(p1, &status, 0);
    assert(exited_pid == p1);

    if (WIFEXITED(status)) {
        fprintf(stderr, "Child exited with status %d after %g sec\n",
                WEXITSTATUS(status), tstamp() - start_time);
    } else {
        fprintf(stderr, "Child exited abnormally [%x]\n", status);
    }
}

The interesting line in the program is the call to waitpid() in the parent. waitpid() takes as its first argument the PID of the process to wait for. This must be a direct child of the current process! A process cannot wait for a PID that isn't its child – waitpid() will return an error if you try. The second argument is a pointer to an integer in which the kernel will deposit the child's exit status.

Note the last argument to waitpid(), 0, which selects the default behavior: the system call blocks until the child exits. In other words, the kernel makes the parent runnable again only once the child has exited.

Blocking vs. polling

Blocking, as opposed to polling, can be a more efficient way to programmatically "wait for things to happen". It is a paradigm we will see over and over again in the course, and it relies on cooperation between user-space and the kernel. Specifically, many system calls will "block" until a specific condition is true, and it's the kernel's job to refrain from scheduling a process that is blocked.

The effect of the waitpid() system call is that the parent will not print out the "Child exited..." message until after the child exits. The two processes are effectively synchronized in this way.

Zombie processes

When a process forks a child, the child eventually exits and will have an exit status. That exit status needs to be stored somewhere in memory, but the child process's address space has already been destroyed! The responsibility of tracking the exit status falls to the kernel, which means that the kernel needs to keep information about the exited child around until the parent calls waitpid() (or itself exits).

Consequently, the child process enters a "zombie state" after exiting: the process no longer exists, but the kernel still keeps around its PID and its exit status, waiting for waitpid() to be called on the process. Zombie processes consume kernel resources and we should avoid having zombies lying around whenever possible!

The trick for avoiding zombie processes is to call waitpid() once for each child that the process creates. Invoking waitpid() with -1 as the pid argument waits for an arbitrary child to exit and collects its exit status.

Limitations of IPC via exit status

Exit detection communicates very little information between processes. It essentially only communicates the exit status of the program exiting. Moreover, the fact that it can only deliver the information after one program has already exited further restricts the types of actions the listening process can take after hearing from the child. Clearly, we would like a richer communication mechanism between processes. Ideally, we would like some sort of channel between two processes that allows them to exchange arbitrary data while they're still running!

Pipes: More Powerful IPC

Processes, though operating in isolated virtual address spaces, often need to communicate with each other in controlled ways. This is what Inter-Process Communication (IPC) is about. We already saw simple examples of IPC in the form of process exit statuses last time. In addition, Linux provides other mechanisms for processes to communicate: they can read and write files on disk, or they can rely on a more efficient, in-memory streaming transport in the form of pipes.

The notion of multiple processes working together to achieve a goal shows up in many ways in your everyday computing. Two key examples are:

Pipe Setup

A pipe provides a unidirectional, in-order transport mechanism. A process can use a pipe to send data to itself, but a pipe is most powerful when combined with fork(), which allows it to be used for communication across parent and child processes.

Pipes are created using the pipe() system call. Each pipe has two user-facing file descriptors, corresponding to the read end and the write end of the pipe. File descriptors are identifiers that the kernel uses to keep track of open resources (such as files) used by user-space processes. User-space processes refer to these resources using integer file descriptor (FD) numbers; in the kernel, the FD numbers index into an FD table maintained for each process.

The signature of the pipe() system call looks like this:

int pipe(int pfd[2]);

It is the responsibility of the user-space process to allocate memory for the pfd array. The memory can be in a global variable, on the stack, or on the heap. This follows a general principle with Linux system calls: the memory that the kernel writes data or information to must be allocated by user-space and a valid pointer passed into the kernel.

A successful call to pipe() creates two file descriptors, placed in the array pfd: pfd[0] refers to the read end of the pipe, and pfd[1] refers to the write end.

How do I remember which end is which?

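The default file descriptors available to each process (stdin, stdout, and stderr) provide a useful mnemonic to remember which end of a pipe is the read end: stdin is file descriptor 0 and is read from, while stdout is file descriptor 1 and is written to. In the same way, pfd[0] is the read end and pfd[1] is the write end.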

Data written to pfd[1] can be read from pfd[0].

Can a pipe be used bi-directionally?

The read end of the pipe can't be written, and the write end of the pipe can't be read. Attempting to read/write to the wrong end of the pipe will result in a system call error (the read() or write() call will return -1).

Let's look at a concrete example in selfpipe.cc:

int main() {
    int pfd[2];
    int r = pipe(pfd);
    assert(r == 0);

    char wbuf[BUFSIZ];
    sprintf(wbuf, "Hello from pid %d\n", getpid());

    ssize_t n = write(pfd[1], wbuf, strlen(wbuf));
    assert(n == (ssize_t) strlen(wbuf));

    char rbuf[BUFSIZ];
    n = read(pfd[0], rbuf, BUFSIZ - 1);  // leave room for the terminating NUL
    assert(n >= 0);
    rbuf[n] = 0;

    assert(strcmp(wbuf, rbuf) == 0);
    printf("Wrote %s", wbuf);
    printf("Read %s", rbuf);
}

This code (which doesn't contain fork()) creates a pipe, writes to it, and then reads from it, all within a single process. Finally, we assert that the string we get out of the pipe is the same string we wrote into the pipe.

You might wonder where the data goes after the write() system call completes, but before the read() from the pipe is invoked. The data doesn't live in the process's address space! It actually goes into a memory buffer located in the kernel address space.

The read() system call blocks when reading from a stream file descriptor that doesn't have any data to be read. Pipe file descriptors are stream file descriptors, so reading from an empty pipe will block (meaning the kernel won't schedule the reading process until there is data available in the pipe). Similarly, write() calls to a pipe block when the pipe's buffer is full (because the reader is not consuming data quickly enough). A read() from a pipe returns end-of-file (EOF), i.e., a return value of 0, once all write ends of the pipe are closed and the buffer is empty.

So far we've only seen a pipe functioning within a single process. Since the pipe lives in the kernel, it can also be used to pass data between processes. We can combine the fork() system call (which copies the parent process's open file descriptors to the child process) with pipes to establish a communication channel across processes.

Pipes across processes

Let's take a look at childpipe.cc as an example:

int main() {
    int pipefd[2];
    int r = pipe(pipefd);
    assert(r == 0);

    pid_t p1 = fork();
    assert(p1 >= 0);

    if (p1 == 0) {
        // child process
        const char* message = "Hello, mama!";
        printf("[child: %d] sending message\n", getpid());
        ssize_t nw = write(pipefd[1], message, strlen(message));
        assert(nw == (ssize_t) strlen(message));
        exit(0);
    }

    char buf[BUFSIZ];
    ssize_t n = read(pipefd[0], buf, BUFSIZ - 1);
    if (n > 0) {
        buf[n] = 0;  // read() does not NUL-terminate, so do it before printing with %s
        printf("[parent: %d] I got a message! It was “%s”\n", getpid(), buf);
    }

    close(pipefd[0]);
    close(pipefd[1]);
}

Here, we use fork() to create a child process, but note that we created the pipe before forking! The fork() duplicates the two pipe file descriptors in the child, but the pipe itself is not duplicated (because the pipe doesn't live in the process's address space). The child then writes a message to the pipe, and the parent can read the same message from it. Interprocess communication!

Note that in the scenario above we have four file descriptors associated with the pipe, because fork() duplicates the file descriptors corresponding to two ends of a pipe. The pipe in this case has two read ends and two write ends. The animated picture below (courtesy of Eddie Kohler) illustrates the situation in terms of the FD table maintained for both processes:

[Animation: pipe-fork]

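Note that the parent still holds a write end of the pipe, and the child still holds a read end, while the communication takes place! This can lead to weird hangs when using pipes: a read() only returns end-of-file once every write end is closed, so a parent that reads until end-of-file while its own write end is still open will block forever, even after the child has exited. A common idiom is therefore to close the pipe ends that a process does not need immediately after forking. In the example, we need to close the write end in the parent and the read end in the child: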

    ...
    pid_t p1 = fork();
    assert(p1 >= 0);

    if (p1 == 0) {
       ... // child code
    }

    close(pipefd[1]); // close the write end in the parent
    char buf[BUFSIZ];
    ssize_t n = read(pipefd[0], buf, BUFSIZ - 1);
    if (n > 0) {
    ...

Are pipes always 1:1 between a parent and a child process?

Pipes are quite powerful! You can also create a pipe between two child processes (think about how you would use fork(), pipe(), and close() to achieve this!), and a pipe can actually have multiple active read ends and write ends. A use case for a pipe with multiple active write ends is a multiple-producer, single-consumer (MPSC) setup – this is useful, e.g., when a process listens for messages from a set of other processes. Likewise, a pipe with multiple active read ends supports a single-producer, multiple-consumer (SPMC) use case, where multiple readers compete for messages (e.g., work units) sent by a writer. Multiple-producer, multiple-consumer (MPMC) is also possible, but rarely used in practice because it becomes difficult to coordinate the readers and writers.

Summary

Today, we started breaking down the absolute isolation of processes by introducing mechanisms for them to communicate via inter-process communication abstractions. One simple, one-shot communication mechanism is the exit status that a parent process can read when it uses waitpid() after it waited for a child process to exit. Because that exit status needs to be available even after the child process has exited, the OS kernel will keep processes around as "zombie processes" until the parent has called waitpid() on them and retrieved (or ignored) the exit status.

We then dove deeper into how processes can interact via pipes, which are in-memory buffers managed by the kernel that processes access via file descriptors. We saw that combining pipes with forking allows processes to establish communication channels with (and even between) their children. This is a very powerful abstraction that allows us to chain shell commands, and to achieve parallelism using multiple processes on multi-processor computers.