Lecture 15: Processes, continued
» Lecture video (Brown ID required)
» Lecture code
» Post-Lecture Quiz (due 11:59pm Monday, April 6)
Processes
Process creation via fork() (recap)
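To create a new process, a user-space process calls the fork() system call. Fork has the effect of cloning the process and continuing execution in both the parent process and its new child process. The two processes can tell each other apart by fork()'s return value: it returns 0 in the child, the child's PID in the parent, and -1 (in the parent) if no child could be created.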
Since the child process receives a full copy of the parent process's address space, any virtual address that was mapped and valid in the parent is also valid in the child process. However, the same virtual address is backed by a different physical address in the child. In other words, parent memory and child memory are entirely independent.
Let's do a quick exercise to remind us of what fork() does. Take a look at this program:
#include <cassert>
#include <cstdio>
#include <unistd.h>

int main() {
    printf("Hello from initial pid %d\n", getpid());
    pid_t p1 = fork();
    assert(p1 >= 0);
    pid_t p2 = fork();
    assert(p2 >= 0);
    printf("Hello from final pid %d\n", getpid());
}
Question: How many lines of output would you expect to see when you run the program?
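5 lines. The first printf() prints one line, only in the parent, and then the second printf() runs four times, once in each process (the parent, its 2 children, and 1 grandchild). This assumes the output goes to a terminal, where stdout is line-buffered. If you redirect the output to a file, the first line may still sit unflushed in stdout's buffer when fork() copies the process, and each process then prints its own copy of it.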
Running a different process
If we just had fork(), we would only be able to execute copies of a single user-space process. But in reality, we want to be able to start other programs from a user-space process. One key example of a program that does this is your shell: when you type a command like ./myprogram into the terminal, the shell executes myprogram.
There are different ways to achieve this goal, some involving fork(). The debate over which way is best still rages today.
The UNIX way: fork-and-exec style
There is a family of system calls in UNIX that executes a new program. The system call we will discuss here is execv(). At some point you may want to use other system calls in the exec family; you can use man exec to find more information about them. They differ primarily in how they expect their arguments to be passed.
The execv system call (and all system calls in the exec family) performs the following:
- Blow away the current process's virtual address space.
- Begin executing the specified program in the current process, starting from its main() function.
Note that execv does not "spawn" a process. It destroys the program running in the current process and replaces it with a new one. Therefore, it's very common to use execv in conjunction with fork: we first call fork() to create a child process, and then call execv() to run a new program inside the child process, replacing the "process image" that fork() copied.
Let's look at the program in myecho.cc:
#include <cstdio>
#include <unistd.h>

int main(int argc, char* argv[]) {
    fprintf(stderr, "Myecho running in pid %d\n", getpid());
    for (int i = 0; i != argc; ++i) {
        fprintf(stderr, "Arg %d: \"%s\"\n", i, argv[i]);
    }
}
It's a simple program that prints out its pid and the contents of its argv[] array.
We will now run this program using the execv() system call. The "launcher" program where we call execv is in forkmyecho.cc:
#include <cstdio>
#include <unistd.h>

int main() {
    const char* args[] = {
        "./myecho", // argv[0] is the string used to execute the program
        "Hello!",
        "Myecho should print these",
        "arguments.",
        nullptr
    };
    pid_t p = fork();
    if (p == 0) {
        fprintf(stderr, "About to exec myecho from pid %d\n", getpid());
        int r = execv("./myecho", (char**) args);
        fprintf(stderr, "Finished execing myecho from pid %d; status %d\n",
                getpid(), r);
    } else {
        fprintf(stderr, "Child pid %d should exec myecho\n", p);
    }
}
The goal of the launcher program is to run myecho with the arguments shown in the args[] array. We need to pass these arguments to the execv system call. In the child process created by fork(), we call execv to run the myecho program.
Terminating the argument array correctly
The execv and execvp system calls take an array of C strings as their second parameter; these are the arguments to run the specified program with. Note that everything here is in C: the array is a C array, and the strings are C strings. The array must be terminated by a nullptr (or NULL), as a C array contains no length information.
Running ./forkmyecho gives us output like the following:
Child pid 1440 should exec myecho
About to exec myecho from pid 1440
$ Myecho running in pid 1440
Arg 0: "./myecho"
Arg 1: "Hello!"
Arg 2: "Myecho should print these"
Arg 3: "arguments."
Notice that the line "Finished execing myecho from pid..." never gets printed! This is because the fprintf call printing this message comes after the execv system call. If the execv call is successful, the process's address space at the time of the call gets blown away (including the stack), so anything after execv won't execute at all. Another way to think about it is that if the execv system call succeeds, then the system call never returns. (Note, though, that exec does return if it fails, so it's not correct to write code that assumes it never returns!)
The picture below summarizes what happened here, with the forkmyecho child process in green and the myecho child process in blue. (The red waitpid() part is explained further down.)
Note that there are three processes in total involved here: P1 is the original shell process running in your terminal, P2 is a child it forks, which then gets replaced by forkmyecho, and P3 is the process that ultimately runs myecho.
Alternative interface: posix_spawn
Calling fork() and execv() in succession to run a process may appear counter-intuitive and even inefficient. Imagine a complex program with gigabytes of virtual address space mapped that wants to create a new process. What's the point of copying the big virtual address space of the current program if all we are going to do is throw everything away and start anew?
These are valid concerns regarding the UNIX style of process management. POSIX systems, including Linux, provide an alternative interface called posix_spawn(), which creates a new process running a new program without copying the address space or destroying the current process. A new program gets "spawned" in a new process, and the pid of the new process is returned via one of the pointer arguments.
Non-UNIX operating systems like Windows also use this style of process creation.
The program in spawnmyecho.cc shows how to use the alternative interface to run a new program:
#include <cassert>
#include <cstdio>
#include <spawn.h>
#include <unistd.h>

int main() {
    const char* args[] = {
        "./myecho", // argv[0] is the string used to execute the program
        "Hello!",
        "Myecho should print these",
        "arguments.",
        nullptr
    };
    fprintf(stderr, "About to spawn myecho from pid %d\n", getpid());
    pid_t p;
    int r = posix_spawn(&p, "./myecho", nullptr, nullptr,
                        (char**) args, nullptr);
    assert(r == 0);
    fprintf(stderr, "Child pid %d should run myecho\n", p);
}
Note that posix_spawn() takes many more arguments than execv(). This has to do with managing the environment in which the new process will run.
In the fork-and-exec style of process creation, fork() copies the current process's environment, and execv() preserves the environment. The explicit gap between fork() and execv() provides us a natural window where we can set up and tweak the environment for the child process as needed, using the parent process's environment as a starting point.
With an interface like posix_spawn(), however, we need to supply more information directly to the call itself. Take a look at posix_spawn's manual page to find out what these extra nullptr arguments are about; they are quite complicated. This teaches an interesting lesson in API design: the performance and the usability of an API are often a trade-off.
Why do we still have fork()?
The debate over which style of process creation is better has never been settled. Modern UNIX operating systems inherited the fork-and-exec style from the original 1970s UNIX, where fork() turned out to be extremely easy to implement. Modern UNIX systems can also execute fork() very efficiently, without performing any substantial copying until it becomes necessary (using the copy-on-write optimization). For these reasons, in practice, the performance of the fork-and-exec style is not a common concern.
Running execv() without fork()
You might wonder what happens if we don't fork and just run execv. Let's take a look at runmyecho.cc:
#include <cstdio>
#include <unistd.h>

int main() {
    const char* args[] = {
        "./myecho", // argv[0] is the string used to execute the program
        "Hello!",
        "Myecho should print these",
        "arguments.",
        nullptr
    };
    fprintf(stderr, "About to exec myecho from pid %d\n", getpid());
    int r = execv("./myecho", (char**) args);
    fprintf(stderr, "Finished execing myecho from pid %d; status %d\n",
            getpid(), r);
}
This program now invokes execv() directly, without fork-ing a child first. The new program (myecho) will print out the same pid as the original process. execv() blows away the old process's image (including code, global variables, heap, and stack), but it does not change the pid, because no new process gets created. The new program runs inside the same process after the old program gets destroyed.
The picture below contrasts execution with fork() (left side) and with just execv() (right side):
Observe that if your shell were to just call execv(), it could only ever run a single command: the shell's own program would be replaced, so control would never come back to it!
Inter-process Communication
We want processes to be isolated – that is why we introduced virtual memory after all! But sometimes, they need to convey information to each other. For example, you may want to accelerate a large task by dividing the work between multiple processes (which can run in parallel on multiple processors in your computer) and have the child processes send the results back to the parent process. With the super-strict isolation that we've created, this is not possible!
Fortunately, there is a range of mechanisms for processes to communicate with each other in safe ways, typically mediated by the kernel. These ways of communicating, and the abstractions that implement them, are called inter-process communication (IPC).
Waiting for a process to exit
The simplest form of IPC is the exit status of a process, which gives it a one-off opportunity to send a single integer to the parent process. Processes on Linux often use this integer to indicate whether they exited successfully (a zero exit status) or whether an error occurred (a non-zero exit status, commonly 1). Only the low 8 bits of the value passed to exit() reach the parent, so a call like exit(-1) shows up as 255.
But how does the parent process get access to the return code of a child? This is where the wait family of system calls comes in. We will specifically look at one variant, the waitpid() system call.
waitpid() serves two purposes. First, it allows a parent process to wait for a child process to exit. This is useful, for example, when a shell starts a process and wants to wait for it to exit before printing the prompt again. You may have observed that the output from ./forkmyecho is often mixed with the shell prompt, which indicates that one of the processes involved does not wait correctly! Second, waitpid() allows the parent process to read the child's exit status.
Let's look at the example in waitdemo.cc. This program does the following:
- It creates a child process.
- The child process sleeps for half a second, prints out a message, and exits.
- The parent waits for the child to finish, and prints out a message based on the child's exit status.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    fprintf(stderr, "Hello from parent pid %d\n", getpid());

    // Start a child
    pid_t p1 = fork();
    assert(p1 >= 0);
    if (p1 == 0) {
        usleep(500000);
        fprintf(stderr, "Goodbye from child pid %d\n", getpid());
        exit(0);
    }
    // tstamp() is not a standard function; assumed to be a lecture-code
    // helper that returns the current time in seconds
    double start_time = tstamp();

    // Wait for the child and print its status
    int status;
    pid_t exited_pid = waitpid(p1, &status, 0);
    assert(exited_pid == p1);
    if (WIFEXITED(status)) {
        fprintf(stderr, "Child exited with status %d after %g sec\n",
                WEXITSTATUS(status), tstamp() - start_time);
    } else {
        fprintf(stderr, "Child exited abnormally [%x]\n", status);
    }
}
The interesting line in the program is the call to waitpid() in the parent. waitpid() takes as its first argument the PID of the process to wait for. This must be a direct child of the current process! A process cannot wait for a PID that isn't its child: waitpid() will return an error (-1, with errno set to ECHILD) if you try. The second argument is a pointer to an integer in which the kernel will deposit the child's exit status, in an encoded form that macros like WIFEXITED and WEXITSTATUS decode.
Note the last argument to waitpid(), 0, which tells the system call to block until the child exits. This tells the kernel to make the parent runnable again only once the child has exited.
Blocking vs. polling
Blocking, as opposed to polling, can be a more efficient way to programmatically "wait for things to happen". It is a paradigm we will see again and again in the course, and it relies on cooperation between user-space and the kernel. Specifically, many system calls will "block" until a specific condition is true, and it's the kernel's job to refrain from scheduling a process that is blocked.
The effect of the waitpid() system call is that the parent will not print out the "Child exited..." message until after the child exits. The two processes are effectively synchronized in this way.
Zombie processes
When a process forks a child, the child eventually exits and will have an exit status. That exit status needs to be stored somewhere in memory, but the child process's address space has already been destroyed! The responsibility of tracking the exit status falls to the kernel, which means it needs to keep information about the exited child around until the parent calls waitpid() (or itself exits).
Consequently, the child process enters a "zombie state" after exiting: the process no longer exists as a running program, but the kernel still keeps around its PID and its exit status, waiting for waitpid() to be called on the process. Zombie processes consume kernel resources, and we should avoid having zombies lying around whenever possible!
The trick for avoiding zombie processes is to call waitpid() at least once for each child. Invoking waitpid() with -1 as the pid argument waits for, and collects the exit status of, an arbitrary child of the calling process (whichever one has already exited or exits next).
Limitations of IPC via exit status
Exit detection communicates very little information between processes. It essentially only communicates the exit status of the program exiting. Moreover, the fact that it can only deliver the information after one program has already exited further restricts the types of actions the listening process can take after hearing from the child. Clearly, we would like a richer communication mechanism between processes. Ideally, we would like some sort of channel between two processes that allows them to exchange arbitrary data while they're still running!
Pipes
Linux provides a stream communication mechanism called "pipes" that allows processes to exchange data in a unidirectional, ordered manner.
Each pipe has a read end and a write end, which correspond to two file descriptors.
Pipes can be created using the pipe() system call. The signature of the pipe() system call looks like this:
int pipe(int pfd[2]);
A successful invocation of pipe() creates two file descriptors, placed in the array pfd:
- pfd[0]: read end of the pipe
- pfd[1]: write end of the pipe
On its own, a pipe is of limited use, because its file descriptors are only available to the process that called pipe(). However, we can combine pipe() with fork() to make the file descriptors available to child processes. (Recall that child processes inherit the parent's open files and resources on fork().) We'll look into this more in the next lecture.
Summary
Today, we looked at process creation via fork() and execv(), as well as different interfaces that people have advocated for the common use case of creating a new process that runs a different program than the parent process. Composing fork() and execv() allows for a process to start another program, and gives us the basic building blocks to make, e.g., a shell.
We also started breaking down the absolute isolation of processes by introducing mechanisms for them to communicate via inter-process communication abstractions. One simple, one-shot communication mechanism is the exit status that a parent process can read by calling waitpid() to wait for a child process to exit. Because that exit status needs to be available even after the child process has exited, the OS kernel will keep processes around as "zombie processes" until the parent has called waitpid() on them and retrieved (or ignored) the exit status.
Pipes provide a more flexible IPC mechanism that allows arbitrary data to be sent between processes at runtime. We will explore more details of pipes next time.