⚠️ This is not the current iteration of the course! Head here for the current offering.

Exercises: Concurrency

These exercises will help you prepare for quiz questions you may see for Block 3: Concurrency and Parallel Programming.

Acknowledgements: These exercises were originally developed for Harvard's CS 61 course and were kindly shared by Eddie Kohler.

CONC-1. Processes

Consider the two programs shown below.

// Program 1
#include <cstdio>
#include <unistd.h>
int main() {
    printf("PID %d running prog1\n", getpid());
}
// Program 2
#include <cstdio>
#include <unistd.h>
int main() {
    char* argv[2];
    argv[0] = (char*) "prog1";
    argv[1] = nullptr;
    printf("PID %d running prog2\n", getpid());
    int r = execv("./prog1", argv);
    printf("PID %d exiting from prog2\n", getpid());
}

QUESTION CONC-1A. How many different PIDs will print out if you run Program 2?

QUESTION CONC-1B. How many lines of output will you see?

Now, let's assume that we change Program 2 to the following:

// Program 2B
#include <cstdio>
#include <unistd.h>
int main() {
    char* argv[2];
    argv[0] = (char*) "prog1";
    argv[1] = nullptr;

    printf("PID %d running prog2\n", getpid());
    pid_t p = fork();
    if (p == 0) {
        int r = execv("./prog1", argv);
    } else {
        printf("PID %d exiting from prog2\n", getpid());
    }
}

QUESTION CONC-1C. How many different PIDs will print out if you run Program 2B?

QUESTION CONC-1D. How many lines of output will you see?

Finally, consider this version of Program 2.

// Program 2C
#include <cstdio>
#include <unistd.h>
int main() {
    char* argv[2];
    argv[0] = (char*) "prog1";
    argv[1] = nullptr;

    printf("PID %d running prog2\n", getpid());
    pid_t p = fork();
    pid_t q = fork();
    if (p == 0 || q == 0) {
        int r = execv("./prog1", argv);
    } else {
        printf("PID %d exiting from prog2\n", getpid());
    }
}

QUESTION CONC-1E. How many different PIDs will print out if you run Program 2C?

QUESTION CONC-1F. How many lines of output will you see?

CONC-2. Pipes and synchronization

In the following questions, you will implement a mutex using a pipe, and a limited type of pipe using a mutex.

QUESTION CONC-2A. In this question, you are to implement mutex functionality using a pipe. Fill in the definitions of the mutex operations. You may assume that no errors occur.

struct pipe_mutex {
    int fd[2];
    // YOUR CODE HERE (if necessary)

    pipe_mutex() {
        int r = pipe(this->fd);
        assert(r == 0);
        // YOUR CODE HERE (if necessary)
    }

    void lock() {
        // YOUR CODE HERE
    }

    void unlock() {
        // YOUR CODE HERE
    }
};

In the next questions, you will help implement pipe functionality using an in-memory buffer and a mutex. This “mutex pipe” will only work between threads of the same process (in contrast to a regular pipe, which also works between processes). An initial implementation of mutex pipes is as follows; you will note that it contains no mutexes.

        struct mutex_pipe {
/* 1*/      char bbuf_[BUFSIZ];
/* 2*/      size_t bpos_;
/* 3*/      size_t blen_;

            mutex_pipe() {
/* 4*/          this->bpos_ = this->blen_ = 0;
/* 5*/          memset(this->bbuf_, 0, BUFSIZ);
            }

            // Read up to `sz` bytes from this mutex_pipe into `buf` and return the number of bytes
            // read. If no bytes are available, wait until at least one byte can be read.
            ssize_t read(char* buf, size_t sz) {
/* 6*/           size_t pos = 0;
/* 7*/           while (pos < sz && (pos == 0 || this->blen_ != 0)) {
/* 8*/               if (this->blen_ != 0) {
/* 9*/                   buf[pos] = this->bbuf_[this->bpos_];
/*10*/                   ++this->bpos_;
/*11*/                   this->bpos_ = this->bpos_ % BUFSIZ;
/*12*/                   --this->blen_;
/*13*/                   ++pos;
/*14*/               }
/*15*/           }
/*16*/           return pos;
            }

            // Write up to `sz` bytes from `buf` into this mutex_pipe and return the number of bytes
            // written. If no space is available, wait until at least one byte can be written.
            ssize_t write(const char* buf, size_t sz) {
/*17*/          size_t pos = 0;
/*18*/          while (pos < sz && (pos == 0 || this->blen_ < BUFSIZ)) {
/*19*/              if (this->blen_ != BUFSIZ) {
/*20*/                  size_t bindex = this->bpos_ + this->blen_;
/*21*/                  bindex = bindex % BUFSIZ;
/*22*/                  this->bbuf_[bindex] = buf[pos];
/*23*/                  ++this->blen_;
/*24*/                  ++pos;
/*25*/              }
/*26*/          }
/*27*/          return pos;
            }
        };

QUESTION CONC-2B. What’s another name for this data structure?

It would be wise to work through an example. For example, assume BUFSIZ == 4, and figure out how the following calls would behave.

mutex_pipe mp;
mp.write("Hi", 2);
mp.read(buf, 4);
mp.write("Test", 4);
mp.read(buf, 3);

First let’s reason about this code in the absence of threads.

QUESTION CONC-2C. Which of the following changes could, if made in isolation, result in undefined behavior when a mutex pipe was used? Circle all that apply.

  1. Removing line 4
  2. Removing line 5
  3. Removing “|| this->blen_ < BUFSIZ” from line 18
  4. Removing line 21
  5. Removing lines 23 and 24

QUESTION CONC-2D. Which of the following changes could, if made in isolation, cause a mutex_pipe::read to return incorrect data (that is, the byte sequence produced by read will not equal the byte sequence passed to write)? Circle all that apply.

  1. Removing line 4
  2. Removing line 5
  3. Removing “|| this->blen_ < BUFSIZ” from line 18
  4. Removing line 21
  5. Removing lines 23 and 24

QUESTION CONC-2E. Which of the following changes could, if made in isolation, cause a call to mutex_pipe::write to never return (when a correct implementation would return)? Circle all that apply.

  1. Removing line 4
  2. Removing line 5
  3. Removing “|| this->blen_ < BUFSIZ” from line 18
  4. Removing line 21
  5. Removing lines 23 and 24

QUESTION CONC-2F. Write an invariant for a mutex_pipe’s blen_ member. An invariant is a statement about the value of blen_ that is always true. Write your invariant in the form of an assertion; for full credit give the most specific true invariant you can. (“blen_ is a unsigned integer” is unspecific, but true; “blen_ == 4” is specific, but false.)

QUESTION CONC-2G. Write an invariant for bpos_. For full credit give the most specific true invariant you can.


In the remaining questions, you will add synchronization objects and operations to make your mutex pipe work in a multithreaded program.

QUESTION CONC-2H. Add a std::mutex to the mutex_pipe and use it to protect the mutex pipe from race condition bugs. For full credit, your solution must not deadlock—if one thread is reading from a pipe and another thread is writing to the pipe, then both threads must eventually make progress. Describe all changes required, with reference to specific line numbers.

QUESTION CONC-2I. Your solution to the last question likely has poor utilization. For instance, a thread that calls mutex_pipe on an empty mutex pipe will spin forever, rather than block. Introduce one or more condition variables so that read will block until data is available. Write one or more snippets of C code and give line numbers after which the snippets should appear.

CONC-3. Minimal minimal minimal synchronization synchronization synchronization

Minimalist composer Philip Glass, who prefers everything minimal, proposes the following implementation of condition variables based on mutexes. He’s only implementing wait and notify_one at first.

struct pg_condition_variable {
    std::mutex cvm;

    pg_condition_variable() {
        // start the mutex in LOCKED state!
        this->cvm.lock();
    }

    void wait(std::mutex& m) {
        m.unlock();
        this->cvm.lock();   // will block until a thread calls `notify_one`
        m.lock();
    }

    void notify_one() {
        this->cvm.unlock();
    }
};

Philip wants to use his condition variables to build a bank, where accounts support these operations:

Here’s Philip’s code.

      struct pg_acct {
          unsigned long balance;
          std::mutex m;
          pg_condition_variable cv;

          void deposit(unsigned amt) {
/*D1*/        this->m.lock();
/*D2*/        this->balance += amt;
/*D2*/        this->cv.notify_one();
/*D2*/        this->m.unlock();
          }

          void withdraw(unsigned amt) {
/*W1*/        this->m.lock();
/*W2*/        while (this->balance < amt) {
/*W3*/            this->cv.wait(this->m);
/*W4*/        }
/*W5*/        this->balance -= amt;
/*W6*/        this->m.unlock();
          }
      };

Philip’s friend Pauline Oliveros just shakes her head. “You got serious problems,” she says, pointing at this section of the C++ standard:

The expression m.unlock() shall…have the following semantics:

Requires: The calling thread shall own the mutex.

This means that the when m.unlock() is called, the calling thread must have previously locked the mutex, with no intervening unlocks. The penalty for deviance is undefined behavior.

QUESTION CONC-3A. Briefly explain how Philip’s code can trigger undefined behavior.


To fix this problem, Philip changes his condition variable and account to use a new type, fast_mutex, instead of std::mutex. This type is inspired by Linux’s “fast” mutexes. It’s OK to unlock a fast_mutex more than once, and it’s OK to unlock a fast_mutex on a different thread than the thread that locked it.

A fast_mutex has one important member, value, which can be 0 (unlocked) or 1 (locked).

Below, we've begun to write out an execution where Philip’s code is called by two threads. We write the line numbers each thread executes and the values in a after each line. We’ve left lines blank for you to fill in if you need to.

T1 T2 a.balance a.m.value a.cv.cvm.value
Initial state: 5 0 1
a.deposit(10) a.withdraw(12)
after W1 5 1 1
after D1 5 1 1
(T1 blocks on a.m)

QUESTION CONC-3B. Assuming T2’s call to withdraw eventually completes, what are the final values for a.balance, a.m.value, and a.cv.cvm.value?

QUESTION CONC-3C. In such an execution, which line of code (W1–5) unblocks Thread T1?

QUESTION CONC-3D. In such an execution, which, if any, line(s) of code (D1–4 and/or W1–5) set a->cv.cvm.value to zero?


QUESTION CONC-3E. For any collection of deposit and withdraw calls, Philip’s code will always ensure that the balance is valid. (There are other problems—a withdraw call might block forever when it shouldn’t—but the balance will be OK.) Why? List all that apply.

  1. Access to balance is protected by a condition variable.
  2. Access to balance is protected by a mutex.
  3. Arithmetic expressions like this->balance += amt; have atomic effect.

CONC-4. Weensy threads

Betsy Ross is changing her WeensyOS to support threads. There are many ways to implement threads, but Betsy wants to implement threads using the ptable array. “After all,” she says, “a thread is just like a process, except it shares the address space of some other process!”

Betsy has defined a new system call, sys_create_thread, that starts a new thread running a given thread function, with a given argument, and a given initial stack pointer:

typedef void* (*thread_function)(void*);
pid_t sys_create_thread(thread_function f, void* arg, void* stack_ptr);

The system call’s return value is the ID of the new thread, which Betsy thinks should use the process ID space.

Betsy’s kernel contains the following code:

// in syscall()
case SYSCALL_FORK:
    return handle_fork(current);

case SYSCALL_CREATE_THREAD:
    return handle_create_thread(current);


uint64_t handle_fork(proc* p) {
    // Find a free process; return `nullptr` if all out
    proc* np = find_free_process();
    if (!np) {
        return -1;
    }

    // Copy the input page table and allocate new pages using `vmiter`
    np->pagetable = copy_pagetable(p->pagetable);
    if (!np->pagetable) {
        return -1;
    }

    // Finish up
    np->regs = p->regs;
    np->regs.reg_rax = 0;
    np->state = P_RUNNABLE;
    return np->pid;
}

uint64_t handle_create_thread(proc* p) {
    // Whoops! Got a revolution to run, back later
    return -1;
}

QUESTION CONC-4A. Complete her handle_create_thread implementation. Assume for now that the thread function never exits, and don’t worry about reference counting issues (for page tables, for instance). You may use the helper functions shown above, including find_free_process and copy_pagetable, if you need them; or you may use any functions or objects from the WeensyOS handout code, including vmiter and kalloc.

Recall that system call arguments are passed according to the x86-64 calling convention: first argument in %rdi, second in %rsi, third in %rdx, etc.

QUESTION CONC-4B. Betsy’s friend Prince Dimitri Galitzin thinks Betsy should give processes even more flexibility. He suggests that the sys_create_thread system call should take a full set of registers (as x86_64_registers*), rather than just a new instruction pointer and a new stack pointer. That way, the creating thread can supply all registers to the new thread. But Betsy points out that this design would allow a thread to violate kernel isolation by providing carefully-planned register values for x86_64_registers.

Which x86-64 registers could be used in Dimitri’s design to violate kernel isolation? List all that apply.

  1. reg_rax, which contains the thread’s %rax register.
  2. reg_rip, which contains the thread’s instruction pointer.
  3. reg_cs, which contains the thread’s privilege level, which is 3 for unprivileged.
  4. reg_rflags, which contains the EFLAGS_IF flag, which indicates that the thread runs with interrupts enabled.
  5. reg_rsp, which contains the thread’s stack pointer.

Now Betsy wants to handle thread exit. She introduces two new system calls, sys_exit_thread and sys_join_thread:

void sys_exit_thread(void* exit_value);
void* sys_join_thread(pid_t thread);

sys_exit_thread causes the calling thread to exit with the given exit value. sys_join_thread behaves like pthread_join or waitpid. If thread corresponds is a thread of the same process, and thread has exited, sys_join_thread cleans up the thread and returns its exit value; otherwise, sys_join_thread returns (void*) -1.

(The exit_value feature differs from C++ threads, which don’t have exit values.)

QUESTION CONC-4C. Is the sys_join_thread specification blocking or polling?

Betsy makes the following changes to WeensyOS internal structures to support thread exit.

  1. She adds a void* p_exit_value member to struct proc.
  2. She adds a new process state, P_EXITED, that corresponds to exited threads.

QUESTION CONC-4D. Complete the case for SYSCALL_EXIT_THREAD. (Don’t worry about the last thread in a process; you may assume it always calls sys_exit rather than sys_exit_thread.)

case SYSCALL_EXIT_THREAD:

QUESTION CONC-4E. Complete the following helper function.

// Test whether `test_pid` is the PID of a thread in the same process as `p`.
// Return 1 if it is; return 0 if `test_pid` is an illegal PID, it corresponds to
// a freed process, or it corresponds to a thread in a different process.
int is_thread_in(pid_t test_pid, proc* p) {

QUESTION CONC-4F. Complete the case for SYSCALL_JOIN_THREAD in syscall(). Remember that a thread may be successfully joined at most once: after it is joined, its PID is made available for reallocation.

case SYSCALL_JOIN_THREAD:

QUESTION CONC-4G. Advanced extra credit. In Weensy threads, if a thread returns from its thread function, it will execute random code, depending on what random garbage was stored in its initial stack in the return address position. But Betsy thinks she can implement better behavior entirely at user level, where the value returned from the thread function will automatically be passed to sys_thread_exit. She wants to make two changes:

  1. She’ll write a two- or three-instruction function called thread_exit_vector.
  2. Her create_thread library function will write a single 8-byte value to the thread’s new stack before calling sys_create_thread.

Explain how this will work. What instructions will thread_exit_vector contain? What 8-byte value will create_thread write to the thread’s new stack? And where will that value be written relative to sys_create_thread’s stack_ptr argument?

CONC-5. Fair synchronization

C++ standard mutexes are unfair: some threads might succeed in locking the mutex more often than others. For example, a simple experiment on Linux shows that if threads repeatedly try to lock the same mutex, some threads lock the mutex 1.13x more often than others. (Other kinds of lock are even less fair: some threads can lock a spinlock 3.91x more often than others!)

QUESTION CONC-5A. What is the name of the problem that would occur if one particular thread never locked the mutex, even though other threads locked and unlocked the mutex infinitely often?

To avoid unfairness, threads must take turns. One fair mutex implementation is called a ticket lock; this (incorrect, unsynchronized) code shows the basic idea.

struct ticket_mutex {
    unsigned now = 0;   // “now serving”
    unsigned next = 0;  // “next ticket”

    void lock() {
        unsigned t = this->next;    // mark this thread’s place in line
        ++this->next;               // next thread gets new place
        while (this->now != t) {    // wait for my turn
        }
    }

    void unlock() {
        ++this->now;
    }
};

QUESTION CONC-5B. Describe an instance of undefined behavior that could occur if multiple threads called ticket_mutex::lock on the same ticket mutex at the same time.

QUESTION CONC-5C. Fix lock and unlock using C++ atomics. Alternately, for partial credit, say which regions of code must be executed atomically.


The ticket lock implementation above uses polling. That will perform well if critical sections are short, but blocking is preferable if critical sections are long. Here’s a different ticket lock implementation:

      struct ticket_mutex {
/*T1*/    unsigned now;
/*T2*/    unsigned next;
/*T3*/    std::mutex m;

          void lock() {
/*L1*/        this->m.lock();
/*L2*/        unsigned t = this->next++;
/*L3*/        while (this->now != t) {
/*L4*/            this->m.unlock();
/*L5*/            sched_yield();
/*L6*/            this->m.lock();
/*L7*/        }
/*L8*/        this->m.unlock();
          }

          void unlock() {
/*U1*/        this->m.lock();
/*U2*/        ++this->now;
/*U3*/        this->m.unlock();
          }
      };

This ticket lock implementation uses std::mutex, which blocks, but the implementation itself uses polling.

QUESTION CONC-5D. Which line or lines of code mark this implementation as using polling?

QUESTION CONC-5E. Change the implementation to truly block. Include line numbers indicating where your code will go.

QUESTION CONC-5F. Most solutions to part E wake up blocked threads more than is strictly necessary. The ideal number of blocking calls is one: each thread should block at most once and wake up only when its turn comes. But the simplest correct solution will wake up each blocked thread a number of times proportional to the number of blocked threads.

Change your solution so that when there are 4 or fewer threads, every thread wakes up only when its turn comes. (Your solution must, of course, work correctly for any number of threads.) If your solution already works this way, you need not do anything here.