Lecture 25: Condition Variables, Distributed Systems, Sharding

» Lecture video (Brown ID required)
» Lecture code
» Post-Lecture Quiz (due 11:59pm Monday, May 6)

Condition Variables

A condition variable is another kind of synchronization object. It is useful when a thread holds a lock but cannot proceed because some condition needs to become true before it can make progress: for example, in our bounded buffer program, a reader may wait for the buffer to become non-empty, or a writer may wait for the buffer to become non-full. Importantly, the waiting thread must give up its lock so that other threads can change the condition. This is exactly what a condition variable provides. Before we look at its operations, we first need a C++ pattern for managing mutexes.

C++ Mutex Patterns

Before we talk more about condition variables, let's look at a C++ pattern that both makes working with mutexes easier and forms the basis for the standard library condition variable API.

It is very common to write a synchronized function that must lock the mutex first, do some work, and then unlock the mutex before returning. Doing this repeatedly is tedious. Moreover, if the function can return at multiple points, it is easy to forget an unlock statement before one of the returns, resulting in errors. C++ has a pattern that deals with these problems and simplifies programming: the scoped lock.

We use a scoped lock to simplify the bounded buffer's code in bbuffer-scoped.cc. The write() method now looks like the following:

ssize_t bbuffer::write(const char* buf, size_t sz) {
    std::unique_lock guard(this->mutex_);
    assert(!this->write_closed_);
    size_t pos = 0;
    while (pos < sz && this->blen_ < bcapacity) {
        size_t bindex = (this->bpos_ + this->blen_) % bcapacity;
        this->bbuf_[bindex] = buf[pos];
        ++this->blen_;
        ++pos;
    }
    ...
}

Note that in the first line of the function body we declare a std::unique_lock object, a scoped lock that locks the mutex for the duration of the enclosing scope. When the std::unique_lock object is initialized, the mutex is automatically locked; when the object goes out of scope, the mutex is automatically unlocked. Scoped lock objects achieve this effect by locking and unlocking the mutex in their constructor and destructor (a special C++ method on a class that is invoked just before an object of that class is destroyed).

This design pattern is also called Resource Acquisition Is Initialization (RAII), and it is common in software engineering in general. RAII simplifies coding and also prevents certain programming errors.

Condition Variables, for real

A condition variable supports the following operations:

- wait(lock): blocks the calling thread until another thread notifies the condition variable. Atomically releases the given lock while blocking, and re-acquires it before returning.
- notify_one(): unblocks one thread currently waiting on the condition variable.
- notify_all(): unblocks all threads currently waiting on the condition variable.

Logically, the writer to the bounded buffer should block when the buffer becomes full, and should unblock when the buffer becomes non-full again. Let's add a condition variable, called nonfull_, to the bounded buffer, just below the mutex. Note that we deliberately named the condition variable after the condition under which the function should unblock; this will make the code easier to read later on. The write() method that implements blocking is in bbuffer-cond.cc. It looks like the following:

ssize_t bbuffer::write(const char* buf, size_t sz) {
    std::unique_lock guard(this->mutex_);
    assert(!this->write_closed_);
    while (this->blen_ == bcapacity) {  // #1
        this->nonfull_.wait(guard);
    }
    size_t pos = 0;
    while (pos < sz && this->blen_ < bcapacity) {
        size_t bindex = (this->bpos_ + this->blen_) % bcapacity;
        this->bbuf_[bindex] = buf[pos];
        ++this->blen_;
        ++pos;
    }
    ...
}

The new code at #1 implements blocking until the condition is met. This is the standard pattern for using condition variables: the condition variable's wait() function is almost always called in a while loop, and the loop tests the condition under which the function must block.

On the other hand, notify_all() should be called whenever a change we make might turn the unblocking condition true. In our scenario, this means we must call notify_all() in the read() method, which takes characters out of the buffer and can thereby unblock the writer, as shown in the inserted code at #2 below:

ssize_t bbuffer::read(char* buf, size_t sz) {
    std::unique_lock guard(this->mutex_);
    ...
    while (pos < sz && this->blen_ > 0) {
        buf[pos] = this->bbuf_[this->bpos_];
        this->bpos_ = (this->bpos_ + 1) % bcapacity;
        --this->blen_;
        ++pos;
    }
    if (pos > 0) {                   // #2
        this->nonfull_.notify_all();
    }
    ...
}

With condition variables, our bounded buffer program runs significantly more efficiently: instead of making millions of read and write calls, it now makes about 100,000 read calls and about 1 million write calls, since threads block while the buffer is full (for writes) or empty (for reads).

Why the while loop around cv.wait()?

Why is it necessary to have wait() in a while loop?

wait() is almost always used in a loop because of what we call spurious wakeups. Since notify_all() wakes up all threads blocked on a particular condition variable, by the time a given blocked thread re-acquires the mutex and gets to run, some other previously-blocked thread may already have run, made progress, and turned the unblocking condition back to false. For this reason, a woken-up thread must revalidate the unblocking condition before proceeding further, and if the condition does not hold, it must go back to blocking. The while loop achieves exactly this.

Distributed Systems

Infrastructure at Scale

Modern web services can have millions of users, and the companies that operate them run serious distributed systems infrastructure to support these services. The picture below shows a simplified view of how such infrastructure is typically structured.

End-users contact one of several datacenters, typically the one geographically closest to them. Inside that datacenter, their requests are initially terminated at a load-balancer (LB). This is a simple server that forwards requests onto different frontend servers (FE) that run an HTTPS server (Apache, nginx, etc.) and the application logic (e.g., code to generate a Twitter timeline, or a Facebook profile page).

The front-end servers are stateless, and they contact backend servers for information required to dynamically generate web page data to return to the end-users. Depending on the consistency requirements for this data, the front-end server may either talk directly to a strongly-consistent database, or first check for the data on servers in a cache tier, which store refined copies of the database contents in an in-memory key-value store to speed up access to them. If the data is in the cache, the front-end server reads it from there and continues; if it is not in the cache, the front-end server queries the database.

But the servers in the cache tier don't individually have space to hold the data of all users, and replicating everything on each server would be very wasteful. So instead we need a systematic way of dividing the cached data across servers. We could, for example, divide the data by username, or by the first letter of the username; or we could split it by a hash of the username, or put the data of all users from the same geographic location on the same server. All of these are reasonable ideas, and they all realize a concept called sharding: splitting state across multiple computers, each of which individually holds a piece (a "shard") of the overall system state.

Summary

Condition variables are another type of synchronization object; they make it possible to block threads until a condition is satisfied (e.g., there is space in a bounded buffer again). This improves the efficiency of the bounded buffer program, as threads no longer spin; for some other programs that require threads to wait, condition variables are actually required for correctness. To wait, a thread calls wait() on the condition variable while holding a mutex lock; the call atomically releases the mutex and blocks the thread, and a woken thread re-acquires the mutex before returning from wait(). Threads blocked on a condition variable are unblocked by a call to notify_all() (or notify_one()) from another thread.

We finished off our discussion of thread synchronization by considering two details of the condition variable API: why wait() needs to atomically release the lock and block the calling thread, and why we need to wrap calls to wait() in a while loop. For both choices, it turns out that a condition variable without them allows subtly incorrect executions when multiple threads interleave in a pessimal way, and we saw examples of this.

Finally, we started talking about distributed systems and how computers can work together in order to scale to thousands of users. Sharding is one of the key concepts in scalable distributed systems, and we'll talk more about it shortly (and you'll implement it in Project 5A!).