Lecture 24: Bounded Buffer #
Example program: Bounded Buffer #
A bounded buffer is a common abstraction in computer systems: memory is finite, so we need to cap the amount of space we use for our buffers. However, we may not know in advance how much data we need to store in the buffer, so we may need to allow some data to be consumed out of the buffer before we can insert more.
This idea is efficiently implemented using a cyclic buffer, in which writes (and reads) wrap around the end to continue at the beginning. A cyclic buffer is efficient because it never requires any data (bytes) to be moved within the buffer unless those bytes are read or written.
In the example above, the cyclic buffer contains the data "!Hello wor".
The bounded buffer we look at here is a synchronized object that supports the following operations:
- read(buf, n): reads up to n chars from the bounded buffer to buf;
- write(buf, n): writes up to n chars into the bounded buffer from buf.
Bounded buffers are the abstraction used to implement pipes in the kernel. In our example, though, we're looking at a bounded buffer in user-space and between threads (you can think of Go's channels as such bounded buffers, for example).
In the following, we will use a bounded buffer that is accessed by multiple threads as an example of a data structure that requires synchronization. In the examples, we use two threads: one writer thread and one reader thread. The writer (also called "producer") writes data into the bounded buffer, and the reader (also called "consumer") reads data from it.
The buffer can hold up to CAP characters, where CAP is the capacity of the buffer. The API of the bounded buffer is specified as follows:
read(buf, sz):
- Reads up to sz characters from the bounded buffer into buf
- Removes characters read from the bounded buffer
- If the bounded buffer is empty, it should block until it becomes nonempty or closed
- Returns the number of characters read
write(buf, sz):
- Writes up to sz characters from buf to the end of the bounded buffer
- Writes at most the bounded buffer's capacity, CAP characters
- If the bounded buffer is full, it should block until the buffer becomes nonfull
- Returns the number of characters written
Example: Assuming we have a bounded buffer object bbuf with CAP=4.
bbuf.write("ABCDE", 5); // returns 4
bbuf.read(buf, 3); // returns 3 ("ABC")
bbuf.read(buf, 3); // returns 1 ("D")
bbuf.read(buf, 3); // blocks
The bounded buffer preserves two important properties:
- Every character written can be read exactly once
- The order in which characters are read is the same order in which they are written
Each bounded buffer operation should also be atomic, just like the read() and write() system calls.
Unsynchronized buffer #
Let's first look at a bounded buffer implementation, bbuffer-basic.cc, which does not perform any synchronization.
struct bbuffer {
    static constexpr size_t bcapacity = 128;
    char bbuf_[bcapacity];       // storage for buffered characters
    size_t bpos_ = 0;            // index of the next character to read
    size_t blen_ = 0;            // number of characters currently stored
    bool write_closed_ = false;  // set once the write end is shut down

    ssize_t read(char* buf, size_t sz);
    ssize_t write(const char* buf, size_t sz);
    void shutdown_write();
};

ssize_t bbuffer::write(const char* buf, size_t sz) {
    assert(!this->write_closed_);
    size_t pos = 0;
    while (pos < sz && this->blen_ < bcapacity) {
        // first free slot, and the contiguous free space starting there
        size_t bindex = (this->bpos_ + this->blen_) % bcapacity;
        size_t bspace = std::min(bcapacity - bindex, bcapacity - this->blen_);
        size_t n = std::min(sz - pos, bspace);
        memcpy(&this->bbuf_[bindex], &buf[pos], n);
        this->blen_ += n;
        pos += n;
    }
    if (pos == 0 && sz > 0) {
        return -1; // try again
    } else {
        return pos;
    }
}

ssize_t bbuffer::read(char* buf, size_t sz) {
    size_t pos = 0;
    while (pos < sz && this->blen_ > 0) {
        // contiguous run of stored characters starting at bpos_
        size_t bspace = std::min(this->blen_, bcapacity - this->bpos_);
        size_t n = std::min(sz - pos, bspace);
        memcpy(&buf[pos], &this->bbuf_[this->bpos_], n);
        this->bpos_ = (this->bpos_ + n) % bcapacity;
        this->blen_ -= n;
        pos += n;
    }
    if (pos == 0 && sz > 0 && !this->write_closed_) {
        return -1; // try again
    } else {
        return pos;
    }
}
This implements a circular buffer, because reads and writes logically wrap around at the end of the buffer. Every time we read characters out of the buffer, we advance bpos_, which is the index of the next character to be read in the buffer, and decrease blen_ by the same amount. Whenever we write to the buffer, we increase blen_, which is the number of bytes currently stored in the buffer. To make the buffer circular, we perform all index arithmetic modulo the total capacity of the buffer.
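To see the wrap-around arithmetic in action, here is a minimal single-threaded sketch. It copies the read() and write() logic above into a hypothetical struct, small_bbuffer, with a capacity of 4 (matching the CAP=4 example earlier) and without the write_closed_ flag. The names small_bbuffer and wraparound_demo are assumptions made for illustration and do not appear in bbuffer-basic.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/types.h>  // ssize_t (POSIX)

// Hypothetical miniature of bbuffer: same index arithmetic, capacity 4,
// no write_closed_ flag, no synchronization.
struct small_bbuffer {
    static constexpr size_t bcapacity = 4;
    char bbuf_[bcapacity];
    size_t bpos_ = 0;  // index of the next character to read
    size_t blen_ = 0;  // number of characters currently stored

    ssize_t write(const char* buf, size_t sz) {
        size_t pos = 0;
        while (pos < sz && blen_ < bcapacity) {
            size_t bindex = (bpos_ + blen_) % bcapacity;  // first free slot
            size_t bspace = std::min(bcapacity - bindex, bcapacity - blen_);
            size_t n = std::min(sz - pos, bspace);
            memcpy(&bbuf_[bindex], &buf[pos], n);
            blen_ += n;
            pos += n;
        }
        return (pos == 0 && sz > 0) ? -1 : (ssize_t) pos;
    }

    ssize_t read(char* buf, size_t sz) {
        size_t pos = 0;
        while (pos < sz && blen_ > 0) {
            size_t bspace = std::min(blen_, bcapacity - bpos_);
            size_t n = std::min(sz - pos, bspace);
            memcpy(&buf[pos], &bbuf_[bpos_], n);
            bpos_ = (bpos_ + n) % bcapacity;  // wraps back to slot 0
            blen_ -= n;
            pos += n;
        }
        return (pos == 0 && sz > 0) ? -1 : (ssize_t) pos;
    }
};

// Walks through a wrap-around and returns what the final read produced.
std::string wraparound_demo() {
    small_bbuffer bb;
    char buf[8];

    ssize_t w1 = bb.write("ABCDE", 5);  // only 4 characters fit
    assert(w1 == 4);
    ssize_t r1 = bb.read(buf, 3);       // consumes "ABC"; bpos_ is now 3
    assert(r1 == 3 && bb.bpos_ == 3 && bb.blen_ == 1);
    ssize_t w2 = bb.write("EF", 2);     // (3 + 1) % 4 == 0: lands in slots 0 and 1
    assert(w2 == 2 && bb.bbuf_[0] == 'E' && bb.bbuf_[1] == 'F');

    // This read takes 'D' from slot 3, wraps, then takes "EF" from slots 0-1.
    ssize_t r2 = bb.read(buf, sizeof(buf));
    return std::string(buf, r2 > 0 ? (size_t) r2 : 0);
}
```

Calling wraparound_demo() from a main() function should yield the characters in the order they were written, even though they are not stored contiguously in bbuf_. No bytes are ever shifted inside the array; only bpos_ and blen_ change.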
When there is just one thread accessing the buffer, it works perfectly fine. But does it work when multiple threads are using the buffer at the same time? In our test program in bbuffer-basic.cc, we have one reader thread reading from the buffer and a second writer thread writing to the buffer. We would expect everything written to the buffer by the writer thread to show up exactly as it was written once read out by the reader thread.
In particular, the example writes the string Hello world! one million times over, so we would expect to read one million strings, composed of 13 million total characters (the 12 characters of Hello world! plus, presumably, a trailing newline per string).
When we try this, it does not work! The reason is that there is no synchronization over the internal state of the bounded buffer. bbuffer::read() and bbuffer::write() both modify internal state of the bbuffer object (most critically bpos_ and blen_), and such accesses require synchronization to work correctly in a multi-threaded environment. Recall the fundamental rule of synchronization, which says that if state is accessed by multiple threads and at least one thread may write to the state, synchronization is required.
One way to fix the bounded buffer is to turn the function bodies of the read() and write() methods into critical sections using a mutex.
Correctly Synchronized buffer #
To figure out what state we need to protect through synchronization, let's look at the definition of the bounded buffer:
struct bbuffer {
    static constexpr size_t bcapacity = 128;
    char bbuf_[bcapacity];
    size_t bpos_ = 0;
    size_t blen_ = 0;
    bool write_closed_ = false;
    ...
};
Recall the basic rule of synchronization from Lecture 21: if two or more threads can concurrently access an object, and at least one of the accesses is a write, a race condition can occur and synchronization is required.
The bounded buffer's internal state (bbuf_, bpos_, blen_, and write_closed_) is both modified and read by the read() and write() methods. Local variables defined within these methods are not shared. We need to synchronize on shared variables (internal state of the buffer), but not on local variables.
A correct version of a synchronized bounded buffer via a mutex is in bbuffer-mutex.cc. Key differences from the unsynchronized version are highlighted below:
struct bbuffer {
    ...
    std::mutex mutex_;
    ...
};

ssize_t bbuffer::write(const char* buf, size_t sz) {
    this->mutex_.lock();
    ...
    this->mutex_.unlock();
    if (pos == 0 && sz > 0) {
        return -1; // try again
    } else {
        return pos;
    }
}

ssize_t bbuffer::read(char* buf, size_t sz) {
    this->mutex_.lock();
    ...
    if (pos == 0 && sz > 0 && !this->write_closed_) {
        this->mutex_.unlock();
        return -1; // try again
    } else {
        this->mutex_.unlock();
        return pos;
    }
}
This correctly implements a synchronized bounded buffer. Simply wrapping accesses to shared state within critical sections using a mutex is the easiest and probably also the most common way to make complex in-memory objects synchronized (or "thread-safe").
Using a mutex associated with the bounded buffer object allows threads to operate on different bounded buffers in parallel (but not on the same bounded buffer). Using a single, global mutex, by contrast, would synchronize all threads operating on any bounded buffer in the program (there could be many!), an example of extremely coarse-grained synchronization. Coarse-grained synchronization is correct, but it also limits concurrency.
The bbuffer::write() method in bbuffer-mutex.cc is implemented with mutex synchronization. Note that we added a definition of a mutex to the bbuffer struct definition, and we are only accessing the internal state of the buffer within the region between this->mutex_.lock() and this->mutex_.unlock(), which is the time period during which the thread holds the mutex.
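As the read() code above shows, manual locking means every return path has to remember its own unlock() call. One common alternative in C++ is std::lock_guard, which locks the mutex in its constructor and unlocks it in its destructor. The following is a sketch under that assumption, not the contents of bbuffer-mutex.cc; the names sync_bbuffer and run_transfer are hypothetical, and the body of shutdown_write() is a guess at what such a method would do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <string>
#include <thread>
#include <sys/types.h>  // ssize_t (POSIX)

// Hypothetical variant of the mutex-protected bbuffer using std::lock_guard.
struct sync_bbuffer {
    static constexpr size_t bcapacity = 128;
    char bbuf_[bcapacity];
    size_t bpos_ = 0;
    size_t blen_ = 0;
    bool write_closed_ = false;
    std::mutex mutex_;  // protects all of the members above (by convention)

    ssize_t write(const char* buf, size_t sz) {
        std::lock_guard<std::mutex> guard(mutex_);  // unlocked on every return
        assert(!write_closed_);
        size_t pos = 0;
        while (pos < sz && blen_ < bcapacity) {
            size_t bindex = (bpos_ + blen_) % bcapacity;
            size_t bspace = std::min(bcapacity - bindex, bcapacity - blen_);
            size_t n = std::min(sz - pos, bspace);
            memcpy(&bbuf_[bindex], &buf[pos], n);
            blen_ += n;
            pos += n;
        }
        return (pos == 0 && sz > 0) ? -1 : (ssize_t) pos;
    }

    ssize_t read(char* buf, size_t sz) {
        std::lock_guard<std::mutex> guard(mutex_);
        size_t pos = 0;
        while (pos < sz && blen_ > 0) {
            size_t bspace = std::min(blen_, bcapacity - bpos_);
            size_t n = std::min(sz - pos, bspace);
            memcpy(&buf[pos], &bbuf_[bpos_], n);
            bpos_ = (bpos_ + n) % bcapacity;
            blen_ -= n;
            pos += n;
        }
        return (pos == 0 && sz > 0 && !write_closed_) ? -1 : (ssize_t) pos;
    }

    void shutdown_write() {  // assumed behavior: mark the write end closed
        std::lock_guard<std::mutex> guard(mutex_);
        write_closed_ = true;
    }
};

// Hypothetical driver: one writer thread sends `msg` `count` times while one
// reader thread collects everything; returns what the reader received.
std::string run_transfer(const std::string& msg, int count) {
    sync_bbuffer bb;
    std::string received;  // touched only by the reader until both threads join
    std::thread writer([&] {
        for (int i = 0; i != count; ++i) {
            size_t pos = 0;
            while (pos < msg.size()) {
                ssize_t r = bb.write(&msg[pos], msg.size() - pos);
                if (r > 0) {
                    pos += r;
                } else {
                    std::this_thread::yield();  // buffer full: try again
                }
            }
        }
        bb.shutdown_write();
    });
    std::thread reader([&] {
        char buf[64];
        ssize_t r;
        while ((r = bb.read(buf, sizeof(buf))) != 0) {  // 0: empty and closed
            if (r > 0) {
                received.append(buf, r);
            } else {
                std::this_thread::yield();  // buffer empty: try again
            }
        }
    });
    writer.join();
    reader.join();
    return received;
}
```

For example, run_transfer("Hello world!\n", 1000) should return the message repeated exactly 1000 times, in order. The critical sections are the same as with explicit lock() and unlock() calls; the guard only changes how the unlock is triggered.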
How do we identify critical sections? #
One question you may wonder about is how we figured out where to put the lock() and unlock() calls for the mutex, and even what mutex we should use in the first place. For the latter question, the answer is that the association between mutex and shared state is purely in the developer's (your!) head.
Association between mutexes and the state they protect
The association between the mutex and the state it protects is rather arbitrary. These mutexes are also called "advisory locks", as their association with the state they protect is not enforced in any way by the compiler and must be taken care of by the programmer. Their effectiveness relies solely on the program following protocols associating the mutex with the protected state. In other words, if a mutex is not used correctly, there is no guarantee that the underlying state is being properly protected.
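The advisory nature of mutexes can be illustrated with a small sketch. The counter struct and the count_with_threads helper below are hypothetical and unrelated to the bounded buffer code; they only show that the link between mutex_ and value_ lives in a comment, and that a method ignoring the protocol is accepted by the compiler just the same.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example type, for illustration only.
struct counter {
    std::mutex mutex_;
    long value_ = 0;  // protected by mutex_, but only by convention

    // Follows the protocol: value_ is touched only while mutex_ is held.
    void increment() {
        this->mutex_.lock();
        ++this->value_;
        this->mutex_.unlock();
    }

    // Breaks the protocol, yet the compiler accepts it: nothing in the
    // language ties value_ to mutex_. Calling this concurrently from
    // several threads would be a data race.
    void increment_unlocked() {
        ++this->value_;
    }
};

// Runs `nthreads` threads that each call increment() `per_thread` times
// and returns the final count.
long count_with_threads(int nthreads, int per_thread) {
    counter c;
    std::vector<std::thread> threads;
    for (int t = 0; t != nthreads; ++t) {
        threads.emplace_back([&c, per_thread] {
            for (int i = 0; i != per_thread; ++i) {
                c.increment();
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return c.value_;  // safe to read: all threads have been joined
}
```

With increment(), count_with_threads(4, 100000) returns exactly 400000. If the threads called increment_unlocked() instead, the program would still compile, but the result would no longer be guaranteed.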
The general rule for where to put the boundaries of critical sections in the code (the lock() and unlock() calls) is that they must be placed such that all accesses to shared state are inside the critical section. In the following, we will consider a few locking strategies that violate this rule in subtle ways.
Consider the following code, which moves the mutex lock()/unlock() pair to inside the while loop. We still have just one lock() and one unlock() in our code. Is it correct?
ssize_t bbuffer::write(const char* buf, size_t sz) {
    assert(!this->write_closed_);
    size_t pos = 0;
    while (pos < sz && this->blen_ < bcapacity) {
        this->mutex_.lock();
        size_t bindex = (this->bpos_ + this->blen_) % bcapacity;
        this->bbuf_[bindex] = buf[pos];
        ++this->blen_;
        ++pos;
        this->mutex_.unlock();
    }
    ...
}
The code is incorrect because the loop condition reads this->blen_ outside the critical section, so that access is not protected by the mutex, but it should be.
What about the following code -- is it correct?
ssize_t bbuffer::write(const char* buf, size_t sz) {
    this->mutex_.lock();
    assert(!this->write_closed_);
    size_t pos = 0;
    while (pos < sz && this->blen_ < bcapacity) {
        this->mutex_.lock();
        size_t bindex = (this->bpos_ + this->blen_) % bcapacity;
        this->bbuf_[bindex] = buf[pos];
        ++this->blen_;
        ++pos;
        this->mutex_.unlock();
    }
    ...
}
It's also wrong! Upon entering the while loop for the first time, the mutex is already locked, and we are trying to lock it again. Locking a std::mutex that the same thread already holds is undefined behavior; in practice, the second lock attempt typically blocks indefinitely, so the thread deadlocks with itself.
So what if we do this:
ssize_t bbuffer::write(const char* buf, size_t sz) {
    this->mutex_.lock();
    assert(!this->write_closed_);
    size_t pos = 0;
    while (pos < sz && this->blen_ < bcapacity) {
        this->mutex_.unlock();
        this->mutex_.lock();
        size_t bindex = (this->bpos_ + this->blen_) % bcapacity;
        this->bbuf_[bindex] = buf[pos];
        ++this->blen_;
        ++pos;
        this->mutex_.unlock();
        this->mutex_.lock();
    }
    ...
}
Now everything is protected, right? NO! This is also incorrect and in many ways much worse than the two previous cases.
Although this->blen_ is now seemingly protected by the mutex, it is being protected within a different region from the region where the rest of the buffer state (bbuf_, bpos_) is protected. Another thread can acquire the mutex between the check of blen_ in the loop condition and the store that depends on it, so the buffer may already be full by the time we write the character. Furthermore, the mutex is unlocked at the end of every iteration of the while loop. This means that when two threads call the write() method concurrently, the lock can bounce between the two threads and the characters written by the threads can be interleaved, violating the atomicity requirement of the write() method.
Back to our original plan! #
With the original placement of lock() and unlock(), wrapping the whole function body, we have correctly synchronized the buffer. Running bbuffer-mutex provides correct output strings as well as producing the right number of total output characters (checkable via ./bbuffer-mutex | wc -c), 13,000,000.
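As a quick check of the atomicity claim, the following sketch has two writer threads push four-character chunks ("AAAA" and "BBBB") into one buffer whose write() holds the mutex for the whole function body. The names chunk_bbuffer and chunks_stay_intact are hypothetical, and this is not the course's test program. The sketch assumes the capacity (1024 here) is large enough that no write is ever cut short, and it inspects bbuf_ directly after both threads have been joined.

```cpp
#include <cstddef>
#include <cstring>
#include <mutex>
#include <thread>
#include <sys/types.h>  // ssize_t (POSIX)

// Hypothetical write-only buffer whose write() is one critical section.
struct chunk_bbuffer {
    static constexpr size_t bcapacity = 1024;
    char bbuf_[bcapacity];
    size_t bpos_ = 0;
    size_t blen_ = 0;
    std::mutex mutex_;

    ssize_t write(const char* buf, size_t sz) {
        this->mutex_.lock();  // held for the entire operation
        size_t pos = 0;
        while (pos < sz && this->blen_ < bcapacity) {
            size_t bindex = (this->bpos_ + this->blen_) % bcapacity;
            this->bbuf_[bindex] = buf[pos];
            ++this->blen_;
            ++pos;
        }
        this->mutex_.unlock();
        return (pos == 0 && sz > 0) ? -1 : (ssize_t) pos;
    }
};

// Two writers each write `chunks_per_writer` four-character chunks.
// Returns true if every chunk ended up contiguous in the buffer.
// Assumes 8 * chunks_per_writer <= chunk_bbuffer::bcapacity; otherwise
// some writes are cut short and the function returns false.
bool chunks_stay_intact(int chunks_per_writer) {
    chunk_bbuffer bb;
    auto writer = [&bb, chunks_per_writer](const char* chunk) {
        for (int i = 0; i != chunks_per_writer; ++i) {
            bb.write(chunk, 4);
        }
    };
    std::thread t1(writer, "AAAA");
    std::thread t2(writer, "BBBB");
    t1.join();
    t2.join();

    // Both threads are done, so reading the state without the lock is safe.
    if (bb.blen_ != 8 * (size_t) chunks_per_writer) {
        return false;
    }
    for (size_t i = 0; i != bb.blen_; i += 4) {
        if (memcmp(&bb.bbuf_[i], "AAAA", 4) != 0
            && memcmp(&bb.bbuf_[i], "BBBB", 4) != 0) {
            return false;  // a chunk was split by the other writer
        }
    }
    return true;
}
```

With the function-wide critical section, chunks_stay_intact(100) returns true regardless of how the two threads are scheduled. Moving the lock()/unlock() pair inside the loop, as in the incorrect variants above, would allow mixed chunks such as "AABA" to appear.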
The only part of the bounded buffer specification that we have not yet implemented is the part that specifies that the buffer ought to block for writing when full and for reading when empty. Instead, our bounded buffer simply returns -1 in these situations. The caller needs to check and retry, but this is very inefficient: threads might spin for a long time, repeatedly calling read() or write() only to find that it still returns -1. To solve this problem, we need another synchronization object: a condition variable.
Summary #
Today, we looked into how to correctly synchronize a bounded buffer data structure with concurrent reader and writer threads. This required identifying the shared state that is not constant (read-only) associated with the bounded buffer, and making sure that each access to that state – both reads and writes – happens in a critical section, i.e., while a mutex is locked. This ensures that only one thread can execute this code at a time.