« Back to the main CS 0300 website
Project 5A: Concurrent Store 🗂🗃🗄
Due: Wednesday, May 3 at 6:00 PM EST
Introduction
A key-value store is a storage system in which data (a value) is addressed by a key; common examples are Python’s dictionaries or Java/C++ hash maps. Unlike file systems or other structures where objects are arranged in a hierarchical directory structure, keys map directly to their values. Key-value stores are popular for large-scale web applications because they are particularly amenable to performance and scalability optimizations, such as concurrent access/modification and sharding. Such optimizations are paramount to the success of Internet-scale applications like Facebook, AirBnB, and Twitter, which must handle thousands of connection requests and petabytes of data every day.
In this assignment, you will be exploring ways to implement and optimize a concurrent key-value store. Connections will be able to get, put, append, and delete key-value pairs, and dynamically add or remove servers to distribute load. This assignment will consist of two parts:
- In this part, Concurrent Store, you will implement multiple synchronized data structures for a multithreaded backend server to use. You will first implement a simple synchronized key-value store, then a key-value store with fine-grained synchronization for better scalability. You will also modify a given queue data structure to support synchronized access, which will be used by the server to accept multiple connections concurrently.
- In Part B, Distributed Store, you will distribute your store across multiple servers, turning it into a distributed system. Later lectures will cover this material!
Optional: Interested in real-world use cases of key-value stores and sharding?
If you’re curious how what you’re building relates to real-world industry use, take a look at these papers:
- Scaling Memcache at Facebook by Nishtala et al., which describes how Facebook deploys the memcached key-value store at scale. The store you’re implementing in the first part of this assignment is somewhat similar to memcached.
- Slicer: Auto-Sharding for Datacenter Applications by Adya et al., which describes Google’s auto-sharding system (called “Slicer”). The shardmaster you will implement in the second part of this assignment is similar to Google’s Slicer.
Other examples of key-value stores used in industry include Redis, RocksDB, and LevelDB.
Learning Objectives
This assignment will help you:
- Understand the fundamentals of threading, specifically managing and synchronizing threads.
- Think about problems introduced by concurrency, like deadlocks and race conditions on shared data structures.
- Familiarize yourself with using synchronization primitives, namely mutexes and condition variables.
Assignment Installation
Ensure that your project repository has a handout remote. Type:
$ git remote show handout
If this reports an error, run:
$ git remote add handout https://github.com/csci0300/cs300-s23-projects.git
Then run:
$ git pull
$ git pull handout main
This will merge our Project 5 stencil code with your repository.
Important Note: This project requires a newer version of C++ (C++20) that the container’s compilers don’t completely support; you will thus need to install a newer C++ toolchain. Please run the following commands before proceeding:
$ sudo apt-get update
$ sudo apt-get install clang++-12
$ sudo apt-get install g++-10
Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You’ll be doing all your work for this project inside the `kvstore` directory in the working copy of your projects repository.
Infrastructure Help
Stencil and Initial State
The stencil includes implementations for the client and server, as well as some utility functions and classes you may find useful. We are providing you with a fully implemented client and server, so you are responsible for implementing the simple key-value store, the synchronized queue class, and the concurrent key-value store.
You may find it helpful to frequently refer back to this table in both parts of the assignment; the support code is substantial, but we will indicate both here and within the stencil which utilities may be useful in your implementation.
| Directories | Summary |
| --- | --- |
| `build/` | |
| `tests/` and `test_utils/` | See “Testing”. |
| `client/` and `repl/` | Contains the clients and relevant support code for interacting with your `KvServer` and `Shardmaster`s. You should not modify these. |
| `cmd/` | Contains the runner code that starts the client, server, and shardmaster. You should not modify these. |
| `common/` | Contains the definition of the `Shard` struct, as well as some useful helpers for managing or operating on them. Additionally, contains miscellaneous utility functions, such as color printing and string hashing. You will need `utils.hpp` for the concurrent key-value store, and `shard.hpp` for Distributed Store. (You might also find `config.{hpp, cpp}` helpful for Distributed Store.) |
| `kvstore/` | Contains the definition of the abstract `KvStore` interface, as well as a simple synchronized key-value store implementation. Implement your simple and concurrent key-value store here. |
| `net/` | Contains the networking support code used by the clients and servers. You should not modify these. |
| `server/` | Contains the definition of the `KvServer` class, and supporting data structures. Implement your synchronized queue here. You will also modify your server to make it sharding-aware in Distributed Store. |
| `shardmaster/` | Contains the definition of the `Shardmaster` class, and support code to process persistent connections. Implement your shardmaster in Distributed Store here. |
Compile and Run
To compile, `cd` into the `build/` directory and run `make`. This will create two executables: `simple_client` and `server`.
The server will not function properly until after you implement part 3!
To run the server, specify the port, and optionally the number of workers:
$ ./server <port> [n_workers]
To run the simple client, specify the server address:
$ ./simple_client <server hostname:port>
Example Usage
To connect multiple clients to a server, start by opening multiple terminals (or use `tmux`).
Terminal 1: runs `server` on port 5000 with the default number of workers:
$ ./server 5000
Your server will then be listening on `<hostname>:5000`; the server’s full address will be displayed in the terminal.
Terminal 2: runs `simple_client` to connect to the server, using the address displayed above (in this example, the hostname is `65a4af2c28ae`):
$ ./simple_client 65a4af2c28ae:5000
Using the Client
Once connected to the server, you can use one or more clients to run standard key-value store operations, like `get`, `put`, `append`, and `delete`. You can also use two additional requests, `multiget` and `multiput` (for multiple key-value pairs). For specific usage instructions, type `help` in the command prompt.
To exit the client (and any other executable), press `Ctrl-D` (or `Cmd-D`); this sends End-Of-File (`EOF`) to the program, causing it to exit cleanly. If your program hangs, use `Ctrl-C` (or `Cmd-C`) to force-quit the program.
Testing
We strongly recommend you test your code often while implementing, by compiling and running a server with a few clients (as shown above).
To run tests, you must be in the `build/` directory. Then, run either `make check concurrent_store` or `./run_tests concurrent_store`, which will run all of the Concurrent Store tests.
To run the tests for an individual step `#` in Concurrent Store, you can run either `make check A#` or `./run_tests A#`. For example, to run the tests for step 1, you can run
$ make check A1
or
$ ./run_tests A1
The tests are in the `tests/` directory, with each subdirectory corresponding to a part of the assignment. After implementing each step, the testing harness will run all tests related to that step. Specifically, the steps will run all tests in the following directories:
- Step 1: `kvstore_sequential_tests`
- Step 2: `kvstore_parallel_tests`
- Step 3: `queue_tests`
- Step 4: `kvstore_sequential_tests`
- Step 5: `kvstore_parallel_tests`
To run an individual test, run that specific test’s executable. For instance, for `test_put_get_delete`, run in the `build/` directory:
$ ./test_put_get_delete
The correctness tests can also optionally specify which type of KVStore to run (`SimpleKvStore` or `ConcurrentKvStore`) by passing in either `simple` or `concurrent` as an argument:
$ ./test_put_get_delete simple
The tests default to the `ConcurrentKvStore`.
If you want to write your own tests, add new files to the subdirectories in `tests/`. Be sure to take a look at the functions in `test_utils/test_utils.hpp`; they will make writing your own tests much easier.
Assignment
Background
Concurrency in C++
Concurrency is useful because it allows processes to do multiple things seemingly at the same time: for example, a process that is waiting for a slow operation (such as fetching data over the internet) can use the processor for something else until it completes. When a computer has multiple processors, concurrency also allows for parallelism, i.e., the computer actually does multiple things at the same time.
However, concurrency also introduces a new category of undefined behavior: race conditions. A primary goal for any concurrent program is to avoid undefined behavior due to race conditions between threads. A program that is free of such race conditions is known as thread-safe, which means any data shared between threads is protected by appropriate synchronization, in case threads try to access or modify it simultaneously. Some of the most common synchronization primitives are mutexes and condition variables, which you will be using in this assignment. Additionally, you will see atomic variables.
Optional: Refresher on synchronization primitives
Mutexes
A mutex provides mutual exclusion between threads. Specifically, the mutex ensures that no more than one thread can be active inside a critical section at a time. Before entering the critical section (which, for example, accesses a piece of shared data), threads attempt to lock the “associated” mutex, and unlock it once they are done. This guarantees that only the one thread that successfully locked the mutex can modify the data. Since a thread will hang until it can lock the mutex, deadlocks may occur if the mutex is never unlocked.
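For instance, here is a minimal sketch of guarding a shared counter with a mutex (the names are illustrative, not from the stencil):
#include <mutex>

std::mutex mtx;          // protects shared_count
int shared_count = 0;

void increment() {
  mtx.lock();            // at most one thread gets past this point at a time
  shared_count += 1;     // the critical section
  mtx.unlock();          // let the next thread in
}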
Condition Variables
Condition variables allow a thread to wait for a condition to be met. Changes in that condition will be caused by another thread, which is also responsible for notifying the threads that wait on the condition variable. Example conditions are “Is the queue still empty?” or “Has the number of clients exceeded a threshold?”. Instead of a thread continuously locking, checking, and unlocking over and over again, it can wait on a condition variable until another thread signals that it’s worth trying now.
For a thread to wait on a condition variable with the C++ `std::condition_variable`, the thread needs to first lock the associated mutex, and then pass that mutex to the condition variable. The thread will then be suspended and the mutex unlocked automatically, until the thread is woken up. When it is woken up because the wait condition is now met, the thread is guaranteed to have the mutex locked again.
Think about why this is necessary! If a thread is waiting for a condition, but the mutex stays locked while it is waiting, then nothing will be able to change the shared variable (e.g. a buffer or queue) and the condition will never be met.
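As a concrete sketch of this pattern (illustrative names, not stencil code):
#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex mtx;
std::condition_variable cv;
std::queue<int> q;

int consume() {
  std::unique_lock lock(mtx);
  // wait() releases mtx while suspended and re-locks it before returning;
  // the predicate guards against spurious wakeups.
  cv.wait(lock, [] { return !q.empty(); });
  int item = q.front();
  q.pop();
  return item;
}

void produce(int item) {
  {
    std::unique_lock lock(mtx);
    q.push(item);
  }
  cv.notify_one();  // wake one thread waiting on cv
}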
Atomic Variables
In C++, we can use the `std::atomic` library to declare atomic variables. The compiler translates operations on these variables into atomic instructions, which always execute without racing with any other processor’s accesses.
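For example, a shared counter can be made race-free without any mutex:
#include <atomic>

std::atomic<int> counter{0};

void worker() {
  counter += 1;          // operators on std::atomic are atomic
  counter.fetch_add(5);  // explicit atomic read-modify-write
}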
A lock type that might be helpful to you is the “unique lock”, which automatically unlocks its mutex once the lock goes out of scope.
Example of Unique Locks
The `std::unique_lock` automatically unlocks a mutex once it goes out of scope.
std::mutex mtx;

void foo() {
  std::unique_lock guard(mtx);  // locks mtx
  // ... critical section ...
}  // guard goes out of scope here and unlocks mtx
Step 1: Single-Threaded Simple KV Store
In this section, you’ll implement a single-threaded simple key-value store. By “single-threaded,” we mean that only one thread can safely operate on the KVStore at a time. This means that for Step 1, you should not worry about concurrency. You may safely assume that only one thread will ever be operating on your KVStore at a time.
Take a look at how we define a `SimpleKvStore` (in `kvstore/simple_kvstore.hpp`):
class SimpleKvStore : public KvStore {
public:
SimpleKvStore() = default;
~SimpleKvStore() = default;
bool Get(const GetRequest *req, GetResponse *res) override;
bool Put(const PutRequest *req, PutResponse *) override;
bool Append(const AppendRequest *req, AppendResponse *) override;
bool Delete(const DeleteRequest *req, DeleteResponse *res) override;
bool MultiGet(const MultiGetRequest *req, MultiGetResponse *res) override;
bool MultiPut(const MultiPutRequest *req, MultiPutResponse *) override;
std::vector<std::string> AllKeys() override;
private:
};
Task: Define your internal key-value store. What data structure might be useful here? You may find this reference page useful.
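For example, a minimal sketch of one reasonable choice (the member name `store_` is illustrative, and this is not the only valid design):
// In kvstore/simple_kvstore.hpp, under private: (assumes #include <map>)
std::map<std::string, std::string> store_;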
Now it’s time to implement each of these methods. First, look at how we define the `Request` and `Response` structs in `net/server_commands.hpp`. The `Request` object contains the information being requested. The `Response` object is where your methods should store the response to the request (if a response is needed). For the `AllKeys` operation, return a vector of the keys. For the other operations, return `true` if the operation is successful; otherwise, return `false`.
Give me an example!
A call to `Get(const GetRequest *req, GetResponse *res)` takes in a pointer to a `GetRequest`, which has a `key` field. Your method should get the value corresponding to `key` from your internal key-value store and store it in the `value` field of the `GetResponse` that `res` points to.
If a `Response` struct is empty, that means you don’t need to provide an explicit confirmation that you fulfilled the request. For example, for a call to `Put(const PutRequest *req, PutResponse *res)`, you should insert the `key` and `value` from the `PutRequest` that `req` points to into your internal key-value store. Since `PutResponse` is an empty struct, you don’t need to do anything with it.
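Putting the two together, a hedged sketch of `Get` might look like this, assuming the `std::map`-backed store suggested above (the `key` and `value` fields are as described; check `net/server_commands.hpp` for the exact definitions):
bool SimpleKvStore::Get(const GetRequest *req, GetResponse *res) {
  auto it = this->store_.find(req->key);
  if (it == this->store_.end()) {
    return false;            // key not present, so the operation fails
  }
  res->value = it->second;   // hand the value back through the response
  return true;
}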
Your key-value store will support the following operations:
| Function | Arguments | Summary |
| --- | --- | --- |
| `Get` | `<key>` | Gets the value associated with `<key>` from the key-value store. Fails if it does not exist. |
| `Put` | `<key>`, `<value>` | Sets `<key>` to `<value>`; replaces existing values. |
| `Append` | `<key>`, `<value>` | Appends `<value>` to the current value associated with `<key>`. Creates a new key-value pair if it does not already exist. |
| `Delete` | `<key>` | Deletes `<key>` from the key-value store, returning the last saved value. Fails if it does not exist. |
| `MultiGet` | `<key_1> ... <key_n>` | Gets the values associated with keys `<key_1> ... <key_n>` from the key-value store. Fails if any key does not exist. |
| `MultiPut` | `<key_1> ... <key_n>`, `<value_1> ... <value_n>` | Sets `<key_1> ... <key_n>` to `<value_1> ... <value_n>`, respectively. |
| `AllKeys` | None | Returns all of the keys from the database. |
Task: Implement the methods in `kvstore/simple_kvstore.cpp`.
When you are finished, you should pass all `kvstore_sequential_tests`. You can check that your implementation is correct by running in the `build/` directory:
$ make check A1
If you’re failing an individual test, you can run it directly, via
$ ./test_name simple
Tests default to using the `ConcurrentKvStore` (which you’ll implement in Steps 4 and 5), so you’ll need to explicitly tell each test to use the `SimpleKvStore` by passing in `simple` as an argument.
Step 2: Concurrent Simple KV Store
We would like our server to be able to handle multiple requests in parallel. However, our simple key-value store can’t safely handle concurrent requests yet! In this step, you’ll now make your simple key-value store thread-safe.
Implementation
Task: Add synchronization primitive(s) to `SimpleKvStore` in `kvstore/simple_kvstore.hpp` to make it thread-safe. You should use at most one mutex in this part. (We will explore more efficient locking schemes in later steps.)
Task: Modify the methods in `kvstore/simple_kvstore.cpp` to enforce synchronized access, using the primitive(s) you defined in the previous task.
When you are finished, you should pass all `kvstore_parallel_tests`. You can check that your implementation is correct by running in the `build/` directory:
$ make check A2
Hints
- The key aspect of this key-value store is that any number of threads can be calling the same method at the same time. Always think: “What’s the worst possible sequence of events if multiple threads are running this code?” Keep this idea front and center when implementing the `SimpleKvStore` API functions here, and throughout the rest of the project.
- You want to make sure that the `MultiGet`, `MultiPut`, and `AllKeys` operations are atomic. Much like atomic instructions, an atomic operation should be executed as a single, indivisible unit, without interference or interleaving from other threads. That is, once started, an atomic operation should not be impacted by any other operations that may be executing simultaneously.
Atomic operations are often used in concurrent programming to ensure the integrity of shared data. In the context of your simple key-value store, other operations should not modify the store while the `MultiGet`, `MultiPut`, and `AllKeys` operations are running.
Step 3: Synchronized Queue
Now, your simple key-value store can safely handle concurrent requests. However, at the moment the server can’t handle concurrent requests yet! In this part, you’ll implement a synchronized queue, which the server will use to concurrently handle multiple clients and send their requests to your simple key-value store.
Your server will utilize a generic synchronized queue. The synchronized queue will allow multiple threads to add and remove elements safely, and wait for new elements to arrive. It is also handy to distribute requests from different clients across multiple threads!
Context
Synchronized?
A significant part of concurrency lies in making sure that multiple threads can safely access the same data structure concurrently. One common approach is to create a mutex for the data structure, lock it whenever it is accessed, and unlock it afterward. While effective, having to lock and unlock each time can clutter your code, and forgetting to do so will end up causing problems.
A synchronized data structure (or thread-safe data structure) is a data structure built specifically to be accessed concurrently by multiple threads, and does not need to be protected by an external mutex.
Generic?
You may have noticed the use of `template <typename T>` before class or function declarations for the queue. Its purpose is to allow you to define a class or struct that works with some arbitrary type, `T`, where `T` is fixed the moment you instantiate the class. This allows us to make a `synchronized_queue` of `int`s, `char`s, `long`s, `string`s, or any other struct! (This is how vectors and maps support arbitrary types.) We call a class or struct like this a generic, because it is not bound to a specific type.
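For instance, a tiny (hypothetical) generic struct:
#include <string>

template <typename T>
struct box {
  T contents;  // T is fixed when the struct is instantiated
};

box<int> b1{42};            // here, T is int
box<std::string> b2{"hi"};  // here, T is std::string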
Templates are just the way C++ does generics, and they can get very… involved. Don’t worry about it! You only need to know what we have already described.
More On C++ Templates
How does the server use the synchronized queue?
Take a look at `server/server.cpp`. In `start()`, we run this line:
this->client_listener = std::thread(&KvServer::accept_clients_loop, this);
which spawns a thread to run `accept_clients_loop`. While the server isn’t stopped, `accept_clients_loop` accepts incoming clients and adds them to `this->conn_queue`, which is a synchronized queue.
Going back to `start()`, the server then spawns “worker threads,” which each run `work_loop`:
this->workers.resize(this->n_workers);
for (auto &&worker : this->workers) {
worker = std::thread(&KvServer::work_loop, this);
}
`work_loop` pops a client off `this->conn_queue`, retrieves all the client’s requests, then passes each to `process_request` and sends the corresponding response back to the client. `process_request` calls the method in your KVStore corresponding to the request type (e.g., for a `GetRequest get_req`, it calls `this->store.Get(get_req, &get_res)`).
In summary, the server uses the synchronized queue to hold clients. It pushes new clients onto the queue in `accept_clients_loop`, and pops clients off the queue in `work_loop` to process their requests. To process the requests, the worker threads call `process_request`, which in turn calls the KVStore methods you implemented in Step 1. With a synchronized queue and multiple worker threads, there can be multiple requests made to your KVStore concurrently.
Queue Layout
This is the declaration of our `synchronized_queue` (in `server/synchronized_queue.hpp`):
template <typename T>
class synchronized_queue {
public:
size_t size();
bool pop(T *elt);
void push(T elt);
std::vector<T> flush();
void stop();
private:
std::queue<T> q;
std::mutex mtx;
std::condition_variable cv;
std::atomic_bool is_stopped{false};
};
Tip
Wherever you see `T`, mentally replace it with `int`, `char`, or your favorite struct. Once you instantiate the queue (say, as a `synchronized_queue<int>`), the compiler actually turns those `T`s into `int`s! Doing this mental replacement might make the code easier to understand.
Here’s a link on how this actually occurs, if you want to go down the rabbit hole.
As you can see, our `synchronized_queue` is nothing more than an underlying queue. Your task is to implement these methods and use the given synchronization primitives to make them thread-safe.
Implementation
Task: In `server/synchronized_queue.cpp`, implement the function bodies for `pop`, `push`, `flush`, `stop`, and `size`, and use the provided synchronization primitive(s) to support synchronized access. Be sure to review the function headers in `server/synchronized_queue.hpp`.
You will find the `std::queue` documentation very helpful!
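To give a flavor of what these methods involve, here is a hedged sketch of one plausible `push` (not necessarily the only correct shape; `pop` additionally needs to wait on the condition variable and handle `stop`):
template <typename T>
void synchronized_queue<T>::push(T elt) {
  {
    std::unique_lock lock(this->mtx);  // guard the underlying std::queue
    this->q.push(elt);
  }
  this->cv.notify_one();  // wake a thread blocked waiting for an element
}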
When you are finished, you should pass all `queue_tests`. You can check that your implementation is correct by running in the `build/` directory:
$ make check A3
Step 4: Single-Threaded Bucket-Based KV Store
Overview
Operations on your simple key-value store are thread-safe and concurrent, but there is no actual parallelism. Since two threads cannot modify the store at the same time, it is thread-safe. However, the entire map gets locked every time a thread wants to perform an operation on the store, which means that there are no parallel (i.e. simultaneous) operations.
This single mutex protection strategy is an example of coarse-grained locking: one lock protects access to a large section of code or an entire data structure (in this case, a key-value store). Coarse-grained locks are easier to implement and reason about, but also lead to significant lock contention (when a thread tries to acquire a lock already owned by another thread) and impose a limit on system concurrency. On the other hand, fine-grained locking attempts to reduce lock contention by splitting the data structure or code into smaller, potentially disjoint pieces that each have their own lock. Multiple threads can then operate on separate pieces in parallel. Fine-grained locks benefit from the idea that threads working on disjoint pieces do not need to exclude each other from the entire object in order to maintain thread safety; for instance, in a binary search tree, operations on the left subtree do not necessarily affect operations on the right subtree.
Fine-grained locking considerations
Not all data structures are particularly amenable to fine-grained locking. For example, finding a smaller lock granularity in Step 3’s `synchronized_queue` that maintains thread-safety and correctness while improving performance is significantly more difficult (but still possible!) than in, say, a binary search tree.
Finer-grained locking schemes also introduce system complexity and additional overhead from maintaining more locks; as a result, fine-grained locking is not always more performant, and premature optimization through fine-grained locks can introduce nasty concurrency bugs. As an example, an extreme implementation of fine-grained locking within a key-value store might associate a lock with each key-value pair. It might be informative to consider why associating a lock with each key-value pair is both inefficient and can introduce subtle concurrency bugs, depending on the underlying store implementation!
When considering lock granularity (both in this assignment and in future programs), it’s important to find a suitable granularity that balances performance and system complexity.
Example of Coarse vs. Fine-grained locking
An example of a data structure for which you could implement either coarse- or fine-grained locking is an array.
Coarse-grained locking would imply locking the entire array when any of it needs to be modified. This means that even if only a small part of the array will be modified, no other threads can access any part of the array.
This would look something like having an array-specific mutex.
my_array_mtx.lock();    // writing one element still locks the whole array
my_array[0] = 123;
my_array_mtx.unlock();

my_array_mtx.lock();    // the entire loop holds the single lock
for (int i = 0; i < sizeof(my_array) / sizeof(int); i++) {
  my_array[i] = i + 3;
}
my_array_mtx.unlock();
Alternatively, you could have fine-grained locking. This would involve ensuring that, instead of the entire array being unavailable, only the specific components changed will be unavailable to other threads. Thus, every “slot” in the array would need a corresponding mutex. While this allows for greater flexibility for threads, this structure may be a lot heavier on the system (because one would need to create more mutexes).
This would look something like:
my_array_mtx_array[0].lock();    // only slot 0 is unavailable to other threads
my_array[0] = 123;
my_array_mtx_array[0].unlock();

for (int i = 0; i < sizeof(my_array) / sizeof(int); i++) {
  my_array_mtx_array[i].lock();  // one mutex per slot
  my_array[i] = i + 3;
  my_array_mtx_array[i].unlock();
}
Notice that the scalability in the above example isn’t great either; this is because we chose too fine a grain 🌾 Thus, it’s always important to find a balance, as mentioned above.
Another very cool example of fine and coarse-grain locking is locking a linked list! Feel free to examine this link for more information.
Implementation
If the single-mutex locking strategy from Step 2 is too coarse-grained, and one lock for each key-value pair is too fine-grained, how can we design an efficient and scalable KVStore? We need to revisit our internal store design to support a new locking scheme: one where there’s more than one lock, but fewer locks than the number of keys.
For this part, you will implement the key-value store API as a bucket-based hash table.
Take a look at how we define a `ConcurrentKvStore` (in `kvstore/concurrent_kvstore.hpp`):
class ConcurrentKvStore : public KvStore {
public:
ConcurrentKvStore() = default;
~ConcurrentKvStore() = default;
bool Get(const GetRequest *req, GetResponse *res) override;
bool Put(const PutRequest *req, PutResponse *res) override;
bool Append(const AppendRequest *req, AppendResponse *res) override;
bool Delete(const DeleteRequest *req, DeleteResponse *res) override;
bool MultiGet(const MultiGetRequest *req, MultiGetResponse *res) override;
bool MultiPut(const MultiPutRequest *req, MultiPutResponse *res) override;
std::vector<std::string> AllKeys() override;
private:
DbMap store;
};
Notice that your concurrent key-value store will support the same operations as the `SimpleKvStore`, so refer to Step 1 for descriptions of each method. Your internal store is defined as type `DbMap`. We have already implemented the `DbMap` struct for you. Your responsibility is to write the methods in `kvstore/concurrent_kvstore.cpp`.
For this part, you may assume that only one thread will operate on your bucket-based KVStore at a time. We will add support for synchronized access in Step 5.
You are required to keep your internal store as type `DbMap`, and to maintain the internal structure of the `DbMap` (i.e., no changing this line):
std::array<std::list<DbItem>, BUCKET_COUNT> buckets;
However, you are not required to use `getIfExists`, `insertItem`, or `removeItem` if you don’t want to. We provide them because we think they will be helpful helper methods for you to work with your internal store, but if you’d rather write different ones (or do all of your work in `kvstore/concurrent_kvstore.cpp`), feel free.
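For orientation, mapping a key to its bucket might look roughly like this (a sketch only: `std::hash` stands in here for whatever string-hashing helper `common/utils.hpp` actually provides):
size_t bucket = std::hash<std::string>{}(key) % BUCKET_COUNT;
std::list<DbItem> &items = this->store.buckets[bucket];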
Task: Use the given `DbMap` implementation to fill in the function bodies for `Get`, `Put`, `Append`, `Delete`, `MultiGet`, `MultiPut`, and `AllKeys` in `kvstore/concurrent_kvstore.cpp`.
When you are finished, you should pass all `kvstore_sequential_tests`. You can check that your implementation is correct by running in the `build/` directory:
$ make check A4
If you’re failing a specific test, you can run it individually:
$ ./test_name
For Steps 4 and 5, you can either specify `concurrent` as an argument or just leave it blank (as in the example above).
Step 5: Concurrent Bucket-Based KV Store
Overview
In this part, you’ll modify your bucket-based key-value store implementation to support both safe and concurrent access.
Readers-writer locks
Recall that a data race can occur when a piece of shared data is not correctly protected. However, the key issue behind data races is mutability. On concurrent access and modification, instruction interleaving may occur and memory can become inconsistent; this is not an issue when different threads are only reading a piece of shared data.
As a result, one potential optimization is a readers-writer lock, or RWLock (`std::shared_mutex` in C++). RWLocks allow for concurrent access for read-only operations, but exclusive access for write operations. Before entering a critical section, threads may attempt to acquire either a shared (or read) lock or an exclusive (or write) lock; multiple threads may hold a shared lock at once, but like regular mutexes, only one thread may hold an exclusive lock. To ensure safe concurrent or exclusive access, reader threads cannot acquire a shared lock until there is no writer thread holding an exclusive lock.
As with mutexes, it is up to the programmer to properly protect shared data and lock critical sections within the program. RWLocks present an additional challenge over mutexes: programmers must also ensure that threads acquire the correct type of lock before entering a critical section or potentially modifying data. For example, acquiring a shared lock and then modifying shared data is undefined behavior. As you implement key-value store functions, keep in mind which functions are amenable to concurrent access, and which require exclusive access.
Example of RWLocks
In addition to the `lock` and `unlock` functions, the `std::shared_mutex` class provides two additional methods, `lock_shared` and `unlock_shared`:
std::shared_mutex shared_mtx;

shared_mtx.lock_shared();    // acquire a shared (read) lock
shared_mtx.unlock_shared();

shared_mtx.lock();           // acquire the exclusive (write) lock
shared_mtx.unlock();
In addition, like regular mutexes, C++ provides a wrapper class, `std::shared_lock`, that automatically unlocks the shared mutex once it goes out of scope. This has the same API as `std::unique_lock`:
void foo() {
std::shared_lock guard(shared_mtx);
}
Optional: Why not just use RWLocks all the time?
Especially in applications that experience read-heavy workloads, RWLocks permit more concurrency over regular mutexes. However, RWLocks should not be used as a drop-in replacement for mutexes: RWLocks come with additional overhead over simple mutexes, as they must contain additional state and logic to handle readers and writers. RWLocks also must avoid reader or writer starvation, where one operation is disproportionately favored. Wikipedia’s article provides an introduction to potential RWLock implementations and problems.
Task: Add synchronization primitives to `DbMap` to synchronize access to your buckets. We’d recommend using RWLocks!
You must use more than one mutex to guard your store.
Hints
Think about a more fine-grained approach that makes concurrent access to your store more efficient! See Step 4 for an explanation of coarse-grained vs. fine-grained locking.
Task: Modify the methods in `kvstore/concurrent_kvstore.cpp` to enforce synchronized access, using the primitives you defined in the previous task.
When you are finished, you should pass all `kvstore_parallel_tests`. You can check that your implementation is correct by running in the `build/` directory:
$ make check A5
Hints
You want to make sure that the `MultiGet`, `MultiPut`, and `AllKeys` operations are atomic. When implementing these methods, be mindful of your locking order. Specifically, you will want to enforce a hierarchical locking scheme and acquire all relevant locks before accessing or modifying the store. Consider why this is necessary, both to ensure atomic operations and to prevent deadlocks.
Much like atomic instructions, an atomic operation should be executed as a single, indivisible unit, without interference or interleaving from other threads. That is, once started, an atomic operation should not be impacted by any other operations that may be executing simultaneously.
Atomic operations are often used in concurrent programming to ensure the integrity of shared data. In the context of a key-value store, other operations should not modify the relevant buckets while the `MultiGet`, `MultiPut`, and `AllKeys` operations are running.
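As an illustration of a hierarchical scheme (a sketch under the assumption of one `std::shared_mutex` per bucket, here called `mutexes`; your member names and request fields may differ), a read-only multi-key operation could lock every relevant bucket in ascending index order, do all of its reads, and only then unlock:
// (assumes <algorithm>, <functional>, and <vector>)
std::vector<size_t> idxs;
for (const auto &key : req->keys) {
  idxs.push_back(std::hash<std::string>{}(key) % BUCKET_COUNT);
}
std::sort(idxs.begin(), idxs.end());                            // fixed global order
idxs.erase(std::unique(idxs.begin(), idxs.end()), idxs.end());  // lock each bucket once
for (size_t i : idxs) mutexes[i].lock_shared();
// ... read all requested keys ...
for (size_t i : idxs) mutexes[i].unlock_shared();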
At this point, your implementation should pass all of the Part A tests! You can test this by running in the `build/` directory:
$ ./run_tests concurrent_store
or
$ make check concurrent_store
Debugging with Sanitizers and GDB
Debugging concurrent code can be trickier than debugging single-threaded code. As threads interact with the shared data structures, it can be difficult to track down data races and deadlocks. Luckily, GDB and thread sanitizers can be really helpful.
Thread Sanitizers
Google’s ThreadSanitizer for C and C++ can help you detect data races, deadlocks, and thread leaks. To run the code with thread sanitizers, compile it with:
make TSAN=1
When compiling tests, you will need to specify `TSAN=1` too:
make check TSAN=1
or
./test.sh TSAN=1 <A1|A2|A3|A4|A5>
Example
Consider the following example:
#include <chrono>
#include <cstdio>
#include <thread>

int count = 0;
int child();  // defined below

int main() {
  std::thread p1(child);  // two threads race on the global `count`
  std::thread p2(child);
  p1.join();
  p2.join();
  printf("count %d\n", count);
}

int child() {
  while (count < 100) {   // unsynchronized read of count
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    count += 1;           // unsynchronized write to count
  }
  return 0;
}
When you run this code using a thread sanitizer, it will produce the following output:

Note: Open the image in a new tab for a larger image.
- The red message indicates what error is occurring (in this case, a data race).
- The blue messages indicate conflicting memory accesses, with a stack trace. In our case, the conflict occurs on line 25 and line 27. The report includes which thread is performing each access, and if the thread holds a mutex, it will also indicate that.
- Below these messages, the sanitizer might also indicate:
  - specific mutexes, when they were acquired, which threads acquired them, and sometimes when they were created
  - threads, and when they were created
- Refer to the ThreadSanitizerReportFormat page for more information on the report format.
GDB
Sometimes if your program hangs due to a deadlock, the thread sanitizer may not be able to return any reports. In these cases, gdb has some more helpful tools. Here are some useful gdb commands:
`info threads`: prints out existing threads, the Light Weight Processor (LWP) they’re running on, and the line of code they’re executing. The current thread will have an asterisk next to it.
3 Thread 0x7ffff3bff700 (LWP 9167) "example" __lll_lock_wait ()
at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
4 Thread 0x7ffff33fe700 (LWP 9003) "example" __lll_lock_wait ()
at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
`thread <thread-id>`: switches to thread `<thread-id>`.
(gdb) thread 4
[Switching to thread 4 (Thread 0x7ffff33fe700 (LWP 9003))]
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
`p <mutex>`: prints the fields of a mutex. Helpful for determining the owner of the lock, identified by its Light Weight Processor.
`Ctrl-Z`: stops the execution of the program. If your program is hanging, likely due to a deadlock, this allows you to examine the current state of the program. For example, if one of your threads is blocked on a mutex, you can switch to that thread, see which thread owns the mutex, and then switch to that thread.
`bt`: prints out a stack trace. Oftentimes, threads are blocked on a system call (locking a mutex, waiting on a condition variable, or reading/writing to a file descriptor). Backtrace is useful for discovering where in the codebase the blocking call was made.
`f <frame-number>`: switches to the provided frame number. Allows you to switch to frames within files you’re familiar with and determine which function called the blocking system call.
Handing In and Grading
Handin Instructions
As before, you will hand in your code using Git. In the `kvstore/` subdirectory of your project repository, you MUST fill in the text file called `README.md`.
By 6:00pm on Wednesday, May 3rd, you must have a commit with all steps completed.
Remind me again what the README should contain?
The README.md file will include the following:
- Any design decisions you made and comments for graders, under “Design Overview”. If there’s nothing interesting to say, just list “None”.
- Any collaborators and citations for help that you received, under “Collaborators”. CS 300 encourages collaboration, but we expect you to acknowledge help you received, as is standard academic practice.
- Notes on approximately how long it took you to complete the project. We won’t use this for grading, but we collect the data to calibrate for future years.
Grading Breakdown
- 20% (20 points) for passing the sequential simple KVStore tests (A1, 2 pts per test).
- 10% (10 points) for passing the parallel simple KVStore tests (A2, 2 pts per test).
- 16% (16 points) for passing the queue tests (A3, 2 pts per test).
- 20% (20 points) for passing the sequential advanced KVStore tests (A4, 2 pts per test).
- 34% (34 points) for passing the concurrent advanced KVStore tests (A5, 6 pts per test and 4 pts for passing them all).
Now head to the grading server, make sure that you have the “KVStore” page configured correctly with your project repository, and check that your KVStore runs on the grading server as expected.
Congratulations, you’ve completed the first part of the fifth CS 300 project! 
Acknowledgements: This project was developed for CS 300 by Richard Tang, Carolyn Zech, and Malte Schwarzkopf.
« Back to the main CS 0300 website
Project 5A: Concurrent Store 🗂🗃🗄
Due: Wednesday, May 3 at 6:00 PM EST
Introduction
A key-value store is a storage system in which data (a value) is addressed by a key; common examples are Python’s dictionaries or Java/C++ hash maps. Unlike file systems or other structures where objects are arranged in a hierarchical directory structure, keys map directly to their values. Key-value stores are popular for large-scale web applications because they are particularly amenable to performance and scalability optimizations, such as concurrent access/modification and sharding. Such optimizations are paramount to the success of Internet-scale applications like Facebook, AirBnB, and Twitter, which must handle thousands of connection requests and petabytes of data every day.
In this assignment, you will be exploring ways to implement and optimize a concurrent key-value store. Connections will be able to get, put, append, and delete key-value pairs, and dynamically add or remove servers to distribute load. This assignment will consist of two parts:
Optional: Interested in real-world use cases of key-value stores and sharding?
Learning Objectives
This assignment will help you:
Assignment Installation
Ensure that your project repository has a handout remote. Type:
If this reports an error, run:
Then run:
This will merge our Project 5 stencil code with your repository.
Important Note: This project requires a newer version of C++ (
C++20
) that the container’s compilers don’t completely support; you will thus need to install a newer C++ toolchain. Please run the following commands before proceeding:Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You’ll be doing all your work for this project inside the
kvstore
directory in the working copy of your projects repository.Infrastructure Help
Stencil and Initial State
The stencil includes implementations for the client and server, as well as some utility functions and classes you may find useful. We are providing you with a fully implemented client and server, so you are responsible for implementing the simple key-value store, the synchronized queue class, and the concurrent key-value store.
You may find it helpful to frequently refer back to this table in both parts of the assignment; the support code is substantial, but we will indicate both here and within the stencil which utilities may be useful in your implementation.
build/
tests/
andtest_utils/
client/
andrepl/
KvServer
andShardmaster
s. You should not modify these.cmd/
common/
Shard
struct, as well as some useful helpers for managing or operating on them. Additionally, contains miscellaneous utility functions, such as color printing and string hashing. You will needutils.hpp
for the concurrent key-value store, andshard.hpp
for Distributed Store. (You might also findconfig.{hpp, cpp}
helpful for Distributed Store.)kvstore/
KvStore
interface, as well as a simple synchronized key-value store implementation. Implement your simple and concurrent key-value store here.net/
server/
KvServer
class, and supporting data structures. Implement your synchronized queue here. You will also modify your server to make it sharding-aware in Distributed Store.shardmaster/
Shardmaster
class, and support code to process persistent connections. Implement your shardmaster in Distributed Store here.Compile and Run
To compile,
cd
into thebuild/
directory and runmake
. This will create two executables:simple_client
andserver
.The server will not function properly until after you implement part 3!
To run the server, specify the port, and optionally the number of workers:
To run the simple client, specify the server address:
Example Usage
To connect multiple clients to a server, start by opening multiple terminals (or use
tmux
).Terminal 1: runs
server
on port 5000 with the default number of workers:Your server will then be listening on
<hostname>:5000
; the server’s full address will be displayed in the terminal.Terminal 2: runs
simple_client
to connect to the server, using the address displayed above (in this example, the hostname is65a4af2c28ae
):Using the Client
Once connected to the server, you can use one or more clients to run standard key-value store operations, like
get
,put
,append
, anddelete
. You can also use two additional requests,multiget
andmultiput
(for multiple key-value pairs). For specific usage instructions, typehelp
in the command prompt.To exit the client (and any other executable), press
Ctrl-D
(orCmd-D
); this sendsEnd-Of-File
(EOF
) to the program, causing it to exit cleanly. If your program hangs, useCtrl-C
(orCmd-C
) to force-quit the program.Testing
We strongly recommend you test your code often while implementing, by compiling and running a server with a few clients (as shown above).
To run tests, you must be in the
build/
directory. Then, run eithermake check concurrent_store
or./run_tests concurrent_store
from inside thebuild/
directory, which will run all of the Concurrent Store tests.The tests are in the
tests/
directory, with each subdirectory corresponding to a part of the assignment. After implementing each step, the testing harness will run all tests related to that step. Specifically, the steps will run all tests in the following directories:kvstore_sequential_tests
kvstore_parallel_tests
queue_tests
kvstore_sequential_tests
kvstore_parallel_tests
To run an individual test, run that specific test’s executable. For instance, for
test_put_get_delete
, run in thebuild/
directory:The correctness tests can also optionally specify which type of KVStore to run (
SimpleKvStore
orConcurrentKvStore
) by passing in eithersimple
orconcurrent
as an argument:The tests default to the
ConcurrentKvStore
.If you want to write your own tests, add new files to the subdirectories in
tests/
. Be sure to take a look at the functions intest_utils/test_utils.hpp
— they will make writing your own tests much easier.Assignment
Background
Concurrency in C++
Concurrency is useful because it allows processes to do multiple things seemingly at the same time: for example, a process that is waiting for a slow operation (such as fetching data over the internet) can use the processor for something else until it completes. When a computer has multiple processors, concurrency also allows for parallelism, i.e., the computer actually does multiple things at the same time.
However, concurrency also introduces a new category of undefined behavior: race conditions. A primary goal for any concurrent program is to avoid undefined behavior due to race conditions between threads. A program that is free of such race conditions is known as thread-safe, which means any data shared between threads is protected by appropriate synchronization, in case threads try to access or modify it simultaneously. Some of the most common synchronization primitives are mutexes and condition variables, which you will be using in this assignment. Additionally, you will see atomic variables.
Optional: Refresher on synchronization primitives
Mutexes
A mutex provides mutual exclusion between threads. Specifically, the mutex ensures that no more than one thread can be active inside a critical section at a time. Before entering the critical section (which, for example, accesses a piece of shared data), threads attempt to lock the “associated” mutex, and unlock it once they are done. This guarantees that only the one thread that successfully locked the mutex can modify the data. Since a thread will hang until it can lock the mutex, deadlocks may occur if the mutex is never unlocked.
Condition Variables
Condition variables allow a thread to wait for a condition to be met. Changes in that condition will be caused by another thread, which is also responsible for notifying the threads that wait on the condition variable. Example conditions are “Is the queue still empty?” or “Has the number of clients exceeded a threshold?”. Instead of a thread continuously locking, checking, and unlocking over and over again, it can wait on a condition variable until another thread signals that it’s worth trying now.
For a thread to wait on a condition variable with the C++
std::condition_variable
, the thread needs to first lock the associated mutex, and then pass that mutex to the condition variable. The thread will then be suspended and the mutex unlocked automatically, until the thread is woken up. When it is woken up because the wait condition is now met, the thread is guaranteed to have the mutex locked again.Think about why this is necessary! If a thread is waiting for a condition, but the mutex stays locked while it is waiting, then nothing will be able to change the shared variable (e.g. a buffer or queue) and the condition will never be met.
Atomic Variables
In C++, we can use the
std::atomic
library to declare atomic variables. The compiler translates operations on these variables into atomic instructions, which always execute without racing with any other processor’s accesses.A specific type of mutexes that might be helpful to you are “unique locks”. These are a specific type of lock that automatically unlock once the lock is out of scope.
Example of Unique Locks
The
std::unique_lock
automatically unlocks a mutex once it goes out of scope.Step 1: Single-Threaded Simple KV Store
In this section, you’ll implement a single-threaded simple key-value store. By “single-threaded,” we mean that only one thread can safely operate on the KVStore at a time. This means that for Step 1, you should not worry about concurrency. You may safely assume that only one thread will ever be operating on your KVStore at a time.
Take a look at how we define a
SimpleKvStore
(inkvstore/simple_kvstore.hpp
):Task: Define your internal key-value store. What data structure might be useful here? You may find this reference page useful.
Now it’s time to implement each of these methods. First, look at how we define the
Request
andResponse
structs innet/server_commands.hpp
. TheRequest
object contains the information being requested. TheResponse
object is where your methods should store the response to the request (if a response is needed). For theAllKeys
operation, return a vector of the keys. For the other operations, returntrue
if the operation is successful; otherwise, returnfalse
.Give me an example!
A call to
Get(const GetRequest *req, GetResponse *res)
takes in a pointer to aGetRequest
, which has akey
field. Your method should get the value corresponding tokey
from your internal key-value store and store it in thevalue
field of theGetResponse
thatres
points to.If a
Response
struct is empty, that means you don’t need to provide an explicit confirmation that you fulfilled the request. For example, for a call toPut(const PutRequest *req, PutResponse *res)
, you should insert thekey
andvalue
from the PutRequest thatreq
points to into your internal key-value store. SincePutResponse
is an empty struct, you don’t need to do anything with it.Your key-value store will support the following operations:
Get
<key>
<key>
from the key-value store. Fails if it does not exist.Put
<key>
,<value>
<key>
to<value>
; replaces existing values.Append
<key>
,<value>
<value>
to the current value associated with<key>
. Creates a new key-value pair if it does not already exist.Delete
<key>
<key>
from the key-value store, returning the last saved value. Fails if it does not exist.MultiGet
<key_1> ... <key_n>
<key_1> ... <key_n>
from the key-value store. Fails if any key does not exist.MultiPut
<key_1> ... <key_n>, <value_1> ... <value_n>
<key_1> ... <key_n>
to<value_1> ... <value_n>
, respectively.AllKeys
Task: Implement the methods in
kvstore/simple_kvstore.cpp
.When you are finished, you should pass all
kvstore_sequential_tests
. You can check that your implementation is correct by running in thebuild/
directory:If you’re failing an individual test, you can run it directly, via
Tests default to using the
ConcurrentKvStore
(which you’ll implement in Steps 4 and 5), so you’ll need to explicitly tell each test to use theSimpleKvStore
by passing insimple
as an argument.Step 2: Concurrent Simple KV Store
We would like our server to be able to handle multiple requests in parallel. However, our simple key-value store can’t safely handle concurrent requests yet! In this step, you’ll now make your simple key-value store thread-safe.
Implementation
Task: Add synchronization primitive(s) to
SimpleKvStore
inkvstore/simple_kvstore.hpp
to make it thread-safe. You should use at most one mutex in this part. (We will explore more efficient locking schemes in later steps.)Task: Modify the methods in
kvstore/simple_kvstore.cpp
to enforce synchronized access, using the primitive(s) you defined in the previous task.When you are finished, you should pass all
kvstore_parallel_tests
. You can check that your implementation is correct by running in thebuild/
directory:Hints
SimpleKvStore
API functions here, and throughout the rest of the project.MultiGet
,MultiPut
, andAllKeys
operations are atomic. Much like atomic instructions, an atomic operation should be executed as a single, indivisible unit, without interference or interleaving from other threads. That is, once started, an atomic operation should not be impacted by any other operations that may be executing simultaneously.Atomic operations are often used in concurrent programming to ensure the integrity of shared data. In this context of your simple key-value store, other operations should not modify the store while the
MultiGet
,MultiPut
, andAllKeys
operations are running.Step 3: Synchronized Queue
Now, your simple key-value store can safely handle concurrent requests. However, at the moment the server can’t handle concurrent requests yet! In this part, you’ll implement a synchronized queue, which the server will use to concurrently handle multiple clients and send their requests to your simple key-value store.
Your server will utilize a generic synchronized queue. The synchronized queue will allow multiple threads to add and remove elements safely, and wait for new elements to arrive. It is also handy to distribute requests from different clients across multiple threads!
Context
Synchronized?
A significant part of concurrency lies in making sure that multiple threads can safely access the same data structure concurrently. One common approach is to create a mutex for the data structure, lock it whenever it is accessed, and unlock it afterward. While effective, having to lock and unlock each time can clutter your code, and forgetting to do so will end up causing problems.
A synchronized data structure (or thread-safe data structure) is a data structure built specifically to be accessed concurrently by multiple threads, and does not need to be protected by an external mutex.
Generic?
You may have noticed the use of
template <typename T>
before class or function declarations for the queue. Its purpose is to allow you to define a class or struct that works with some arbitrary type,T
, whereT
is defined the moment when you instantiate the class. This allows us to make asynchronized_queue
ofint
s,char
s,long
s,string
s, or any other struct! (This is how vectors and maps support arbitrary types.) We call a class or struct like this a generic, because it is not bound to a specific type.Templates is just the way C++ does generics, and they can get very… involved. Don’t worry about it! You only need to know what we have already described.
More On C++ Templates
How does the server use the synchronized queue?
Take a look at
server/server.cpp
. Instart()
, we run this line:which spawns a thread to run
accept_clients_loop
. While the server isn’t stopped,accept_clients_loop
accepts incoming clients and adds them tothis->conn_queue
, which is a synchronized queue.Going back to
start()
, the server then spawns “worker threads,” which each runwork_loop
:work_loop
pops a client offthis->conn_queue
, retrieves all the client’s requests, then passes each toprocess_request
and sends the corresponding response back to the client.process_request
calls the method in your KVStore corresponding to the request type (e.g., for aGetRequest get_req
, it callsthis->store.Get(get_req, &get_res)
.In summary, the server uses the synchronized queue to hold clients. It pushes new clients onto the queue in
accept_clients_loop
, and pops clients off the queue inwork_loop
to process their requests. To process the requests, the worker threads callprocess_request
, which in turn calls the KVStore methods you implemented in step 1. With a synchronized queue and multiple worker threads, there can be multiple requests made to your KVStore concurrently.Queue Layout
This is the declaration of our
synchronized_queue
(inserver/synchronized_queue.hpp
):Tip
Wherever you see
T
, mentally replace it withint
,char
, or your favorite struct. Once you instantiate the queue (say, as asynchronized_queue<int>
), the compiler actually turns thoseT
s intoint
s! Doing this mental replacement might make the code easier to understand.Here’s a link on how this actually occurs, if you want to go down the rabbithole.
As you can see, our
synchronized_queue
is nothing more than an underlying queue. Your task is to implement these methods and use the given synchronization primitives to make them thread-safe.Implementation
Task: In
server/synchronized_queue.cpp
, implement the function bodies forpop
,push
,flush
,stop
, andsize
, and use the provided synchronization primitive(s) to support synchronized access. Be sure to review the function headers inserver/synchronized_queue.hpp
.You will find the
std::queue
documentation very helpful!When you are finished, you should pass all
queue_tests
. You can check that your implementation is correct by running in thebuild/
directory:Step 4: Single-Threaded Bucket-Based KV Store
Overview
Operations on your simple key-value store are thread-safe and concurrent, but there is no actual parallelism. Since two threads cannot modify the store at the same time, it is thread-safe. However, the entire map gets locked every time a thread wants to perform an operation on the store, which means that there are no parallel (i.e. simultaneous) operations.
This single mutex protection strategy is an example of coarse-grained locking: one lock protects access to a large section of code or an entire data structure (in this case, a key-value store). Coarse-grained locks are easier to implement and reason about, but also lead to significant lock contention (when a thread tries to acquire a lock already owned by another thread) and impose a limit on system concurrency. On the other hand, fine-grained locking attempts to reduce lock contention by splitting the data structure or code into smaller, potentially disjoint pieces that each have their own lock. Multiple threads can then operate on separate pieces in parallel. Fine-grained locks benefit from the idea that threads working on disjoint pieces do not need to exclude each other from the entire object in order to maintain thread safety; for instance, in a binary search tree, operations on the left subtree do not necessarily affect operations on the right subtree.
Fine-grained locking considerations
Not all data structures are particularly amenable to fine-grained locking. For example, finding a smaller lock granularity in Step 3's `synchronized_queue` that maintains thread-safety and correctness while improving performance is significantly more difficult (but still possible!) than in, say, a binary search tree.

Finer-grained locking schemes also introduce system complexity and additional overhead from maintaining more locks; as a result, fine-grained locking is not always more performant, and premature optimization through fine-grained locks can introduce nasty concurrency bugs. As an example, an extreme implementation of fine-grained locking within a key-value store might associate a lock with each key-value pair. It might be informative to consider why associating a lock with each key-value pair is both inefficient and can introduce subtle concurrency bugs, depending on the underlying store implementation!
When considering lock granularity (both in this assignment and in future programs), it’s important to find a suitable granularity that balances performance and system complexity.
Example of Coarse vs. Fine-grained locking
An example of a data structure for which you could implement either coarse- or fine-grained locking is an array.
Coarse-grained locking would imply locking the entire array when any of it needs to be modified. This means that even if only a small part of the array will be modified, no other threads can access any part of the array.
This would look something like having an array-specific mutex.
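For instance (the `CoarseArray` type here is hypothetical):

```cpp
#include <cstddef>
#include <mutex>

// Coarse-grained sketch: a single mutex guards the whole array, so
// writes to different slots still exclude each other.
struct CoarseArray {
  std::mutex mtx;
  int data[100];

  void set(size_t i, int value) {
    std::lock_guard<std::mutex> guard(mtx);  // locks the entire array
    data[i] = value;
  }
};
```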
Alternatively, you could have fine-grained locking. This would involve ensuring that, instead of the entire array being unavailable, only the specific components changed will be unavailable to other threads. Thus, every “slot” in the array would need a corresponding mutex. While this allows for greater flexibility for threads, this structure may be a lot heavier on the system (because one would need to create more mutexes).
This would look something like:
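Here is a matching hypothetical sketch with one mutex per slot:

```cpp
#include <array>
#include <cstddef>
#include <mutex>

// Fine-grained sketch: locks[i] guards data[i], so threads writing
// to different slots can proceed in parallel, at the cost of keeping
// 100 mutexes around.
struct FineArray {
  std::array<std::mutex, 100> locks;
  std::array<int, 100> data;

  void set(size_t i, int value) {
    std::lock_guard<std::mutex> guard(locks[i]);  // locks one slot only
    data[i] = value;
  }
};
```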
Notice that in the above example, the scalability isn't great either – this is because we chose too fine a grain 🌾 Thus, it's always important to find a balance, as mentioned above.

Another very cool example of fine- and coarse-grained locking is locking a linked list! Feel free to examine this link for more information.
Implementation
If the single-mutex locking strategy from Step 2 is too coarse-grained, and one lock for each key-value pair is too fine-grained, how can we design an efficient and scalable KVStore? We need to revisit our internal store design to support a new locking scheme: one where there's more than one lock, but fewer locks than the number of keys.
For this part, you will implement the key-value store API as a bucket-based hash table.
Take a look at how we define a `ConcurrentKvStore` (in `kvstore/concurrent_kvstore.hpp`):
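In spirit, a bucket-based store looks something like the following sketch. The names, bucket count, and layout here are illustrative assumptions; the real definition is the one in `kvstore/concurrent_kvstore.hpp`:

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <list>
#include <string>

// Illustrative sketch of a bucket-based hash table. Each key hashes
// to one of a fixed number of buckets; each bucket is a list of
// key-value entries that is scanned linearly.
constexpr size_t kNumBuckets = 1024;  // hypothetical bucket count

struct DbItem {
  std::string key;
  std::string value;
};

struct DbMapSketch {
  std::array<std::list<DbItem>, kNumBuckets> buckets;

  static size_t bucket_for(const std::string& key) {
    return std::hash<std::string>{}(key) % kNumBuckets;
  }
};
```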
Notice that your concurrent key-value store will support the same operations as the `SimpleKvStore`, so refer to Step 1 for descriptions of each method. Your internal store is defined as type `DbMap`. We have already implemented the `DbMap` struct for you. Your responsibility is to write the methods in `kvstore/concurrent_kvstore.cpp`.

For this part, you may assume that only one thread will operate on your bucket-based KVStore at a time. We will add support for synchronized access in Step 5.
You are required to keep your internal store as type `DbMap`, and to maintain the internal structure of the `DbMap` (i.e., no changing this line):

However, you are not required to use `getIfExists`, `insertItem`, or `removeItem` if you don't want to. We provide them because we think they will be helpful helper methods for you to work with your internal store, but if you'd rather write different ones (or do all of your work in `kvstore/concurrent_kvstore.cpp`), feel free.
Task: Use the given `DbMap` implementation to fill in the function bodies for `Get`, `Put`, `Append`, `Delete`, `MultiGet`, `MultiPut`, and `AllKeys` in `kvstore/concurrent_kvstore.cpp`.
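For a flavor of what these methods involve, a single-threaded `Get`-style lookup over the illustrative sketch from above could look like this (the stencil's `getIfExists` helper presumably plays a similar role):

```cpp
#include <optional>
#include <string>

// Sketch: find the key's bucket, then scan that bucket's list.
// (Uses the hypothetical DbMapSketch defined earlier.)
std::optional<std::string> get(const DbMapSketch& db, const std::string& key) {
  const auto& bucket = db.buckets[DbMapSketch::bucket_for(key)];
  for (const auto& item : bucket) {
    if (item.key == key) {
      return item.value;
    }
  }
  return std::nullopt;  // key not present
}
```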
When you are finished, you should pass all `kvstore_sequential_tests`. You can check that your implementation is correct by running in the `build/` directory:

If you're failing a specific test, you can run it individually:
For Steps 4 and 5, you can either specify `concurrent` as an argument or just leave it blank (as in the example above).

Step 5: Concurrent Bucket-Based KV Store
Overview
In this part, you’ll modify your bucket-based key-value store implementation to support both safe and concurrent access.
Readers-writer locks
Recall that a data race can occur when a piece of shared data is not correctly protected. The key issue behind data races is mutability: on concurrent access and modification, instruction interleaving may occur and memory can become inconsistent. This is not an issue when different threads are only reading a piece of shared data.
As a result, one potential optimization is a readers-writer lock, or RWLock (`std::shared_mutex` in C++). RWLocks allow for concurrent access for read-only operations, but exclusive access for write operations. Before entering a critical section, threads may attempt to acquire either a shared (or read) lock or an exclusive (or write) lock; multiple threads may hold a shared lock at once, but like regular mutexes, only one thread may hold an exclusive lock. To ensure safe concurrent or exclusive access, reader threads cannot acquire a shared lock until there is no writer thread holding an exclusive lock.

As with mutexes, it is up to the programmer to properly protect shared data and lock critical sections within the program. RWLocks present an additional challenge over mutexes: programmers must also ensure that threads acquire the correct type of lock before entering a critical section or potentially modifying data. For example, acquiring a shared lock and then modifying shared data is undefined behavior. As you implement key-value store functions, keep in mind which functions are amenable to concurrent access, and which require exclusive access.
Example of RWLocks
In addition to the `lock` and `unlock` functions, the `std::shared_mutex` class provides two additional methods, `lock_shared` and `unlock_shared`:
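For example, a reader takes the shared lock while a writer takes the exclusive one (`shared_value` here is a hypothetical piece of shared data):

```cpp
#include <shared_mutex>

std::shared_mutex rw;
int shared_value = 0;

int read_value() {
  rw.lock_shared();   // many readers can hold the lock at once
  int v = shared_value;
  rw.unlock_shared();
  return v;
}

void write_value(int v) {
  rw.lock();          // exclusive: blocks all readers and writers
  shared_value = v;
  rw.unlock();
}
```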
In addition, like regular mutexes, C++ provides a wrapper class, `std::shared_lock`, that automatically unlocks the shared mutex once it goes out of scope. This has the same API as `std::unique_lock`:
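For instance, using the same hypothetical `shared_value`:

```cpp
#include <shared_mutex>

std::shared_mutex rw;
int shared_value = 0;

int read_value() {
  std::shared_lock lock(rw);  // lock_shared() now; unlock_shared() at scope exit
  return shared_value;
}

void write_value(int v) {
  std::unique_lock lock(rw);  // exclusive lock; unlock() at scope exit
  shared_value = v;
}
```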
Optional: Why not just use RWLocks all the time?
Especially in applications that experience read-heavy workloads, RWLocks permit more concurrency than regular mutexes. However, RWLocks should not be used as a drop-in replacement for mutexes: they come with additional overhead, as they must contain extra state and logic to handle readers and writers. RWLocks must also avoid reader or writer starvation, where one kind of operation is disproportionately favored. Wikipedia's article provides an introduction to potential RWLock implementations and problems.
Task: Add synchronization primitives to `DbMap` to synchronize access to your buckets. We'd recommend using RWLocks!

You must use more than one mutex to guard your store.
Hints
Think about a more fine-grained approach that makes concurrent access to your store more efficient! See Step 4 for an explanation of coarse-grained vs. fine-grained locking; one possibility is sketched below.
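Continuing the illustrative `DbMapSketch` from Step 4, a readers-writer lock per bucket might look like:

```cpp
#include <array>
#include <list>
#include <shared_mutex>

// Sketch: mutexes[i] guards buckets[i]. Operations on keys that hash
// to different buckets can then run in parallel. (Illustrative only,
// reusing DbItem and kNumBuckets from the earlier sketch.)
struct LockedDbMapSketch {
  std::array<std::list<DbItem>, kNumBuckets> buckets;
  mutable std::array<std::shared_mutex, kNumBuckets> mutexes;
};
```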
Task: Modify the methods in `kvstore/concurrent_kvstore.cpp` to enforce synchronized access, using the primitives you defined in the previous task.

When you are finished, you should pass all `kvstore_parallel_tests`. You can check that your implementation is correct by running in the `build/` directory:

Hints
You want to make sure that the `MultiGet`, `MultiPut`, and `AllKeys` operations are atomic. When implementing these methods, be mindful of your locking order. Specifically, you will want to enforce a hierarchical locking scheme and acquire all relevant locks before accessing or modifying the store. Consider why this is necessary, both to ensure atomic operations and to prevent deadlocks.

Much like atomic instructions, an atomic operation should execute as a single, indivisible unit, without interference or interleaving from other threads. That is, once started, an atomic operation should not be impacted by any other operations that may be executing simultaneously.

Atomic operations are often used in concurrent programming to ensure the integrity of shared data. In the context of a key-value store, other operations should not modify the relevant buckets while the `MultiGet`, `MultiPut`, and `AllKeys` operations are running.
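One common way to enforce such a hierarchy, sketched here against the illustrative per-bucket locks above, is to collect the bucket indices a request touches, deduplicate and sort them, and only then acquire the locks in ascending order:

```cpp
#include <set>
#include <string>
#include <vector>

// Sketch: acquire every relevant bucket lock in ascending index order
// before reading, so concurrent multi-key operations cannot deadlock.
void multiget_sketch(LockedDbMapSketch& db,
                     const std::vector<std::string>& keys) {
  std::set<size_t> indices;  // std::set is sorted and deduplicated
  for (const auto& key : keys) {
    indices.insert(DbMapSketch::bucket_for(key));
  }
  for (size_t i : indices) {
    db.mutexes[i].lock_shared();  // fixed ascending order = hierarchy
  }
  // ... read the requested keys from their buckets ...
  for (size_t i : indices) {
    db.mutexes[i].unlock_shared();
  }
}
```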
build/
directory:or
Debugging with Sanitizers and GDB
Debugging concurrent code can be trickier than debugging single-threaded code. As threads interact with the shared data structures, it can be difficult to track down data races and deadlocks. Luckily, GDB and thread sanitizers can be really helpful.
Thread Sanitizers
Google’s ThreadSanitizer for C and C++ can help you detect data races, deadlocks, and thread leaks. To run the code with thread sanitizers, compile it with:
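If you want to try ThreadSanitizer outside the provided build system, the underlying compiler flag is `-fsanitize=thread`; the file names below are placeholders:

$ clang++-12 -std=c++20 -fsanitize=thread -g myprogram.cpp -o myprogram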
When compiling tests, you will need to specify `TSAN=1` too:

or
Example
Consider the following example:
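In this illustrative stand-in, two threads increment a shared counter with no synchronization, which is a textbook data race:

```cpp
#include <iostream>
#include <thread>

int counter = 0;  // shared, unprotected

void add_many() {
  for (int i = 0; i < 100000; i++) {
    counter++;  // unsynchronized read-modify-write: data race!
  }
}

int main() {
  std::thread t1(add_many);
  std::thread t2(add_many);
  t1.join();
  t2.join();
  std::cout << counter << "\n";  // typically less than 200000
  return 0;
}
```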
When you run this code using a thread sanitizer, it will produce a report flagging the data race and showing the stack traces of the conflicting accesses.
GDB
Sometimes if your program hangs due to a deadlock, the thread sanitizer may not be able to return any reports. In these cases, GDB has some more helpful tools. Here are some useful GDB commands:
- `info threads`: prints out existing threads, the Light Weight Processor (LWP) they're running on, and the line of code they're executing. The current thread has an asterisk next to it.
- `thread <thread-id>`: switches to thread `<thread-id>`.
- `p <mutex>`: prints the fields of a mutex. Helpful for determining the owner of the lock, identified by its Light Weight Processor.
- `CTRL+Z`: stops the execution of the program. If your program is hanging, likely due to a deadlock, this lets you examine the current state of the program. For example, if one of your threads is blocked on a mutex, you can switch to that thread, see which thread owns the mutex, and then switch to the owning thread.
- `bt`: prints out a stack trace. Oftentimes, threads are blocked on a system call (locking a mutex, waiting on a condition variable, or reading/writing to a file descriptor). A backtrace is useful for discovering where in the codebase the blocking call was made.
- `f <frame-number>`: switches to the given stack frame. This lets you move to frames in files you're familiar with and determine which function ended up making the blocking system call.
Handin Instructions
As before, you will hand in your code using Git. In the `kvstore/` subdirectory of your project repository, you MUST fill in the text file called `README.md`.

By 6:00 PM on Wednesday, May 3rd, you must have a commit with all steps completed.
Remind me again what the README should contain?
The README.md file will include the following:

Grading Breakdown
Now head to the grading server, make sure that you have the “KVStore” page configured correctly with your project repository, and check that your KVStore runs on the grading server as expected.
Congratulations, you’ve completed the first part of the fifth CS 300 project!
Acknowledgements: This project was developed for CS 300 by Richard Tang, Carolyn Zech, and Malte Schwarzkopf.