
Project 5: Vunmo :moneybag:

Due April 7, 2021 at 6:00pm AoE


Introduction

In this assignment, you will be implementing the multi-threaded backend of a Venmo-like banking service, as well as a useful synchronized data structure. The banking service will follow a client-server model, where your server will be able to support multiple clients in parallel. Clients will be able to pay and charge other clients, as well as withdraw, deposit, and check their balance.

Learning Objectives


Installation

Ensure that your project repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout https://github.com/csci0300/cs300-s21-projects.git

Then run:

$ git pull
$ git pull handout master

This will merge our Project 5 (Vunmo) stencil code with your repository.

Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You’ll be doing all your work for this project inside the vunmo directory in the working copy of your projects repository.

Infrastructure Help


Conceptual Questions

Instructions

General Questions

  1. Discuss the tradeoffs between spawning a thread per request versus using a worker pool consisting of a fixed number of threads. Consider how performance will vary as the number of users increases.
  2. Describe end-to-end what happens when a client types in the command to withdraw money. Take a look at the functions documented in vunmo-server.hh, and explain which functions will get invoked. While we’re not looking for a description of implementation details, tell us how these functions work together to process client requests.
  3. A popular online food ordering service, PackSnass, includes a feature which allows users to collect rewards and share them with their friends. The PackSnass infrastructure contains the user_t struct (to store user data) and send_rewards function (to transfer rewards) listed below:
    typedef struct user {
        int id;           // a unique user id
        int rewards;      // the number of rewards a user has
        std::mutex mtx;   // mutex to synchronize user data 
    } user_t;
    
    void send_rewards(user_t* sender, user_t* receiver){
        sender->mtx.lock();
        receiver->mtx.lock();
        sender->rewards--;
        receiver->rewards++;
        sender->mtx.unlock();
        receiver->mtx.unlock();
    }
    
    The PackSnass servers have multiple worker threads, so user requests can be executed in parallel. As such, what might go wrong if two users, Alice and Zachary, simultaneously attempt to send each other rewards? How might you change send_rewards to fix this problem?

Responsible Computing Questions

Imagine that you work for a company that designs control systems for large-scale water treatment plants. A water treatment plant removes contaminants from water that humans will consume and factories will use. Representatives from the Providence city government are thinking about purchasing your system. However, the officials complain that your system has poor scalability when the number of contamination sensors to monitor gets large. You debug the scaling problem and learn that it is caused by the use of coarse-grained locking. A city official suggests that you modify the system to use fine-grained locking. However, the redesign would need to be finished quickly, or else you will lose the government contract.

  1. Explain the tradeoffs between fine-grained locking and coarse-grained locking, considering performance, development effort, and correctness. (1 sentence)
  2. How would these differences affect residents of Providence? Describe one impact of coarse-grained locking, and one impact of fine-grained locking. These impacts could be positive or negative. (2–3 sentences)

This question was adapted from “Ethical Tradeoffs in System Design: Performance versus Correctness” by Embedded EthiCS @ Harvard, available under a Creative Commons Attribution 4.0 International License.


Stencil

The stencil includes implementations for the client and server, as well as some utility functions and classes you may find useful. We are providing you with a fully implemented client, so you are responsible for implementing key functions in the server, as well as the synchronized queue class.

Name                           Summary
synchronized_queue.cc          Implements synchronized queue class.
vunmo-server.{hh, cc}          Implements server class for Vunmo.
vunmo-client.{hh, cc}          Implements client class for Vunmo.
server.cc                      Instantiates and runs the Vunmo server.
client.cc                      Instantiates and runs the Vunmo client.
common.hh                      Structs and enums used by both server and client.
networking_helpers.{hh, cc}    Helper functions that deal with the networking.

Compile and Run

To compile, run make or make all in your project directory. This will create two executables: server and client.

To run the server, specify the port and optionally the number of workers:

$ ./server [port] <workers>

To run the client, specify the server’s host and port, and optionally a client ID:

$ ./client [host] [port] <id>
Example

To connect two clients to a server, start by opening three terminals (or use tmux).

Terminal 1: runs server with default number of workers

$ ./server 1234

Terminal 2: runs client with ID 17 and connects to server

$ ./client localhost 1234 17

Terminal 3: runs client with random ID and connects to server

$ ./client localhost 1234

Using the Client

Once connected to the server, you can use one or more clients to run any of the following commands:

Command                   Use
balance                   Checks current balance
deposit [amount]          Increases balance by amount
withdraw [amount]         Decreases balance by amount
pay [client] [amount]     Transfers amount to client
charge [client] [amount]  Notifies client that you request amount

Whenever a client gets paid or receives a charge from another client, a notification is displayed. If it is a charge, the client is prompted for a y/n answer, which is then relayed to the server.

Testing

While you write your code, we strongly recommend testing it often by compiling and running a server with a few clients (as shown above). This will also give you a better understanding of the expected behaviour. We are also providing compiled demos of the server and client so you can compare your own output against them (we won’t be testing your output directly, so it doesn’t have to be identical word for word).

Additionally, you can run the test suite we provide with the run_tests script. This script runs the tests inside the tests directory, which contains multiple categories of system and unit tests. (You can also invoke these tests via the familiar make check command.)

Basic Invocation:

$ ./run_tests [section]

Options:

Example

Runs the queue section of the tests, short-circuiting on the first failure, with the thread sanitizer enabled.

$ ./run_tests -s -z queue

Concurrency in C++

The main goal for any concurrent program is to avoid undefined behavior due to race conditions between threads. A program that is free of such race conditions is known as thread-safe: any data shared between threads is protected by appropriate synchronization in case threads try to access or modify it simultaneously. Some of the most common synchronization primitives are mutexes and condition variables, which you will be using in this project. Additionally, you will see atomic variables.

Synchronization Primitives

Mutexes

A mutex provides mutual exclusion between threads. Specifically, the mutex ensures that no more than one thread can be active inside a critical section at a time. Before entering the critical section (which, e.g., accesses a piece of shared data), threads attempt to lock the “associated” mutex, and unlock it once they are done. This guarantees that only the one thread that successfully locked the mutex can modify the data. Since a thread will hang until it can lock the mutex, deadlocks may occur if the mutex is never unlocked.
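For instance, here is a minimal standalone sketch (not part of the stencil; all names are made up) of a mutex protecting a shared counter. std::lock_guard locks the mutex when it is constructed and unlocks it automatically when it goes out of scope:

#include <iostream>
#include <mutex>
#include <thread>

int balance = 0;
std::mutex balance_mtx;

void deposit(int amount) {
    std::lock_guard<std::mutex> guard(balance_mtx);  // lock before touching balance
    balance += amount;                               // critical section
}                                                    // guard unlocks here

int main() {
    std::thread t1(deposit, 50);
    std::thread t2(deposit, 25);
    t1.join();
    t2.join();
    std::cout << balance << "\n";  // always 75; without the mutex, this is a data race
}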

Condition Variables

Condition variables allow a thread to wait for a condition to be met. Changes in that condition will be caused by another thread, which is also responsible for notifying the threads that wait on the condition variable. Example conditions are “Is the queue still empty?” or “Has the number of clients exceeded a threshold?”. Instead of a thread continuously locking, checking, and unlocking over and over again, it can wait on a condition variable until another thread signals that it’s worth trying now.

For a thread to wait on a condition variable with the C++ std::condition_variable, the thread needs to first lock the associated mutex, and then pass that mutex to the condition variable. The thread will then be suspended and the mutex unlocked automatically, until the thread is woken up. When it is woken up because the wait condition is now met, the thread is guaranteed to have the mutex locked again.

Think about why this is necessary! If a thread is waiting for a condition, but the mutex stays locked while it is waiting, then nothing will be able to change the shared variable (e.g., a buffer or queue) and the condition will never be met.

Example

Here is a basic example with two threads, involving cookies because I was hungry when I wrote this.

The Variables:

int num_cookies = 0;
std::mutex num_cookies_mtx;
std::condition_variable num_cookies_cv;
// this naming is just a convention

Thread 1:

void bake_cookie() {
    num_cookies_mtx.lock();          // always lock before access
    
    num_cookies++;
    
    if (num_cookies == 1) {           // if num_cookies was just 0...
        num_cookies_cv.notify_one();  // ...signal one waiting thread
    }
    
    num_cookies_mtx.unlock();        // always unlock when finished
}

Thread 2:

void i_am_hungry(){
    // create a lock to pass into the condition variable
    // (this locks the mutex)
    std::unique_lock<std::mutex> lock(num_cookies_mtx);
    
    while (num_cookies == 0) {        // wait until there's cookies
        num_cookies_cv.wait(lock);    // this unlocks the mutex
    }
    
    // when we emerge we know we have the mutex locked
    
    num_cookies--;
    
    // we don't have to unlock explicitly,
    // because when lock goes out of scope it unlocks the mutex
}

Thread 2 will check if there are cookies, and wait until there are. Thread 1 will add a cookie, and notify any waiting threads that it is time for them to wake up because the condition they were waiting for has been met.

One small caveat is that the check must happen in a while loop rather than a single if statement. Due to an absolutely whack property of condition variables, waiting threads may spuriously wake up, even if they have not been signalled. There is a historical “reason” for this, but now we just have to live with it.
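As an aside, std::condition_variable::wait also has a two-argument overload that takes a predicate and performs this while loop for you. A sketch of the equivalent call in i_am_hungry above (same variables as the example) would be:

num_cookies_cv.wait(lock, []{ return num_cookies > 0; });  // loops internally until the predicate holds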

Atomic Variables

Sometimes a shared variable is only ever read or written in a single, simple step, so every access would just be surrounded by a mutex lock/unlock pair. To make this faster and simpler, we can use an atomic variable instead. These are variables that do not require mutexes, because every interaction with them is guaranteed to be atomic, or “all in one, uninterrupted action”, implemented via atomic machine instructions. In this project, we will use atomic booleans to indicate whether things have stopped. Some important functions in std::atomic you will want to use are load and store.
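As an illustrative sketch (not stencil code; the names are invented), here is an atomic stop flag that one thread sets and another polls, with no mutex required:

#include <atomic>
#include <thread>

std::atomic_bool stopped{false};

void worker() {
    while (!stopped.load()) {  // atomic read, safe without a mutex
        // ... do some work ...
    }
}

int main() {
    std::thread t(worker);
    stopped.store(true);       // atomic write; worker will observe it and exit
    t.join();
}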

Concurrency Trouble

When working with concurrency, debugging takes on a whole new meaning. Because the order of events is no longer the same on every run, you may occasionally see different behaviour when executing the same program. The two main things to look out for are data races and deadlocks. We strongly recommend you use the thread sanitizer to catch these bugs early. It will save you massive headaches and make debugging a lot easier.

Data Races

A data race occurs when a piece of shared data is not correctly protected and two or more threads access it at the same time, at least one of them writing. This may cause many different issues! For example, a thread might try to pull data off of an empty queue because it thought the queue wasn’t empty, or you might end up dereferencing an object pointer that was just freed by another thread. In some more bizarre cases, like when dealing with maps, this may even cause infinite loops or other very unexpected behaviour, because of how these data structures are implemented.

Deadlocks

Deadlocks occur when threads wait for mutexes that will never be unlocked: all threads waiting hang forever. For example, if two threads each currently hold one mutex locked and try to acquire each other’s, they will hang forever, because nobody will ever unlock those mutexes. (Another example is a thread trying to lock a mutex it already holds. This happens surprisingly easily with chains of function calls!)

To avoid deadlocks, make sure you never forget to unlock a mutex, and have some consistent order in which threads acquire and release locks.
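For example, one common pattern (a sketch, assuming a C++17 compiler; not the only possible fix) is to acquire all the mutexes you need in one step with std::scoped_lock, which uses a deadlock-avoidance algorithm internally, rather than locking them one by one in whatever order the code happens to reach them:

#include <mutex>

std::mutex a_mtx, b_mtx;

void update_both() {
    // Locks both mutexes without deadlocking, no matter what order
    // other threads request them in (requires C++17).
    std::scoped_lock both(a_mtx, b_mtx);
    // ... touch data protected by both mutexes ...
}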

Synchronized Queue

In this project, you’ll be implementing a generic synchronized queue, which is used by both the server and client implementations. The synchronized queue allows multiple threads to add and remove elements safely. It’s handy when multiple worker threads rush to handle requests from different clients!

Context

Synchronized?

A significant part of concurrency lies in making sure that multiple threads can safely access the same data structure concurrently. One common approach is to create a mutex for the data structure, lock it whenever it is accessed, and unlock it afterward. While effective, having to lock and unlock each time can clutter your code, and forgetting to do so will end up causing problems.

A synchronized data structure (or thread-safe data structure) is a data structure built specifically to be accessed concurrently by multiple threads, and does not need to be protected by an external mutex.

Generic? (C++ Templates)

If you have read through the stencil, you may have noticed the use of template <typename T> before the class and function declarations for the queue. Its purpose is to let you define a class or struct that works with some arbitrary type T, where T is fixed at the moment you instantiate the class. This allows us to make a synchronized_queue of ints, chars, longs, strings, or any other struct! (This is how vectors and maps support arbitrary types.) We call a class or struct like this a generic, because it is not bound to a specific type.

Templates are just the way C++ does generics, and they can get very… involved. Don’t worry about it! You only need to know what we have already described.
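If you’re curious anyway, here is a tiny standalone illustration (unrelated to the stencil): the compiler generates a separate copy of this function for each type T it is used with:

#include <string>

template <typename T>
T largest(T a, T b) {
    return (a > b) ? a : b;
}

int main() {
    int i = largest(3, 7);            // instantiates largest<int>
    double d = largest(2.5, 1.5);     // instantiates largest<double>
    std::string s = largest(std::string("cs"), std::string("300"));  // largest<std::string>
    (void)i; (void)d; (void)s;        // silence unused-variable warnings
}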

More On C++ Templates

Here is more information on templates if you want to go down the rabbit hole.

Queue Layout

This is the declaration of our synchronized_queue:

template <typename T>
class synchronized_queue {
public:
    std::size_t size();      // number of elements currently in the queue
    bool pop(T *elt);        // blocks until an element is available or the queue stops
    void push(T elt);        // adds an element and wakes a waiting thread
    std::vector<T> flush();  // removes and returns all elements at once
    void stop();             // wakes all waiting threads; no new waits after this
private:
    std::queue<T> q;                     // underlying (unsynchronized) queue
    std::mutex mtx;                      // protects q
    std::condition_variable cv;          // waited on while q is empty
    std::atomic_bool is_stopped{false};  // set once stop() is called
};
Tip

Wherever you see T, just mentally replace it with int, char, or your favorite struct. Once you instantiate the queue (say, as a synchronized_queue<int>), the compiler actually turns each T into int! Doing this mental replacement might make the code easier to understand.


As you can see, our synchronized_queue is nothing more than an underlying queue, a mutex to protect it, and a condition variable for threads to wait if the queue is empty (ignoring is_stopped).

In the project, you will implement the public methods of the synchronized queue.
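To get a feel for how the finished queue will be used (a hypothetical usage sketch; see the stencil’s header comments for the exact return-value conventions), a producer and a consumer thread might interact like this:

synchronized_queue<int> q;

// Producer thread:
q.push(42);   // safe to call concurrently with any other method

// Consumer thread:
int elt;
q.pop(&elt);  // blocks until an element is available (or the queue is stopped)

// At shutdown:
q.stop();                                // wake every thread blocked in pop
std::vector<int> leftovers = q.flush();  // drain whatever is still queued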

Stopping

The is_stopped boolean exists in order to ensure all waiting threads are gone when a std::condition_variable is destroyed. When an instantiated synchronized_queue object goes out of scope (in this project, that is at the end of the program) its fields are automatically destroyed. There cannot be any threads waiting on a condition variable when it is destroyed; if there are, this will cause an error. This means that before the queue goes out of scope, you must signal all threads waiting on the condition variable, and prevent any new ones from waiting on it. This is where is_stopped comes into play: once this variable is true, no new threads should wait on the condition variable.

Note: You need to consider the is_stopped variable when implementing pop (and stop, of course).

Thread-Safety

The key aspect of this queue is that any number of threads can be calling the same method at the same time. Always think: “What’s the worst possible sequence of events if multiple threads are running this code?” Keep this idea front and center when implementing the queue’s functions (and throughout the rest of the project).

Tip

When writing your code, consider any two consecutive lines. After a thread has finished executing the first line, but before it has started to execute the next, any number of events may have happened (performed by other threads). What could have changed? What guarantees do I have? What mutexes is the thread holding?
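To see why this matters, here is a toy interleaving (a sketch, unrelated to the queue) of two threads executing count++ on a shared int with no mutex. The single line compiles to a load, an increment, and a store, and another thread can slip in between them:

// Shared: int count = 0;   (no mutex!)
//
// Thread A               Thread B
// load count  (0)
//                        load count  (0)
// add 1       (1)
//                        add 1       (1)
// store count (1)
//                        store count (1)
//
// Final value: 1, even though two increments ran.
// A mutex around count++ (or std::atomic<int>) rules this out.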

Implementation

Task: Implement the synchronized queue.

In synchronized_queue.cc, fill in the function bodies for pop, push, flush, and stop. When you are finished, check that your implementation is correct by running:

$ ./run_tests queue

Server

There are many ways to approach implementing a server: which one a developer chooses comes down to how they weigh the tradeoffs between simplicity, efficiency, and scalability. The server you will implement for this project has been designed to be scalable and efficient (which means you’ll have fun implementing it).

Context

Efficiency

A simple single-threaded, single-client server could still work with multiple clients by handling them only one at a time. However, this would be very inefficient, especially if your computer has multiple processors. To take advantage of these processors, we need concurrency. With concurrency and the parallelism of multiple processors comes speed and efficiency, but also the need for synchronization when accessing shared data.

Scalability

One popular approach is to create a new thread every time a client connects, and have that thread be solely responsible for reading and processing the client’s request. While efficient, this approach does not work if your server expects a very large number of simultaneous clients, as the number of threads may grow very large. (Recall that each thread needs its own stack.)

A way to avoid creating a thread per client is to have a fixed number of worker threads and a single work queue. Each time a client sends a request, the request is added to the work queue and can be picked up by any of the worker threads. This approach can scale more easily because the number of threads is now independent of the number of clients. This is the approach we chose to use for this project.

Server Design

The server will consist of three main operational parts which will be initiated upon calling start:

  1. One thread to listen for and accept new client connections (accept_clients_loop).
  2. One thread to receive and read client requests, and put them on the work queue (receive_requests_loop).
  3. Several worker threads to process the requests on the queue, and respond to the clients who issued them (work_loop).

These three components will run concurrently and work together by sharing data like accepted clients and accounts. It is very important that you synchronize their access to this shared data using synchronization primitives.

Starting the Server: start

This is the first function called after a server is created, and the moment when the three operational parts mentioned above are started. (Since you haven’t implemented the loops yet, the threads will do nothing, which is ok for now!) Take a look at the comments in the stencil for details.

Stopping the Server: stop

So soon? It’s important to clean up everything you create, and starting with start and stop is a good exercise. stop is responsible for joining all un-detached threads and deallocating all memory that was allocated during the execution of your program.

You may notice step 3 in the stencil asks that you flush and delete all requests in the queue. Even though you haven’t put anything in the queue yet, you will in some of your other functions!

We have written some of stop for you, so you just need to complete it.

Task: Implement start and stop.

When you are finished, check that you pass test1_start_stop by running:

$ ./run_tests -s server_part1

Accepting Clients: accept_clients_loop

Clients talk to the server over the network, specifically over a TCP connection. TCP connections (and network connections in general) are represented by file descriptors in Linux processes. In this project, a client connects to the server by opening a new connection, which causes the server to instantiate a client_conn_t to store the file descriptors associated with the connection. If it is a new client, the server also creates an account_t to hold their balance. Connections and accounts are stored in the server’s client_conns and accounts maps respectively, keyed by the client’s id.

Both the accounts and client_conns maps have an associated mutex, as does each client_conn_t and account_t, so make sure you always synchronize your accesses when using these. When a thread locks one of the maps, no other thread can access anything in it. For this reason, in order to preserve concurrency, a thread should only lock a map for as long as it needs to insert, remove, or look something up. In addition, try to avoid holding any more mutexes than absolutely necessary at any given time.

Task: Implement accept_clients_loop.

When you are finished, check that you pass test2_accept_clients by running:

$ ./run_tests -s server_part1

Receiving Requests: receive_requests_loop

Once a client has connected, it may begin sending requests in the form of request_t structs. The fetch_requests function will check client connections’ file descriptors, read from them, and return the new requests in a vector. The receive_requests_loop is responsible for putting these requests into the work queue, so workers can start to process them.

Processing Requests: work_loop

Each worker thread is responsible for pulling requests off the queue (in work_loop) and then updating account balances and sending responses and notifications to the corresponding clients (in process_request).

Since each thread will be concurrently interacting with multiple accounts, you need to be mindful when accessing these accounts. To make this easier, we have provided function stubs and descriptions for get_account and get_two_accounts, which you will be implementing and using.

Each request_t has an associated request_type, which can be one of BALANCE, DEPOSIT, WITHDRAWAL, PAYMENT, or CHARGE. We are providing the handlers to modify the balances accordingly, but you are responsible for calling the appropriate handler for each type, as well as correctly accessing the client’s accounts.

If a request is a PAYMENT or a CHARGE, the server needs to send a notification to the client getting paid or requested.

Note: All handlers, helpers, and documentation of functions you are implementing can be found in the vunmo-server.hh file.

Task: Implement work_loop, process_request, get_account, and get_two_accounts.

When you are finished, check that you pass test3_process_requests by running:

$ ./run_tests -s server_part1

Moving Forward: The server_part1 section tests the most basic integrity of your server, and is not really checking for concurrency. server_part2, by contrast, tests your concurrency. You can now see how your server handles a large number of clients all making many requests per second (these tests run in no particular order):

$ ./run_tests server_part2

While the basic_load_test should not take longer than 15 seconds, you can let it run for longer by specifying a larger test timeout with -t [timeout].

Extra Credit

Graduate students taking CSCI 1310 should complete one of the following questions.

Using concurrency for performance always requires reasoning about the tradeoff between the cost of synchronization and the performance benefit introduced by parallel execution.


Debugging with Sanitizers and GDB

Debugging concurrent code can be trickier than debugging single-threaded code. As threads interact with the shared data structures, it can be difficult to track down data races and deadlocks. Luckily, GDB and thread sanitizers can be really helpful.

Running a Single Test

To run a single test, compile it and then run the executable it creates:

$ make <path-to-test>
$ ./<path-to-test>
Example
# to compile the serial_push_all_flush test 
$ make tests/queue/serial_push_all_flush

# to run the test 
$ ./tests/queue/serial_push_all_flush 
# to run the test in gdb
$ gdb tests/queue/serial_push_all_flush 

Thread Sanitizers

Google’s ThreadSanitizer for C and C++ can help you detect data races, deadlocks, and thread leaks. To run the code with thread sanitizers, compile it with:

make TSAN=1
ThreadSanitizer Example

Consider the following example:

#include <chrono>
#include <cstdio>
#include <thread>

int count = 0;
int child();  // forward declaration so main can reference child

int main() {
    // Start two child threads, which each call child()
    std::thread p1(child);
    std::thread p2(child);
    // Only the parent executes here
    p1.join();
    p2.join();
    printf("count %d\n", count);
}

int child() {
    // Children execute here, racing on count
    while (count < 100) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        count += 1;
    }
    return 0;
}

When you run this code under the thread sanitizer, it reports a data race on count, showing the conflicting reads and writes along with the stacks of the threads involved.

GDB

Sometimes, if your program hangs due to a deadlock, the thread sanitizer may not be able to return any reports. In these cases, gdb has some more helpful tools. Here are some useful gdb commands:

  info threads: lists every thread and what it is currently executing.
  thread [id]: switches to the given thread, so you can inspect its stack with bt.
  print [name]: prints a variable; printing a std::mutex shows its internal state, including its current owner.

For example, on a deadlocked program, info threads might show threads stuck in __lll_lock_wait, meaning they are blocked trying to lock a mutex:

3    Thread 0x7ffff3bff700 (LWP 9167) "example" __lll_lock_wait ()
    at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
4    Thread 0x7ffff33fe700 (LWP 9003) "example" __lll_lock_wait ()
    at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

You can then switch to one of the stuck threads:

(gdb) thread 4
[Switching to thread 4 (Thread 0x7ffff33fe700 (LWP 9003))]
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135	../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S

Printing the mutex this thread is waiting on reveals who holds it:

$2 = {<std::__mutex_base> = {_M_mutex = pthread_mutex_t = {Type = Normal,
      Status = Acquired, possibly with waiters, Owner ID = 9167, Robust = No, Shared = No,
      Protocol = None}}, <No data fields>}

The Owner ID (here 9167, which is thread 3 above) tells you which thread is holding the mutex that thread 4 is waiting for.

Handing In & Grading

Handin instructions

As before, you will hand in your code using Git. In the vunmo/ subdirectory of your project repository, you MUST fill in the text file called README.md.

Remind me again what the README.md should contain?
The README.md file will include the following:
  1. Any design decisions you made and comments for graders, under "Design Overview". If there's nothing interesting to say, just list "None".
  2. Any collaborators and citations for help that you received, under "Collaborators". CS 300 encourages collaboration, but we expect you to acknowledge help you received, as is standard academic practice.
  3. Your answers to the conceptual questions at the start of this handout under "Conceptual Questions".
  4. Notes on approximately how long it took you to complete the project. We won't use this for grading, but we collect the data to calibrate for future years.

Grading breakdown

Now head to the grading server, make sure that you have the “Vunmo” page configured correctly with your project repository, and check that your Vunmo tests pass on the grading server as expected.

Congratulations, you’ve completed the fifth CS 300 project! :money_with_wings:


Acknowledgements: The Vunmo project was developed for CS 300.