Lecture 24: Consistency, Real-World Distributed Systems

» Lecture video (Brown ID required)
» Post-Lecture Quiz (due 11:59pm Monday, May 4)

Transactions

In a distributed system, handling a client request sometimes requires operations to take place on multiple servers. As an example, consider posting to Facebook: you would rather like your new post to be replicated to multiple servers, so that it doesn't disappear if one of Facebook's servers goes offline for maintenance. Or consider a money transfer at your bank or on an application like Venmo, which needs to take money out of your account and deposit it into your friend's account – even if your and your friend's account balances are stored on different shards, and therefore on different servers.

Where is the client?

Classically, the client of a distributed system was the end-user device and the server was a single remote computer. But in today's applications, we have clients on end-user devices (e.g., a smartphone or laptop) and complex distributed systems infrastructure in a company's datacenters. In these settings, it is often the case that some "front-end" server in the datacenter acts on behalf of the end-user client to avoid sending many messages across the wide-area internet (which can take hundreds of milliseconds per message – a long time in computer terms!). This "proxy" client handles requests from the client device, and decomposes them into operations on different servers in the "back-end" of the web service. Towards these servers, the front-end server acts as a client.

Since the operations happen independently in the distributed system, it may be necessary to abort and undo (or "roll back") earlier operations if a later one fails. For example, consider a SET command for key k replicated over three servers:

  1. The client sends the SET command and new value for k to the first server.
  2. The first server applies the SET, changes the stored value for k, and acknowledges the success to the client.
  3. The client sends the SET command and new value for k to the second server.
  4. The second server fails and does not respond, or tells the client that it cannot apply the operation.
  5. The client detects this failure, and knows that it won't be able to update all servers.
  6. It is crucial that the SET on the first server gets undone at this point; otherwise, we would leave the system in an inconsistent state (namely, the replicas for k no longer agree).
(An alternative is for the client to keep retrying on the second server, hoping that it succeeds at some point. But in real distributed systems, we rarely have the time to keep retrying forever.)
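To make this concrete, here is a minimal Python sketch of the replicate-then-undo logic. It assumes a hypothetical list of replica client objects that expose get(key) and set(key, value) methods and raise an exception on failure; these names are made up for illustration, not a real key-value store API:

# Sketch: apply a SET on every replica, undoing it on the replicas
# that already succeeded if any replica fails, so that all replicas
# still agree afterwards.
def replicated_set(replicas, key, new_value):
    applied = []  # (replica, previous value) pairs, kept for rollback
    for replica in replicas:
        try:
            old = replica.get(key)         # remember the old value for undo
            replica.set(key, new_value)    # apply the SET on this replica
            applied.append((replica, old))
        except Exception:
            # This replica failed; roll back the replicas that already
            # applied the SET by restoring their previous values.
            for done, old in reversed(applied):
                done.set(key, old)
            return False   # the SET did not take effect anywhere
    return True            # the SET took effect on all replicas

Note that the rollback itself can fail, which is one reason real systems use more careful commit protocols than this naive undo.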

The idea of a transaction (TX) captures that a distributed system either (a) processes a client request in its entirety without interference from other requests, or (b) the request fails and the system returns to the state prior to the request's arrival. A transaction wraps a set of separate operations to execute them in unison or not at all.

A transaction always has a defined beginning and end:

BEGIN TX {
   ---
    |
    | Operations (requests/RPCs to servers) contained in the TX
    |
   ---
} COMMIT/ABORT TX.
For example, a transaction that transfers $100 from A's account to X's account, whose balances are stored in a key-value store, may be written as follows:
BEGIN TX {
  a = GET balance_A
  x = GET balance_X
  if (a > 100 and x != ERROR) {
    SET balance_A = a - 100
    SET balance_X = x + 100
  } else {
    ABORT TX
  }
} COMMIT TX
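Against a real storage system, a client would express this through whatever transaction API the system offers. The following Python sketch shows roughly what that might look like; the db object and its begin/get/set/commit/abort methods are assumptions for illustration, not the API of any particular database:

# Sketch of the money transfer issued against a hypothetical
# transactional key-value store client `db`.
def transfer_100(db):
    tx = db.begin()                       # BEGIN TX
    try:
        a = tx.get("balance_A")
        x = tx.get("balance_X")
        if a > 100 and x is not None:     # enough funds and X's account exists
            tx.set("balance_A", a - 100)
            tx.set("balance_X", x + 100)
            tx.commit()                   # COMMIT TX: both writes take effect
            return True
        tx.abort()                        # ABORT TX: neither write takes effect
        return False
    except Exception:
        tx.abort()                        # any failure also aborts the TX
        raise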

If the transaction succeeds and all operations get completed, we say that the transaction "commits"; if it fails to complete one or more operations and undoes the others, we say the transaction "aborts" (fails).

In the money transfer example, this means that while the above transaction executes, it shouldn't be possible for other requests that modify account balances for A or X to succeed. Two examples are highlighted in red in the picture below; these are operations that another client may try to execute concurrently and which would mess up the correctness of our transaction above. (Consider what would happen if the SET A = A - 20 completed before our SET balance_A = a - 100 operation; or if DELETE X completed before the deposit to X.)

If we turn these competing requests into their own transactions, we get:

// T2
BEGIN TX {
  a = GET balance_A
  if (a > 20) {
    SET balance_A = a - 20
  } else {
    ABORT TX
  }
} COMMIT TX
// T3
BEGIN TX {
  DELETE X
} COMMIT TX
For these transactions to produce correct results, their execution must be isolated from one another. One way to achieve this isolation is for each transaction to take locks on all the objects accessed in the transaction, as highlighted in yellow in the next picture:

Since the locks ensure that transactions can only execute one after another, the order of execution now determines which transactions succeed and which fail. The picture shows several possible orders in green at the right-hand side: for example, an order of T1, T2, T3 results in T1 and T3 succeeding, but T2 fails because A has insufficient funds. Likewise, if T2 runs before T1, T1 fails for the same reason.
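To make the locking idea concrete, here is a small Python sketch of transactions T1 and T2 taking per-key locks before touching the balances. The shared dictionaries and function names are assumptions for illustration; real databases use a dedicated lock manager and must also deal with deadlock:

import threading
from contextlib import ExitStack

# Hypothetical shared state: one balance and one lock per key.
balances = {"balance_A": 500, "balance_X": 200}
locks = {key: threading.Lock() for key in balances}

def run_t1_transfer():
    # Take the locks on every object the transaction touches, in a
    # fixed (sorted) order so two transactions cannot deadlock.
    with ExitStack() as stack:
        for key in sorted(["balance_A", "balance_X"]):
            stack.enter_context(locks[key])
        a = balances["balance_A"]
        x = balances["balance_X"]
        if a > 100:
            balances["balance_A"] = a - 100   # debit A
            balances["balance_X"] = x + 100   # credit X
            return True   # commit
        return False      # abort: nothing was modified

def run_t2_withdrawal():
    with locks["balance_A"]:                  # T2 only touches A
        a = balances["balance_A"]
        if a > 20:
            balances["balance_A"] = a - 20
            return True   # commit
        return False      # abort

Because both transactions must hold the lock on balance_A, whichever one acquires it first runs to completion before the other can even read the balance – which is exactly why the order of execution decides which transactions commit.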

The ACID properties

We can now define a more formal set of properties that transactions ensure, known as the ACID properties:

  Atomicity: either all operations in the transaction take effect, or none of them do ("all or nothing").
  Consistency: a transaction moves the system from one valid state to another (e.g., a transfer neither creates nor destroys money).
  Isolation: concurrently running transactions do not interfere with each other; the outcome is as if they had executed one after another.
  Durability: once a transaction commits, its effects survive crashes and restarts.

Transactions and the ACID properties are concepts: merely stating them does not make them hold in your system! Someone needs to implement mechanisms that ensure that the system maintains the ACID properties, as well as APIs for clients to start transactions and to attempt to commit them. Such mechanisms often include locking (to help with the ACI properties) and strategies for efficient writes to disk (to help with the D property).
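As one example of a durability mechanism, many systems append a record of the transaction's writes to a log file and force it to disk before acknowledging the commit. The Python sketch below illustrates only this idea; the log format and file name are made up, and real systems add checksums, group commit, and recovery code:

import json
import os

LOG_PATH = "tx.log"   # hypothetical write-ahead log file

def log_commit(tx_id, writes):
    # Append the transaction's writes plus a commit marker, then force
    # them to disk. Only after fsync returns may the system tell the
    # client that the transaction committed (the D in ACID).
    record = json.dumps({"tx": tx_id, "writes": writes, "status": "commit"})
    with open(LOG_PATH, "a") as log:
        log.write(record + "\n")
        log.flush()               # move data from Python's buffer to the OS
        os.fsync(log.fileno())    # force the OS to write it to the disk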

Real-world Distributed Systems

Strong vs. Weak Consistency

Maintaining the ACID properties and offering a transactional abstraction guarantees correct results and is convenient for application developers. However, it comes at a cost to performance and scalability: transactions offer strong consistency, but they do so by reducing the effective concurrency in the system (think about all the communication and locks required!). Consequently, many systems relax some of the ACID properties in exchange for performance. Such systems are said to provide weak consistency; they tend to scale better, but you don't want to use them to handle crucial information like monetary balances or user account creation.

Many companies therefore run a mix of strongly consistent and weakly consistent systems. Here are some examples:

Strong Consistency         Weak Consistency
MySQL (Facebook)           memcached (Facebook, many others)
                           TAO (Facebook)
Spanner (Google)           BigTable (Google)
                           Dynamo (Amazon)
                           NFS (CS department)
Blockchains

The links in the above table point to research papers about these systems and how the companies use them. If you're curious to learn more, take a look!

Infrastructure at Scale

Modern web services can have millions of users, and the companies that operate them run serious distributed systems infrastructure to support these services. The picture below shows a simplified view of the way such infrastructure is typically structured.

End-users contact one of several datacenters, typically the one geographically closest to them. Inside that datacenter, their requests are initially terminated at a load-balancer (LB). This is a simple server that forwards requests to different front-end servers (FE) that run an HTTPS server (Apache, nginx, etc.) and the application logic (e.g., code to generate a Twitter timeline, or a Facebook profile page).
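A load balancer can be as simple as a round-robin forwarder. The Python sketch below shows only the server-selection logic, with hypothetical front-end addresses; a real load balancer also forwards the request bytes and health-checks its front-ends:

import itertools

# Hypothetical front-end servers this load balancer spreads requests over.
frontends = ["fe1.example.com", "fe2.example.com", "fe3.example.com"]
next_fe = itertools.cycle(frontends)

def pick_frontend():
    # Round-robin: each incoming request goes to the next front-end
    # server in turn, spreading load evenly across them.
    return next(next_fe)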

The front-end servers are stateless, and they contact backend servers for information required to dynamically generate web page data to return to the end-users. Depending on the consistency requirements for this data, the front-end server may either talk directly to a strongly-consistent database, or first check for the data on servers in a cache tier, which store refined copies of the database contents in an in-memory key-value store to speed up access to them. If the data is in the cache, the front-end server reads it from there and continues; if it is not in the cache, the front-end server queries the database.
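This read path is often called look-aside caching. The Python sketch below shows what the front-end server's read logic might look like; the cache and db objects and their methods are assumptions for illustration, not the API of memcached or of any specific database:

# Sketch of a front-end server's read path with a look-aside cache.
# `cache.get(key)` is assumed to return None on a miss, `cache.put`
# to insert, and `db.query(key)` to read the authoritative copy.
def read_value(cache, db, key):
    value = cache.get(key)
    if value is not None:
        return value              # cache hit: no database access needed
    value = db.query(key)         # cache miss: fetch from the database
    cache.put(key, value)         # fill the cache for future requests
    return value

Writes typically go to the database and then invalidate or update the cached copy; the window in which the cache still holds the old value is one reason such cache tiers provide only weak consistency.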

Note that the database, which is usually itself sharded and acts as the source of ground truth, is replicated across servers, often with a backup replica in another datacenter to protect against datacenter outages.

Finally, the preceding infrastructure serves end-user requests directly and must produce responses quickly. This is called a service or interactive workload. Other computations in the datacenter are less time-critical, but may process data from many users. Such batch processing workloads include data science and analytics, training of machine learning models, backups, and other special-purpose tasks that run over large amounts of data. The systems executing these jobs typically split the input data into shards and have different servers work on distinct partitions of the input data in parallel. If the computation can be structured in such a way that minimal communication between shards is required, this approach scales very well.
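As a minimal illustration of this pattern, the Python sketch below splits its input into shards and processes each shard in a separate worker process, then combines the per-shard results; the word-count task and the shard count are arbitrary choices for illustration:

from collections import Counter
from multiprocessing import Pool

def process_shard(lines):
    # Work done independently on one shard of the input: count words.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def batch_word_count(all_lines, num_shards=4):
    # Partition the input data into num_shards roughly equal shards.
    shards = [all_lines[i::num_shards] for i in range(num_shards)]
    with Pool(processes=num_shards) as pool:
        partial = pool.map(process_shard, shards)   # shards run in parallel
    # Combining the per-shard counts is the only step that needs data
    # from every shard.
    total = Counter()
    for counts in partial:
        total.update(counts)
    return total

Here the only cross-shard communication is the final combining step, which is why the approach scales well as more shards and servers are added.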

Summary

Today, we further explored the complexities introduced by distributed systems and their use of replication as a fault-tolerance mechanism. In particular, we looked at the common situation where a logical client request is actually split into operations across multiple servers – either because updates need to be replicated across servers, or because the request requires operations on multiple shards (e.g., a money transfer).

Transactions are an abstraction that allows application programmers to specify sequences of operations that are executed together while ensuring that the ACID properties (atomicity, consistency, isolation, and durability) hold. Strong consistency in distributed storage systems typically requires transactions, and many databases implement support for transactions.

We also briefly looked at some examples of real-world distributed systems that exist at different ends of the strong vs. weak consistency spectrum, and we saw how both types of system are typically required in a web company. Finally, we looked at how several concepts from the course – sharding, replication, caching, and concurrency – come together in the infrastructure of a typical large web company.