Lecture 26: Sharding, Networking
We are now moving on to the last block of the course, which covers distributed systems. The largest distributed system in existence is one we all use every day: the internet, a global network of many computer systems. Distributed systems also often exist in the datacenters of large internet companies, and we'll see some examples from this domain.
Sharding
In some settings, a single server is not sufficient to handle all the requests that users generate. Such settings include popular web services like Google, Facebook, Twitter, or Airbnb, which all receive millions of user requests per second – clearly more than a single server can handle. This requires scaling the service.
Vertical vs. Horizontal Scalability
When we talk about scalability, we differentiate between two different ways of scaling a system.
Vertical scalability involves adding more resources to scale the system on a single computer. This might, for example, be achieved by buying a computer with more processors or RAM (something that is pretty easy – if expensive – on cloud infrastructure like AWS or Google Cloud). Vertical scalability is nice because it does not require us to change our application implementation too much: simply adding more threads to a program is sufficient to handle more requests (assuming the computation was parallelizable to begin with).
But the downside of vertical scalability is that it faces fundamental limits: due to the physics of energy conservation, heat dissipation, and the speed of light, a computer cannot have more than a certain number of processors (in the hundreds with current technology) before it runs too hot or would have to slow down processor speed significantly. This puts a practical limit on how far we can scale a system vertically. Another limit (but sometimes also a benefit) is that a vertically-scaled system is a single fault domain: if it loses power, the entire system turns off. This can be a problem (a website run from this computer no longer works), but – as we will see when we discuss the alternatives – also avoids a lot of complexity associated with more resilient distributed systems.
The alternative is horizontal scaling, which works by adding more computers to the system (i.e., making the server itself a distributed system). This is easy to do in principle: public cloud platforms allow anyone with a credit card to rent hundreds of virtual machines (or more) with a few clicks. This provides practically unlimited scalability, as long as we can figure out a way to split our application in such a way that it can harness many computers. (It turns out that this split, and issues related to fault tolerance, really add a lot of complexity to the system, however.)
Sharding: splitting a service for scalability
To use multiple computers to scale a service, we need a way to split the service's work between many computers. Sharding is the term for such a split: think about throwing our system on the floor and seeing it break into many shards, which are independent but together make up the whole of the system.
To shard a system, we split its workload along some dimension. Possible dimensions include splitting by client, or splitting by the resource that a client seeks to access (e.g., in an RPC). When you talk to a website like google.com, you and other users access the same domain name, but actually talk to different computers, both depending on your geographic location and based on a random selection implemented by the Domain Name System (DNS) server for google.com. This "load balancing" is a form of sharding by client: different front-end servers in Google's data centers receive network connection requests from different clients, based on the IP address that google.com resolved into for each specific client.
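To make sharding by client concrete, here is a minimal sketch in Python. It is not how Google's DNS servers actually choose a front-end (the notes only say the choice depends on location and random selection); instead, it assumes a simple deterministic rule that hashes the client's IP address. The server addresses and the function name pick_frontend are hypothetical.

```python
import hashlib

# Hypothetical front-end server addresses (from the IP range reserved
# for documentation); a real service would have many more.
FRONTENDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def pick_frontend(client_ip, frontends=FRONTENDS):
    """Map a client to one front-end server by hashing its IP address.

    Assumption: a hash-based rule stands in for the location-based and
    random selection that a real DNS server would perform.
    """
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(frontends)
    return frontends[index]
```

The same client always lands on the same front-end, while different clients spread across all of them. Note that nothing here depends on what the client asks for, which is exactly why every front-end must be able to handle every request.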
But sharding by client requires that every server that a client might talk to be equally able to handle its requests. This is pretty difficult to ensure in a practical distributed system. Consider WeensyDB: if we were to shard it by client, every server would need to be able to serve requests for every single key stored in the WeensyDB. A better alternative for this kind of stateful service is to shard by resource (i.e., by key in the case of WeensyDB).
In practice, this sharding might be realized by splitting the key space of WeensyDB's keys (which are strings) into different regions ("shards") assigned to different servers. In the picture above, server S0 handles keys starting with letters A-H, while S1 handles those starting with I-S, and S2 handles T-Z.
To make this sharding work, the client must know which server is responsible for which range of keys. This assignment (the "sharding function") is either hard-coded into the client, or part of a configuration it dynamically obtains from a coordinator system (in the Distributed Store project, this coordinator is called "shardmaster").
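A hard-coded sharding function for the key ranges in this example (S0: A-H, S1: I-S, S2: T-Z) could look roughly like the Python sketch below. The names SHARD_STARTS, SHARD_SERVERS, and server_for_key are made up for illustration, and the sketch assumes keys start with an ASCII letter; it is not the interface used by WeensyDB or the Distributed Store project.

```python
import bisect

# First letter of each shard's key range, in sorted order, and the server
# that owns the range starting at that letter (assumed example layout).
SHARD_STARTS = ["A", "I", "T"]
SHARD_SERVERS = ["S0", "S1", "S2"]


def server_for_key(key):
    """Return the name of the server responsible for `key`."""
    first = key[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError("key must start with an ASCII letter: %r" % key)
    # Find the last range whose starting letter is <= the key's first letter.
    index = bisect.bisect_right(SHARD_STARTS, first) - 1
    return SHARD_SERVERS[index]
```

For instance, server_for_key("hello") falls into the A-H range and returns "S0", while server_for_key("india") returns "S1". A client that obtains its configuration dynamically would replace the two hard-coded lists with whatever the coordinator hands out.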
A properly sharded service scales very well, since we can simply add more servers and split the key ranges assigned to a shard in order to add more capacity to the system. But there are some edge cases: for example, many social media services have highly skewed key popularities, causing a few popular keys (e.g., the timeline for a celebrity user) to receive a disproportionately large number of requests compared to others. This means that keys and the load they induce are no longer equal, and the sharding must take this into account.
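The effect of skew can be seen with a small back-of-the-envelope calculation, sketched here in Python. The request counts are invented for illustration, and shard_for_key and load_per_shard are hypothetical helpers that assume shards are defined by the first letter at which each new shard begins.

```python
from collections import Counter


def shard_for_key(key, boundaries):
    """Return the shard index for `key`.

    `boundaries` holds the first letters at which shards 1, 2, ... begin;
    shard 0 covers everything before the first boundary.
    """
    first = key[0].upper()
    return sum(1 for b in boundaries if first >= b)


def load_per_shard(request_counts, boundaries):
    """Sum the requests that each shard receives."""
    load = Counter()
    for key, count in request_counts.items():
        load[shard_for_key(key, boundaries)] += count
    return dict(load)


# Invented numbers: one celebrity key dwarfs all the others.
requests = {"alice": 100, "bob": 120, "ines": 90, "taylor": 50000, "zoe": 110}

before = load_per_shard(requests, ("I", "T"))       # A-H, I-S, T-Z
after = load_per_shard(requests, ("I", "T", "U"))   # T-Z split into T and U-Z
```

With these numbers, the third shard receives 50,110 requests while the other two receive 220 and 90. Splitting T-Z into two shards moves only 110 requests away: all 50,000 requests for the hot key still go to one server, because a single key cannot be divided by splitting key ranges.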
Networking
Networking is a technique for computers to communicate with one another, and to make it work, we rely on a set of OS abstractions, as well as plenty of kernel code that interacts with the actual hardware that sends and receives data across wires and WiFi.
Network Infrastructure
It's difficult to think about computer networks without thinking about the underlying infrastructure powering them. In the old days of telephony networks, engineers and telephone operators relied on circuit switching to manage connections for telephone calls, meaning that each telephone connection occupied a physical, dedicated phone line. Circuit switching was widely used over a long period of time, even during early days of modern digital computing. Circuit switching significantly underutilized resources in that an idle connection (e.g. periods in a phone conversation when neither party was actively saying anything) must also keep a phone line occupied. Extremely complex circuit switching systems were built as telephone systems expanded, but circuit switching itself is inherently not scalable, because it requires dedicated lines (the "circuits") between endpoints.
Modern computer networks use packet switching, which allows sharing wires and infrastructure between many communicating parties. This means that computers do not need to rely on dedicated direct connections to communicate. The physical connections between computers are instead shared, and the network carries individual packets (small units of data with a bounded size), instead of full connections.
The concept of a connection now becomes an abstraction, implemented by layers of software protocols responsible for transmitting and processing packets, and presented to the application software as a stream connection by the operating system.
Thanks to packet switching and the extensive sharing of the physical infrastructure it enables, the internet has become cheap and stable.
Packets
A packet is a unit of data sent or received over the network. Computers communicate with one another over the network by sending and receiving packets. Packets have a maximum size, so if a computer wants to send data that does not fit in a single packet, it will have to split the data into multiple packets and send them separately. Each packet contains:
- Addresses (source and destination)
- Checksum
  - to detect data corruption during transmission
- Ports (source and destination)
  - to distinguish logical connections to the same machines
- Actual payload data
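The fields listed above, and the need to split large data across packets, can be sketched in Python as follows. This is a simplified model, not a real wire format: the Packet class, the 1000-byte MAX_PAYLOAD, and the use of CRC-32 as the checksum are all assumptions made for illustration (real internet protocols use different checksums and carry additional fields, such as sequence numbers for reassembly).

```python
import zlib
from dataclasses import dataclass

MAX_PAYLOAD = 1000  # assumed maximum payload size in bytes


@dataclass
class Packet:
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    payload: bytes
    checksum: int  # CRC-32 of the payload (a stand-in for a real checksum)


def make_packets(data, src_addr, src_port, dst_addr, dst_port,
                 max_payload=MAX_PAYLOAD):
    """Split `data` into packets carrying at most `max_payload` bytes each."""
    packets = []
    for start in range(0, len(data), max_payload):
        chunk = data[start:start + max_payload]
        packets.append(Packet(src_addr, dst_addr, src_port, dst_port,
                              chunk, zlib.crc32(chunk)))
    return packets


def is_intact(packet):
    """Recompute the checksum to detect corruption of the payload."""
    return zlib.crc32(packet.payload) == packet.checksum
```

Sending 2,500 bytes with these settings yields three packets with payloads of 1000, 1000, and 500 bytes. If a byte of a payload is flipped in transit, is_intact returns False for that packet, and the receiver knows to discard it.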
Ports are numbers in the range 1-65,535 that help the OS tell apart different connections, even if they are with the same remote computer. The tuple of (source address, source port, destination address, destination port) is guaranteed to be unique on both ends for any given connection.
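One way to picture how the OS uses this tuple is as the key of a lookup table from connections to their data. The Python sketch below is only a toy model of that idea; ConnectionKey and ConnectionTable are hypothetical names, and a real kernel keeps far more state per connection.

```python
from typing import Dict, NamedTuple


class ConnectionKey(NamedTuple):
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


class ConnectionTable:
    """Toy model: route incoming payloads to per-connection buffers."""

    def __init__(self):
        self._received: Dict[ConnectionKey, bytearray] = {}

    def deliver(self, key, payload):
        # Packets with the same 4-tuple belong to the same connection.
        self._received.setdefault(key, bytearray()).extend(payload)

    def received(self, key):
        return bytes(self._received.get(key, b""))


table = ConnectionTable()
# Two connections from the same remote computer to the same local port:
# only the source port tells them apart.
first = ConnectionKey("192.0.2.1", 50000, "192.0.2.2", 80)
second = ConnectionKey("192.0.2.1", 50001, "192.0.2.2", 80)
table.deliver(first, b"hello ")
table.deliver(second, b"other data")
table.deliver(first, b"world")
```

After these calls, the data for the first connection is b"hello world" and the data for the second is b"other data", even though both connections share the same pair of addresses and the same destination port.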
Summary
We talked about how to scale a system using a key technique, sharding. A sharded system partitions its state along some dimension, allowing different servers to maintain independent slices ("shards") of state.
Finally, we started talking about how computers communicate over networks like the internet. We saw that packets are an essential abstraction that allows computers to use the shared resource of the network, even if they aren't cooperative. Next time, we'll discuss some higher-level ways of interacting with the network, talk about what happens to the distributed system when computers fail, and wrap up the course.