CSCI 0300/1310: Fundamentals of Computer Systems

⚠️ This is not the current iteration of the course! Head here for the current offering.

Lecture 19: Synchronization (III), Networking

» Lecture code
» Post-Lecture Quiz (due 11:59pm Monday, April 5)

S1: Condition Variable FAQ, wait atomicity

Coming soon!

S2: Condition Variables, Deadlock

Why the `while` loop around `cv.wait()`?

Why is it necessary to have wait() in a while loop?

wait() is almost always used in a loop, for two reasons. First, since notify_all() wakes up all threads blocked in wait() on that condition variable, by the time a particular woken thread reacquires the mutex and gets to run, some other woken thread may already have run, made some progress, and changed the unblocking condition back to false. Second, condition variables permit spurious wakeups, where wait() returns even though no thread called notify. For both reasons, a woken-up thread must revalidate the unblocking condition before proceeding further, and if the condition is not met it must go back to blocking. The while loop achieves exactly this.
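As a minimal sketch of the pattern described above, the following uses a small work queue shared by one producer and several consumers. The names `WorkQueue`, `consume`, `produce`, and `run_demo` are hypothetical and chosen for illustration; they do not come from the lecture code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical shared state for illustration; not taken from the lecture code.
struct WorkQueue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> items;
    bool done = false;
};

// Consumer: removes items until the producer signals completion.
// Returns the number of items this thread consumed.
int consume(WorkQueue& q) {
    int consumed = 0;
    std::unique_lock<std::mutex> guard(q.m);
    while (true) {
        // Recheck the condition every time wait() returns: another consumer
        // may have emptied the queue before this thread reacquired the
        // mutex, or the wakeup may have been spurious.
        while (q.items.empty() && !q.done) {
            q.cv.wait(guard);
        }
        // The loop above only exits once the condition actually holds.
        assert(!q.items.empty() || q.done);
        if (q.items.empty()) {
            return consumed;  // producer is done and nothing is left
        }
        q.items.pop();
        ++consumed;
    }
}

// Producer: enqueues n items, waking all consumers after each one.
void produce(WorkQueue& q, int n) {
    for (int i = 0; i < n; ++i) {
        {
            std::lock_guard<std::mutex> guard(q.m);
            q.items.push(i);
        }
        q.cv.notify_all();
    }
    {
        std::lock_guard<std::mutex> guard(q.m);
        q.done = true;
    }
    q.cv.notify_all();
}

// Runs one producer and n_consumers consumers; returns the total consumed.
int run_demo(int n_items, int n_consumers) {
    WorkQueue q;
    std::vector<int> counts(n_consumers, 0);
    std::vector<std::thread> consumers;
    for (int i = 0; i < n_consumers; ++i) {
        consumers.emplace_back([&q, &counts, i] { counts[i] = consume(q); });
    }
    std::thread producer([&q, n_items] { produce(q, n_items); });
    producer.join();
    for (auto& t : consumers) t.join();
    int total = 0;
    for (int c : counts) total += c;
    return total;
}
```

If the inner `while` were replaced by an `if`, a consumer woken by notify_all() could reach `pop()` on a queue that another consumer had already emptied.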

S3: Deadlock, continued

We saw last time how programs that lock multiple resources can encounter a situation called "deadlock" where multiple threads each hold some locks, but no thread holds all the locks required to make progress.

Deadlock is an insidious problem, and the most reliable way to avoid deadlocks in your programs is to have a strict locking order that the entire program abides by.
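A strict locking order can be sketched as follows, assuming a hypothetical `Counter` resource and a `transfer` function that needs two locks at once (neither is part of the lecture code). Every caller locks the lower-indexed resource first, regardless of the direction of the transfer.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical resource guarded by its own mutex (illustration only).
struct Counter {
    std::mutex m;
    int value = 0;
};

// Moves `amount` from counters[from] to counters[to].
// Every caller locks the lower-indexed counter first, so no two threads can
// each hold one lock while waiting for the other.
void transfer(std::vector<Counter>& counters, int from, int to, int amount) {
    assert(from != to);  // locking the same std::mutex twice is undefined
    int first = from < to ? from : to;
    int second = from < to ? to : from;
    std::lock_guard<std::mutex> g1(counters[first].m);
    std::lock_guard<std::mutex> g2(counters[second].m);
    counters[from].value -= amount;
    counters[to].value += amount;
}

// Two threads transfer in opposite directions, which would risk deadlock
// if each locked its own "from" counter first.
// Returns the sum of both counters after the threads finish.
int run_transfers(int rounds) {
    std::vector<Counter> counters(2);
    counters[0].value = 100;
    counters[1].value = 100;
    std::thread a([&] {
        for (int i = 0; i < rounds; ++i) transfer(counters, 0, 1, 1);
    });
    std::thread b([&] {
        for (int i = 0; i < rounds; ++i) transfer(counters, 1, 0, 1);
    });
    a.join();
    b.join();
    return counters[0].value + counters[1].value;
}
```

In C++17, `std::scoped_lock` can also acquire several mutexes together using a deadlock-avoidance algorithm, which is an alternative when all required locks are known up front.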

In the ballgame, we need to apply this logic to avoid the deadlock. The problematic situation occurs when a player has already locked one mutex (in this case, their own ball state mutex) and then seeks to lock another mutex (the destination player's ball state mutex).

Unfortunately for us, it is not trivial to use a lock ordering where we always lock the lower-numbered player's state first, since each player needs to lock their own ball state to check if it holds a ball before even considering the destination player's state.

There are several ways we could solve this problem:

1. read our own ball state, unlock, lock the lower-numbered player's mutex, then lock the other mutex, and recheck the ball state;
2. use try_lock() on the destination player's mutex and give up the lock on our own ball state if it fails; or
3. always lock the destination player's mutex first, even if not holding the ball.

In passtheball-fixed.cc, we use the second approach (try_lock()). While this is the smallest change from the prior code, it does result in inefficiency because we sometimes lock a mutex (using an expensive atomic instruction) only to then fail to lock the other mutex, give up, and try again. But observe that the third approach also unnecessarily takes a lock in the (arguably more common) case that the player doesn't actually have a ball! The first approach avoids both kinds of wasted locking, but is more difficult to implement.
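The try_lock() approach might look roughly like the sketch below. This is not the code from passtheball-fixed.cc: the `Player` struct and the `try_pass` and `pass_blocking` functions are hypothetical names, and the sketch assumes each player holds at most one ball.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical player state; the real passtheball-fixed.cc may differ.
struct Player {
    std::mutex m;
    bool has_ball = false;
};

// Tries once to pass the ball from `self` to `target`.
// Returns true if the ball moved, false if the caller should retry later.
bool try_pass(Player& self, Player& target) {
    assert(&self != &target);
    std::unique_lock<std::mutex> own(self.m);
    if (!self.has_ball) {
        return false;  // nothing to pass
    }
    // Non-blocking attempt on the second mutex. If it fails, returning
    // releases our own lock (via the destructor of `own`), so the thread
    // holding the target's mutex can make progress.
    std::unique_lock<std::mutex> other(target.m, std::try_to_lock);
    if (!other.owns_lock()) {
        return false;
    }
    if (target.has_ball) {
        return false;  // target already holds a ball
    }
    self.has_ball = false;
    target.has_ball = true;
    return true;
}

// Retries until the pass succeeds, yielding the CPU between attempts.
// Assumes `self` holds the ball and `target` eventually becomes free.
void pass_blocking(Player& self, Player& target) {
    while (!try_pass(self, target)) {
        std::this_thread::yield();
    }
}
```

The retry loop in `pass_blocking` is where the inefficiency mentioned above shows up: each failed attempt locks and unlocks the player's own mutex without making progress.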

S3: Networking

We are now moving on to the last block of the course, which covers distributed systems. The largest distributed system in existence is one we all use every day: the internet, a global network connecting many computer systems.

Networking is a technique for computers to communicate with one another, and to make it work, we rely on a set of OS abstractions, as well as plenty of kernel code that interacts with the actual hardware that sends and receives data across wires and WiFi.

Network Infrastructure

It's difficult to think about computer networks without thinking about the underlying infrastructure powering them. In the old days of telephone networks, engineers and telephone operators relied on circuit switching to manage connections for telephone calls, meaning that each telephone connection occupied a physical, dedicated phone line. Circuit switching was widely used over a long period of time, even during the early days of modern digital computing. Circuit switching significantly underutilized resources, in that an idle connection (e.g., periods in a phone conversation when neither party was saying anything) still kept a phone line occupied. Extremely complex circuit switching systems were built as telephone systems expanded, but circuit switching itself is inherently not scalable, because it requires dedicated lines (the "circuits") between endpoints.

Modern computer networks use packet switching, which allows sharing wires and infrastructure between many communicating parties. This means that computers do not need to rely on dedicated direct connections to communicate. The physical connections between computers are instead shared, and the network carries individual packets (small units of data with a bounded maximum size), instead of full connections.

The concept of a connection now becomes an abstraction, implemented by layers of software protocols responsible for transmitting and processing packets, and presented to the application software as a stream connection by the operating system.

Thanks to packet switching and the extensive sharing of the physical infrastructure it enables, the internet has become cheap and stable.

Packets

A packet is a unit of data sent or received over the network. Computers communicate with one another over the network by sending and receiving packets. Packets have a maximum size, so if a computer wants to send data that does not fit in a single packet, it has to split the data into multiple packets and send them separately. Each packet contains:

Addresses (source and destination)
Checksum
- to detect data corruption during transmission
Ports (source and destination)
- to distinguish logical connections to the same machine
Actual payload data
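
The fields listed above, and the splitting of data that exceeds the maximum packet size, can be illustrated with a simplified sketch. The `Packet` layout, the `toy_checksum` function, and the `packetize` helper are assumptions made for illustration; real protocols spread these fields across several headers and use different checksum algorithms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified, hypothetical packet layout (illustration only).
struct Packet {
    std::uint32_t src_addr;
    std::uint32_t dst_addr;
    std::uint16_t src_port;
    std::uint16_t dst_port;
    std::uint16_t checksum;
    std::string payload;
};

// Toy checksum: sum of the payload bytes modulo 2^16. It only shows the
// idea of detecting corruption and is not what real protocols compute.
std::uint16_t toy_checksum(const std::string& data) {
    std::uint32_t sum = 0;
    for (unsigned char c : data) {
        sum = (sum + c) & 0xFFFF;
    }
    return static_cast<std::uint16_t>(sum);
}

// Splits `data` into packets carrying at most `max_payload` bytes each.
// Addresses and ports are copied from `header` into every packet.
std::vector<Packet> packetize(const Packet& header, const std::string& data,
                              std::size_t max_payload) {
    assert(max_payload > 0);
    std::vector<Packet> out;
    for (std::size_t off = 0; off < data.size(); off += max_payload) {
        Packet p = header;
        p.payload = data.substr(off, max_payload);
        p.checksum = toy_checksum(p.payload);
        out.push_back(p);
    }
    return out;
}

// Receiver-side check: recompute the checksum and compare with the header.
bool is_intact(const Packet& p) {
    return toy_checksum(p.payload) == p.checksum;
}
```

For example, calling `packetize` on the 12-byte string "hello, world" with `max_payload` set to 5 yields three packets with payloads "hello", ", wor", and "ld".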

Ports are numbers in the range 1-65,535 that help the OS tell apart different connections, even if they are with the same remote computer. The tuple of (source address, source port, destination address, destination port) is guaranteed to be unique on both ends for any given connection.
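
To illustrate how the four-element tuple can tell connections apart, here is a small sketch of a lookup table keyed by that tuple. The `ConnKey` alias and `ConnTable` class are hypothetical and only loosely model what an OS does; an actual kernel uses its own data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// (source address, source port, destination address, destination port)
using ConnKey = std::tuple<std::uint32_t, std::uint16_t,
                           std::uint32_t, std::uint16_t>;

// Hypothetical connection table mapping each 4-tuple to an owner label.
class ConnTable {
public:
    // Registers a connection; returns false if the tuple is already in use.
    bool add(const ConnKey& key, const std::string& owner) {
        return table_.emplace(key, owner).second;
    }

    // Returns the owner for a packet's 4-tuple, or an empty string if no
    // registered connection matches.
    std::string lookup(const ConnKey& key) const {
        auto it = table_.find(key);
        if (it == table_.end()) {
            return std::string();
        }
        return it->second;
    }

private:
    std::map<ConnKey, std::string> table_;
};
```

Two connections between the same pair of machines and to the same destination port still map to different entries as long as their source ports differ.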

Summary

Today, we finished off our discussion of thread synchronization by considering two details of the condition variable API: why wait() needs to atomically release the lock and block the calling thread, and why we need to wrap calls to wait() in a while loop. For both of these choices, it turns out that a condition variable without them allows for subtly incorrect executions when multiple threads interleave in a pessimal way, and we saw examples of this.

We then finished off our discussion of deadlock by devising strategies for avoiding it: we can either try taking the second (or any subsequent) lock with a non-blocking try_lock call and give up all prior locks if locking fails, allowing other threads a chance to acquire the locks; or we can impose a fixed locking order that all threads abide by, which avoids deadlock by construction.

Finally, we started talking about how computers communicate over networks like the internet. We saw that packets are an essential abstraction that allows computers to use the shared resource of the network, even if they aren't cooperative. Next time, we'll look in more detail at the system calls involved in setting up network connections and the mechanisms for applications to communicate with remote computers.