CSCI 0300/1310: Fundamentals of Computer Systems

⚠️ This is not the current iteration of the course! Head here for the current offering.

Lecture 23: Condition Variable Details, Networking

» Lecture video (Brown ID required)
» Lecture code
» Post-Lecture Quiz (due 11:59pm Sunday, May 1)

Condition Variable Details

Condition Variable recap

See the notes from Lecture 22 for the example of condition variables in the bounded buffer program that we discussed at the start of the lecture.

Why the `while` loop around `cv.wait()`?

Why is it necessary to have wait() in a while loop?

wait() is almost always used in a loop because of what we call spurious wakeups. Since notify_all() wakes up all threads blocking on a certain wait() call, by the time when a particular blocking thread locks the mutex and gets to run, it's possible that some other blocking thread has already unblocked, made some progress, and changed the unblocking condition back to false. For this reason, a "woken-up" must revalidate the unblocking condition before proceeding further, and if the unblocking condition is not met it must go back to blocking. The while loop achieves exactly this.

Networking

We are now moving on to the last block of the course, which covers distributed systems. The largest distributed system in existence is one we all use every day: the internet, a global networking between many computer systems.

Networking is a technique for computers to communicate with one another, and to make it work, we rely on a set of OS abstractions, as well as plenty of kernel code that interacts with the actual hardware that sends and receives data across wires and WiFi.

Network Infrastructure

It's difficult to think about computer networks without thinking about the underlying infrastructure powering them. In the old days of telephony networks, engineers and telephone operators relied on circuit switching to manage connections for telephone calls, meaning that each telephone connection occupied a physical, dedicated phone line. Circuit switching was widely used over a long period of time, even during early days of modern digital computing. Circuit switching significantly underutilized resources in that an idle connection (e.g. periods in a phone conversation when neither party was actively saying anything) must also keep a phone line occupied. Extremely complex circuit switching systems were built as telephone systems expanded, but circuit switching itself is inherently not scalable, because it requires dedicated lines (the "circuits") between endpoints.

Modern computer networks use packet switching, which allows sharing wires and infrastructure between many communicating parties. This means that computers do not need to rely on dedicated direct connections to communicate. The physical connections between computers are instead shared, and the network carries individual packets (small, fixed-size units of data), instead of full connections.

The concept of a connection now becomes an abstraction, implemented by layers of software protocols responsible for transmitting and processing packets, and presented to the application software as a stream connection by the operating system.

Thanks to packet switching and the extensive sharing of the physical infrastructure it enables, the internet has become cheap and stable.

Packets

A packet is a unit of data sent or received over the network. Computers communicate to one another over the network by sending and receiving packets. Packets have a maximum size, so if a computer wants to send data that does not fit in a single packet, it will have to split the data to multiple packets and send them separately. Each packet contains:

Addresses (source and destination)
Checksum
- to detect data corruption during transmission
Ports (source and destination)
- to distinguish logical connections to the same machines
Actual payload data

Ports are numbers in the range 1-65,535 that help the OS tell apart different connections, even if they are with the same remote computer. The tuple of (source address, source port, destination address, destination port) is guaranteed to be unique on both ends for any given connection.

Summary

Today, we finished off our discussion of thread synchronization by considering two details of the conditional variable API: why wait() needs to atomically release the lock and block the calling thread, and why we need to wrap calls to wait() in a while loop. For both of these choices, it turns out that a condition variable without them allows for subtly incorrect executions when multiple threads interleave in a pessimal way, and we saw examples of this.

We then finished off our discussion of deadlock by devising strategies for avoiding a deadlock: we can either try taking the second (or subsequent lock) with a non-blocking try_lock call and give up all prior locks if locking fails, allowing other threads a chance to acquire the locks; or we can impose a fixed locking order that all threads abide by, which avoids deadlock by construction.

Finally, we started talking about how computer communicate over networks like the internet. We saw that packets are an essential abstraction that allows computers to use the shared resource of the network, even if they aren't cooperative. Next time, we'll look in more detail at the system calls involved in setting up network connections and the mechanisms for applications to communicate with remote computers.