Lecture 14: Introduction to Operating Systems

🎥 Lecture video (no audio from 55:00) (Brown ID required)
🎥 2023 Lecture video starting at 55:00 (Brown ID required)
💻 Lecture code
❓ Post-Lecture Quiz (due 11:59pm, Monday, March 18).

Operating Systems

Why are we covering this?

Operating systems are an important part of our computing landscape, and how they work impacts how other computer systems – including the applications you write, and the distributed systems that technology companies run – work. In this unit, we will try to understand the fundamental abstractions behind today's operating systems, and conclude from them how you can write efficient and safe software. We won't go into the deepest possible detail on exactly how an operating system implements these abstractions – if you're curious, consider taking CSCI 1670!

We are now entering the second block of the course. So far, we talked about our computers as if they only run a single program: we treated memory as if there is only one instance of each segment (e.g., stack, heap, etc.), and we treated the processor as though it only ever runs machine code instructions from one program. This was indeed true of early computers, but today's computers evidently run several programs – even (seemingly) at the same time!

In this block of the course, we will understand the concepts that make this safe sharing of a computer's hardware possible. Many of these concepts are implemented inside a special software program that every computer runs: the operating system.

Examples of operating systems in wide use today include Microsoft's Windows, Apple's iOS and Mac OS X, and the free, open-source operating systems Linux and BSD (of which several variants exist).

Why do we even need operating systems? For an analogy, consider why society has police and the courts: not every citizen always abides by the rules of society (i.e., the law), but in order to make life safe for others, we have people and structures that enforce the rules. The same applies with computer programs: not all computer programs are always inclined to play by the rules. Some programs are outright malicious (e.g., viruses or ransomware), others can become co-opted by malicious hackers (e.g., due to bugs such as unchecked buffer overflows), and others again are benign but resource-hungry and can crowd out other programs that run on the same computer.

Here is an example of a program that, depending on your viewpoint, could be seen as benign or as actively malicious (attack.cc):

int main() {
    while (true) {
    }
}
This program runs an infinite loop! When when disassemble its machine code, we find the following instructions in main():
00000000000005fa <main>:
 5fa:	55                   	push   %rbp
 5fb:	48 89 e5             	mov    %rsp,%rbp
 5fe:	eb fe                	jmp    5fe 
The first two instructions run only once, but the final jmp jumps to byte offset (address) 5fe within the program, which turns out to be ... that same instruction!

Based on our understanding of how processors work, as presented in the course so far, running this program should be potential fatal to our computer: the processor only does what it's told, so it will infinitely keep executing the jmp instruction at 5fe. The processor gets stuck in an infinite loop of an instruction jumping back to itself, and no other program ever gets to use the processor. (If your computer has multiple processors, it could still make progress even when such an infinite loop is running, but an attacker might just run multiple infinite loop programs until all processors are busy.)

The role of the operating system is to make sure that all programs play by the rules, and also to make it easier for programs to use shared resources such as the computer's hardware.

Kernel and Processes

An operating system (OS) in practice consists of many components (including a whole bunch fo preinstalled applications, desktop backgrounds, aesthetic and GUI elements), but for the purpose of this course, we care particularly about the most privileged core of the OS, the kernel.

The kernel is the operating system software that runs with full machine privilege, meaning full privilege over all resources on the computer. The kernel is all-powerful: it can access anything and do anything it wants without having to pass any checks on its actions.

Unprivileged processes (also called "user-level processes", where "user-level" is the opposite of "kernel"), by contrast, are software that runs without elevated machine privilege. A process is a program in execution. But processes can have bugs: they may access invalid memory, divide by zero, start running infinite loops, or run haywire in other ways (maliciously or accidentally). The kernel should prevent mistakes of an individual process from bringing down the whole system.

What's the difference between a "program" and a "process"?

A process is a program in active execution (e.g., a running web browser showing the course website), while the notion of program refers to the concept of code to achieve a specific purpose (e.g., a web browser, which generically serves the purpose of displaying webpages). In this course, we use "program" to mean the "dead" notion of compiled machine code on storage, while we use "process" to refer to an active, executing program managed by the OS.

In modern operating systems, much kernel code aims to provide protection for processes from other processes: protection ensures that no process can violate the operating system's sharing policies.

A kernel has three general goals:

  1. Ensure robustness and performance by isolating programs from each other.
  2. Share the computer's hardware resources fairly.
  3. Provide safe and convenient access to hardware resources by inventing abstractions for those resources.
In lectures, we will see several examples of resources that the OS needs to protect. In your lab and project for this unit of the course, you will add memory protection to a small, yet fully functions, OS called WeensyOS.

DemoOS and WeensyOS

WeensyOS is our teaching operating system. It includes a kernel and a userspace portion, and could, in principle, run on my or your computer, provided they use the x86-64 instruction set architecture. In lectures and assignments, we will instead run WeensyOS on an emulated computer, using a software called QEMU. QEMU is a program that faithfully fakes out a x86-64 processor, memory, and hardware devices (such as a screen, keyboard, etc.). If you run it inside your course VM, you effectively have an emulated computer (QEMU's) running on another emulated computer (your course VM), which in turn runs on a real computer. You can think of this as some kind of Matryoshka doll of fake computers...

Why do you say "in principle"? Can I actually run WeensyOS on my computer?

The qualification in that sentence is there because WeensyOS would need to be extended with a fair amount of device driver code to run on a physical computer. Real computers use a huge variety of different hardware chips, and these chips need specific "driver" software for the OS to be able to use them. Commercial operating systems like Windows, Mac OS X, or even open-source Linux come with thousands of drivers for all sorts of weird chips and devices, but WeensyOS only has drivers for the devices QEMU emulates. Hence, it wouldn't actually manage to start up and show information on your computer's screen!

Another reason why you wouldn't want to run WeensyOS on your computer is that it would be rather painful to debug if you had made a mistake: you would need to restart your computer every time you hit a bug in the kernel. With QEMU, you can simply restart the emulator.

For the purpose of the course, we will run QEMU with a single emulated processor, and WeensyOS will use up to 2 MB of RAM. That is not a lot, but restricting the operating system to a small amount of memory makes it possible for you to keep track of exactly what's happening.

Note that operating system kernels are just programs, and writing a kernel is broadly similar to normal programming. For example, WeensyOS is written in C++, and much of the kernel code will read just like a normal C++ program. There are some differences, however:

This makes kernel programming more difficult, but also immensely fun: once it works, you'll get the cool feeling of having built part of your own operating system!

Keeping Bad Programs In Check

Let's consider a situation where two programs are running on a computer. We'll call the programs "Alice" and "Eve", and they're implemented as p-alice.cc and p-eve.cc in the lecture code.

When I run DemoOS (make run), Alice and Eve are alternately printing lines to the screen, saying "Hi, I'm Alice!" followed by a number, and vice versa for Eve. A nice example of successful sharing of a computer!

Let's look at what p-alice.cc actually does. This is a user-space program, and its code is below:

// p-alice.cc
#include "u-lib.hh"

void process_main() {
    char buf[128];
    sys_getsysname(buf);

    console_printf(0x1D00, "This is %s.\n", buf);

    char msg[15];
    snprintf(msg, 15, "Hi, I'm Alice!");

    unsigned i = 0;
    while (true) {
        ++i;
        if (i % 1024 == 0) {
            console_printf(0x1D00, "%s #%d\n", msg, i / 512);
        }
        sys_yield();
    }
}
On WeensyOS, the a userspace process starts execution in process_main(). p-alice.cc's implementation contains four lines (bolded above) that interact with the kernel and/or hardware: Overall, what this program does is the following:
  1. It obtains the OS identifier ("DemoOS 1.31") via tha sys_getsysname system call and stores it in buf.
  2. It then fills another buffer (msg) with the message "Hi, I'm Alice!".
  3. It finally starts an infinite loop, printing the message on every 1,024th iteration. Each loop iteration also invokes the sys_yield() system call that tells the kernel to let another process run.

Now, let's look at p-eve.cc, which is Eve's program.

// p-eve.cc
#include "u-lib.hh"

void process_main() {
    unsigned i = 0;
    while (true) {
        ++i;
        if (i % 1024 == 0) {
            console_printf(0x0E00, "Hi, I'm Eve! #%d\n", i / 512);
        }
        sys_yield();
    }
}
Initially, Eve's process will run very similar code to Alice's.

But Eve is evil, and her goal is to take over the computer from Alice. How can she do this? One possible plan is to monopolize one of the shared hardware resources of the computer and to deny Alice access to it.

Protected Resource – Processor Time

What can Eve do to monopolize the computer? One simple thing she can do is to stop playing nice and call sys_yield() every time around the loop. If Eve does not give up the processor by calling sys_yield(), then Alice never gets to run! Indeed, Eve can just add an infinite loop to her program prior to the call to sys_yield() and rest assured, execution will never make it to sys_yield().

This phenomenon of depriving another process of access to a crucial resource is called starvation. The specific resource that Eve is attacking in this case is processor time: because our emulated computer in QEMU has only one processor, only one process can run on it at the same time. If that process never gives up the processor, it just keeps running.

The underlying reason for why this attack succeeds is because Eve simply never returned control to the OS kernel. What we need is a way for the OS kernel to wrest back control from a misbehaving process like Eve's.

Interrupts

The solution to our problem is to rely on a hardware mechanism to put the kernel back in control at regular intervals. This timer hardware is configured to generate an "alarm" at regular frequencies (e.g., every millisecond). But what happens when an alarm goes off? Who should control the policy? The kernel! This means when the alarm goes off, the kernel needs to take control. So once the alarm goes off, whatever program the processor was running before gets interrupted and the kernel gets to run. This kind of control transfer from a user process to the kernel is called an interrupt.

Currently, our DemoOS doesn't have timer interrupts configured. A small change to the kernel can help turn them on, however: in the kernel.cc file, we add a call to init_timer(1000) in the startup code to set up a timer that goes off every 1000 microseconds. But that alone isn't enough! The kernel doesn't know how to handle a timer interrupt yet, so if we run with this change, the first time a timer interrupt occurs, the kernel crashes with an error indicating an "Unhandled exception 32".

To handle the timer interrupt, we need to add a case to the code in the exception() function in kernel.cc. The code there handles different interrupts via a switch statement, and if we add a case for interrupt 32, we can handle it correctly. What does a correct handler do? We would like to give another process (specifically, Alice's process) a chance to run. The code needed for this purpose is this:

switch (regs->reg_intno) {
    // [...]

    case 32:
        lapicstate::get().ack(); // re-enable timer interrupt
        schedule(); // let something else run

    // [...]
}
The first line re-enables the timer interrupt, setting the next alarm. (We don't expect you to understand that line in detail.) The second line invokes the kernel process scheduler, which is a piece of code that decides what process should next get to run and eventually configures the processor to continue running that process.

With these changes in place, Eve's attack no longer succeeds: even though Eve's process is spinning in an infinite loop, it regularly gets interrupted by the timer interrupt, which causes the kernel to run and give Alice's process a chance to run; after Alice's process yields, Eve's process runs again.

We've solved the starvation problem by letting the privileged kernel take over control of the computer at regular intervals!

However, there is a snag: our solution relies on interrupts happening! In other words, the hardware must generate periodic timer interrupts. Since software sets up the timer interrupt, software can also disable it again. Specifically, the x86-64 instruction cli ("clear interrupts") has the effect of turning off all interrupt handling in the processor. Consequently, if a process is able to run this instruction, it can ignore timer interrupts. Eve might use code like asm volatile ("cli"); in her process to try to achieve this.

Summary

Today, we embarked on a new topic: Operating Systems. In particular, we learned that there is a privileged program on the computer – the kernel – whose job it is to ensure that all userspace processes play by the rules. We want processes to fairly share the computer's hardware resources, and we'd like them to be isolated from each other. Next time, we'll look more at how we can further isolate processes from each other.