CSCI 0300/1310: Fundamentals of Computer Systems

Lecture 11: Stack, Buffer Overflow

🎥 Lecture video (Brown ID required)
💻 Lecture code
❓ Post-Lecture Quiz (due 11:59pm, Wednesday, March 6).

Disk I/O

Input and output (I/O) on a computer must generally happen through the operating system, so that it can mediate and ensure that only one process at a time uses the physical resources affected by the I/O (e.g., a hard disk, or your WiFi). This avoids chaos and helps with fair sharing of the computer's hardware. (There are some exceptions to this rule, notably memory-mapped I/O and recent fast datacenter networking, but most classic I/O goes through the operating system.)

I/O System Calls

When programs want the OS to do I/O on their behalf, their mechanism of choice is a system call. System calls are like function calls, but they invoke OS functionality (which we'll discuss in more detail shortly). read() and write() are examples of system calls.

System calls are not cheap. They require the processor to do significant extra work compared to normal function calls. A system call also means that the program probably loses some locality of reference, and thus may have more processor cache misses after the system call returns. In practice, a system call takes 1-2µs to handle. This may seem small, but compared to a DRAM access (60ns), it's quite expensive: roughly 15 to 30 times the cost of a memory access. Frequent system calls are therefore one major source of poor performance in programs. In Project 3, you will implement a set of tricks to avoid having to make frequent system calls!
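To make the cost argument concrete, here is a minimal sketch that counts how many write() system calls two strategies need for the same data. The function names write_bytewise and write_batched are made up for this illustration; this is not the Project 3 design, just one way to see why batching helps.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

// Writes `n` bytes to `fd` one byte per system call.
// Returns the number of write() calls made, or -1 on error.
long write_bytewise(int fd, const char* data, size_t n) {
    assert(data != NULL);
    long calls = 0;
    for (size_t i = 0; i < n; ++i) {
        ssize_t r = write(fd, &data[i], 1);
        ++calls;
        if (r != 1) {
            return -1;
        }
    }
    return calls;
}

// Writes `n` bytes to `fd` using as few system calls as the kernel allows.
// A write() may accept fewer bytes than requested, so we loop until done.
// Returns the number of write() calls made, or -1 on error.
// (This sketch does not retry writes interrupted by signals.)
long write_batched(int fd, const char* data, size_t n) {
    assert(data != NULL);
    long calls = 0;
    size_t done = 0;
    while (done < n) {
        ssize_t r = write(fd, data + done, n - done);
        ++calls;
        if (r <= 0) {
            return -1;
        }
        done += (size_t) r;
    }
    return calls;
}
```

For a 15-byte message, the byte-wise version pays the system call cost 15 times, while the batched version typically pays it once.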

File Descriptors

When a user-space process makes I/O system calls like read() or write(), it needs to tell the kernel what file it wants to do I/O on. This requires the kernel and the user-space process to have a shared way of referring to a file. On UNIX-like operating systems (such as macOS and Linux), this is done using file descriptors.

File descriptors are identifiers that the kernel uses to keep track of open resources (such as files) used by user-space processes. User-space processes refer to these resources using integer file descriptor (FD) numbers; in the kernel, the FD numbers index into an FD table maintained for each process, which may contain extra information like the filename, the offset into the file for the next I/O operation, or the amount of data read/written. For example, a user-space process may use the number 3 to refer to a file descriptor that the kernel knows corresponds to /home/malte/cats.txt.

To get a file descriptor number allocated, a process calls the open() syscall. open() causes the OS kernel to do permission checks, and if they pass, to allocate an FD number from the set of unused numbers for this process. The kernel sets up its metadata, and then returns the FD number to user-space. The FD number for the first file you open is usually 3, the next one 4, etc.

Why is the first file descriptor number usually 3?

On UNIX-like operating systems such as macOS and Linux, there are some standard file descriptor numbers. FD 0 normally refers to stdin (input from the terminal), 1 refers to stdout (output to the terminal), and 2 refers to stderr (output to the terminal, for errors). You can close these standard FDs; if you then open other files, they will reuse FD numbers 0 through 2, but your program will no longer be able to interact with the terminal.
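The allocation rule can be observed directly. POSIX specifies that open() returns the lowest-numbered FD that is currently unused, so closing a descriptor and opening a file again hands back the same number. The helper name open_twice is hypothetical and only serves this sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

// Opens `path` read-only, closes it, then opens it again.
// Stores the two FD numbers in *first and *second.
// Returns 0 on success, -1 if either open() fails.
int open_twice(const char* path, int* first, int* second) {
    assert(path != NULL && first != NULL && second != NULL);
    *first = open(path, O_RDONLY);
    if (*first < 0) {
        return -1;
    }
    close(*first);  // the number in *first is now unused again

    *second = open(path, O_RDONLY);
    if (*second < 0) {
        return -1;
    }
    close(*second);
    return 0;
}
```

In a single-threaded program that has stdin, stdout, and stderr open and no other files, both numbers would usually be 3.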

Now that user-space has the FD number, it uses this number as a handle to pass into read() and write(). The full API for the read system call is: ssize_t read(int fd, void* buf, size_t count). The first argument indicates the FD to work with, the second is a pointer to the buffer (memory region) that the kernel is supposed to put the data read into, and the third is the number of bytes to read. read() returns the number of bytes actually read (or 0 if there are no more bytes in the file; or -1 if there was an error). write() has an analogous API, except the kernel reads from the buffer pointed to and copies the data out.
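Because read() may return fewer bytes than requested, callers often wrap it in a loop. The following is a minimal sketch of that pattern; read_fully is a name chosen for this example, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

// Reads up to `count` bytes from `fd` into `buf`, calling read() again
// after short reads. Returns the number of bytes read (less than `count`
// only if the end of the file was reached), or -1 on error.
ssize_t read_fully(int fd, void* buf, size_t count) {
    assert(buf != NULL);
    size_t done = 0;
    while (done < count) {
        ssize_t r = read(fd, (char*) buf + done, count - done);
        if (r < 0) {
            return -1;  // error
        }
        if (r == 0) {
            break;      // no more bytes available
        }
        done += (size_t) r;
    }
    return (ssize_t) done;
}
```

The three return cases of read() map onto the three branches: a negative value is an error, zero means end of file, and a positive value advances the position in the buffer.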

One important aspect that is not part of the API of read() or write() is the current I/O offset into the file (sometimes referred to as the "read-write head" in man pages). In other words, when a user-space process calls read(), it fetches data from whatever offset the kernel currently knows for this FD. If the offset is 24, and read() wants to read 10 bytes, the kernel copies bytes 24-33 into the user-space buffer provided as an argument to the system call, and then sets the kernel offset for the FD to 34.

A user-space process can influence the kernel's offset via the lseek() system call, but is generally expected to remember on its own which offset the kernel has reached in the file. In Project 3, you'll have to maintain such metadata for your caching in user-space memory. In particular, when reading data into the cache or writing cached data into a file, you'll need to be mindful of the current offset that the I/O will happen at.
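The offset example from above (start at 24, read 10 bytes, end at 34) can be checked with lseek(), which returns the resulting offset; lseek(fd, 0, SEEK_CUR) therefore just reports the current offset without moving it. The helpers make_scratch_file and offset_after_read, and the /tmp path, are assumptions made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L  // ask for POSIX declarations such as mkstemp()
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: creates an unlinked scratch file in /tmp holding
// `nbytes` bytes (at most 256). Returns an FD for it, or -1 on failure.
int make_scratch_file(size_t nbytes) {
    char path[] = "/tmp/offset_demo_XXXXXX";
    char data[256];
    if (nbytes > sizeof(data)) {
        return -1;
    }
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);  // the file disappears once the FD is closed
    memset(data, 'x', sizeof(data));
    if (write(fd, data, nbytes) != (ssize_t) nbytes) {
        close(fd);
        return -1;
    }
    return fd;
}

// Sets the kernel's offset for `fd` to `start`, reads `nbytes` bytes
// (at most 64), and returns the kernel's offset afterwards, or -1 on failure.
off_t offset_after_read(int fd, off_t start, size_t nbytes) {
    char buf[64];
    if (nbytes > sizeof(buf)) {
        return -1;
    }
    if (lseek(fd, start, SEEK_SET) < 0) {
        return -1;
    }
    if (read(fd, buf, nbytes) != (ssize_t) nbytes) {
        return -1;
    }
    return lseek(fd, 0, SEEK_CUR);  // query the offset without changing it
}
```

With a 64-byte scratch file, offset_after_read(fd, 24, 10) reports 34. A user-space cache would normally track this number itself rather than asking the kernel, since each query is another system call.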

Calling Convention

We now return to our discussion of function calls in assembly and the layout of the stack segment of memory.

Some basic rules of the x86-64/Linux calling convention are:

The first six function arguments are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9 (in this order; see the register list from last lecture).
The seventh and subsequent arguments are passed on the stack (see more below).
The return value is passed in register %rax.

There are actually several other rules, which govern things like how to pass data structures that are larger than a register (e.g., a struct), floating point numbers, etc. If you're interested, you can find all the details in the AMD64 ABI, section 3.2.3.
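As a small illustration of the register rules, here is a six-argument function annotated with where each value would arrive under this convention. The function weighted_sum6 is made up for this sketch; the weights only make the argument order visible in the result.

```c
// Under the x86-64/Linux calling convention described above, the six
// integer arguments arrive in registers, in this order:
//   a -> %rdi, b -> %rsi, c -> %rdx, d -> %rcx, e -> %r8, f -> %r9
// The result leaves the function in %rax.
long weighted_sum6(long a, long b, long c, long d, long e, long f) {
    return a + 2 * b + 3 * c + 4 * d + 5 * e + 6 * f;
}
```

Compiling such a function with optimizations and reading the assembly is one way to check the register assignment for yourself.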

call04.s illustrates the rule about the first six arguments best: they are passed straight in registers. Other examples (e.g., call01 to call03) are compiled without optimizations and have somewhat more complex assembly code, which takes the values from registers, writes them onto the stack (more on that below), and then moves them into registers again.

The Stack

You will recall the stack segment of memory from earlier lectures: it is where all variables with automatic lifetime are stored. These include local variables declared inside functions, but importantly also function arguments.

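Recall that call01.s to call03.s contained a bunch of instructions referring to %rsp, such as this implementation of the function f() (from call01.s):

        movl    %edi, -4(%rsp)      # store the first argument at address %rsp - 4
        movl    -4(%rsp), %eax      # load it back into %eax, the return value register
        ret                         # return to the caller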

The first movl stores the first argument (a 4-byte integer, passed in %edi) at an address four bytes below the address stored in register %rsp; the second movl instruction takes that value in memory and loads it into register %eax.
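For orientation, a C function along the following lines could plausibly compile to those three instructions when built without optimizations. This is an assumption made for illustration: the actual call01.c is in the lecture code and may differ.

```c
// A guess at the shape of f(): take one int argument and return it.
// Without optimizations, the compiler spills `x` from %edi to the stack
// and reloads it into %eax for the return.
int f(int x) {
    return x;
}
```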

The %rsp register is called the stack pointer. It always points to the "top" of the stack, which is at the lowest (leftmost) address currently used in the stack segment. At the start of the function, any memory to the left of where %rsp points is therefore unused; any memory to the right of where it points is used. This explains why the code stores the argument at address %rsp - 4: it's the first 4-byte slot available on the stack, to the left of the currently used memory.

In other words, what happened with these instructions is that the blue parts of the picture below were added to the stack memory.

We can give names to the memory on the left and right of the address where %rsp points in the stack. These are called stack frames, where each stack frame corresponds to the data associated with one function call. The memory on the right of the address pointed to by %rsp at the point f() gets called is the stack frame of whatever function calls f(). This function is named the caller (the function that calls), while f() is the callee (the function being called).

The memory on the right of the %rsp address at the point of f() being called (we refer to this as "entry %rsp") is the caller's stack frame (red below), and the memory to its left is the callee's stack frame.

The arguments and local variables of f() live inside f()'s stack frame. Subsequent arguments (second, third, fourth, etc.) are stored at successively lower addresses below entry %rsp (see call02.s and call03.s for examples with more arguments), followed eventually by any local variables of the callee.

How does %rsp change?

The convention is that %rsp always points to the lowest (leftmost) stack address that is currently used. This means that when a function declares a new local variable, %rsp has to move down (left) and if a function returns, %rsp has to move up (right) and back to where it was when the function was originally called.

Moving %rsp happens in two ways: explicit modification via arithmetic instructions, and implicit modification as a side effect of special instructions. The former happens when the compiler knows exactly how many bytes a function requires %rsp to move by, and involves instructions like subq $0x10, %rsp, which moves the stack pointer down by 16 bytes. The latter, side-effect modification happens when push and pop instructions run. A push writes the contents of a register onto the stack memory immediately to the left of the current %rsp and also modifies %rsp to point to the beginning of this new data. For example, pushq %rax would write the 8 bytes from register %rax at address %rsp - 8 and set %rsp to that address; it is equivalent to movq %rax, -8(%rsp); subq $8, %rsp or subq $8, %rsp; movq %rax, (%rsp). A pop does the reverse: popq %rax loads the 8 bytes at (%rsp) into %rax and then moves %rsp up by 8 bytes, like movq (%rsp), %rax; addq $8, %rsp.
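The effect of these instructions on %rsp can be mimicked with a toy model in C. This is only a simulation: the struct toy_stack, its 64-byte "memory", and the toy_* function names are invented for this sketch, and rsp is an array offset rather than a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MEM_BYTES 64

// Toy model of the stack segment. `rsp` is the offset of the lowest used
// byte; everything at a lower offset ("to the left") is unused.
struct toy_stack {
    unsigned char mem[TOY_MEM_BYTES];
    size_t rsp;
};

void toy_stack_init(struct toy_stack* s) {
    memset(s->mem, 0, sizeof(s->mem));
    s->rsp = TOY_MEM_BYTES;  // empty stack: nothing is in use yet
}

// Models `subq $n, %rsp`: explicitly reserve n bytes.
void toy_sub_rsp(struct toy_stack* s, size_t n) {
    assert(s->rsp >= n);
    s->rsp -= n;
}

// Models `addq $n, %rsp`: explicitly release n bytes.
void toy_add_rsp(struct toy_stack* s, size_t n) {
    assert(s->rsp + n <= TOY_MEM_BYTES);
    s->rsp += n;
}

// Models `pushq`: subq $8, %rsp; movq value, (%rsp).
void toy_pushq(struct toy_stack* s, uint64_t value) {
    toy_sub_rsp(s, 8);
    memcpy(&s->mem[s->rsp], &value, 8);
}

// Models `popq`: movq (%rsp), result; addq $8, %rsp.
uint64_t toy_popq(struct toy_stack* s) {
    uint64_t value;
    assert(s->rsp + 8 <= TOY_MEM_BYTES);
    memcpy(&value, &s->mem[s->rsp], 8);
    toy_add_rsp(s, 8);
    return value;
}
```

Pushing a value moves rsp down by 8, and a matching pop returns the same value and moves rsp back to where it started.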

As an optimization, the compiler may choose to avoid writing arguments onto the stack. It does this for up to six arguments, which per calling convention are held in specific registers. call04.s shows this: the C code we compile it from (call04.c) is identical to the code in call03.c.

Functions with more than six arguments

There is a limited number of registers in the x86-64 architecture, but you can write functions in C that take any number of arguments! The calling convention says that the first six arguments may be passed in registers, but that the 7th and subsequent arguments are always passed in memory on the stack. Specifically, these arguments go into the caller's stack frame, so they are stored above the entry %rsp at the point where the function is called (see call05.{c,s} and call06.{c,s}).
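An eight-argument function shows the split between registers and memory. The function sum8_weighted is made up for this sketch and is not taken from call05.c or call06.c; the comments restate the rule above rather than the output of any particular compiler.

```c
// a1..a6 arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9.
// a7 and a8 do not fit in argument registers, so the caller stores them
// in its own stack frame, above the callee's entry %rsp.
long sum8_weighted(long a1, long a2, long a3, long a4,
                   long a5, long a6, long a7, long a8) {
    // The weights make the two stack-passed arguments stand out in the result.
    return a1 + a2 + a3 + a4 + a5 + a6 + 10 * a7 + 100 * a8;
}
```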

Return Address

As a function executes, it eventually reaches a ret instruction in its assembly. The effect of ret is to return to the caller (a form of control flow, as the next instruction needs to change). But how does the processor know what instruction to execute next, and what to set %rip to?

It turns out that the stack plays a role here, too. In a nutshell, each function call stores the return address as the very first (i.e., rightmost) data in the callee's stack frame. (If the function called takes more than six arguments, the return address is to the left of the 7th argument in the caller's stack frame.)

The stored return address makes it possible for each function to know exactly where to continue execution once it returns to its caller. (However, storing the return address on the stack also has some dangerous consequences, as we will see shortly.)
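You can look at a stored return address from C. GCC and Clang provide the builtin __builtin_return_address(0), which yields the return address of the current function; it is a compiler extension, not standard C. The function name where_will_i_return is hypothetical.

```c
// Returns the address this function will return to, i.e., the return
// address saved for the current call. `noinline` keeps the compiler from
// merging this function into its caller, which would remove the call.
__attribute__((noinline)) void* where_will_i_return(void) {
    return __builtin_return_address(0);
}
```

The returned pointer refers to code in the caller: the instruction right after the call to where_will_i_return().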

We can now define the full function entry and exit sequence. Both the caller and the callee have responsibilities in this sequence.

To prepare for a function call, the caller performs the following tasks:

The caller stores the first six arguments in the corresponding registers.
If the callee takes more than six arguments, or if some of its arguments are large, the caller must store the surplus arguments in its stack frame (in increasing order). The 7th argument must be stored at (%rsp) (that is, the top of the stack) when the caller executes its callq instruction.
The caller saves any caller-saved registers (see last lecture's list). These are registers whose values the callee might overwrite, but which the caller needs to retain for later use.
The caller executes callq FUNCTION. This has an effect like pushq $NEXT_INSTRUCTION; jmp FUNCTION (or, equivalently, subq $8, %rsp; movq $NEXT_INSTRUCTION, (%rsp); jmp FUNCTION), where NEXT_INSTRUCTION is the address of the instruction immediately following callq.

To return from a function, the callee does the following:

The callee places its return value in %rax.
The callee restores the stack pointer to its value at entry ("entry %rsp"), if necessary.
The callee executes the retq instruction. This has an effect like popq %rip, which removes the return address from the stack and jumps to that address (because the instruction writes it into the special %rip register).
Finally, the caller then cleans up any space it prepared for arguments and restores caller-saved registers if necessary.
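The callq and retq steps in the sequences above can be sketched as a toy simulation in C. Everything here (struct toy_cpu, the 64-byte stack, the example addresses) is invented for illustration; it models only how %rsp and %rip change, not real execution.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_STACK_BYTES 64

// Toy processor state: a small stack, a stack pointer (as an offset into
// the stack array), and an instruction pointer.
struct toy_cpu {
    unsigned char stack[TOY_STACK_BYTES];
    size_t rsp;    // offset of the lowest used stack byte
    uint64_t rip;  // address of the next instruction to execute
};

void toy_cpu_init(struct toy_cpu* cpu, uint64_t start_rip) {
    memset(cpu->stack, 0, sizeof(cpu->stack));
    cpu->rsp = TOY_STACK_BYTES;
    cpu->rip = start_rip;
}

// Models `callq target`: push the address of the instruction after the
// call (the return address), then jump to the target.
void toy_callq(struct toy_cpu* cpu, uint64_t target, uint64_t next_instruction) {
    assert(cpu->rsp >= 8);
    cpu->rsp -= 8;
    memcpy(&cpu->stack[cpu->rsp], &next_instruction, 8);
    cpu->rip = target;
}

// Models `retq`: pop the return address off the stack into %rip.
void toy_retq(struct toy_cpu* cpu) {
    assert(cpu->rsp + 8 <= TOY_STACK_BYTES);
    memcpy(&cpu->rip, &cpu->stack[cpu->rsp], 8);
    cpu->rsp += 8;
}
```

After a toy_callq followed by a toy_retq, rsp is back at its original value and rip holds the address that followed the call. Note that toy_retq trusts whatever 8 bytes sit at the top of the stack, just like the real instruction does.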

Summary

Today, we looked in more detail at how the stack segment of memory is structured and managed, and discussed how it grows and shrinks. We learned about how the compiler manages the stack pointer and how base pointers help it "unwind" the stack for debugging.

The very well-defined memory layout of the stack can become a danger if a program is compromised through a malicious input: by carefully crafting inputs that overwrite part of the stack memory via a buffer overflow, an attacker can change important data and cause a program to execute arbitrary code. We'll see more next time.
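A typical vulnerable pattern is copying input of unknown length into a fixed-size local buffer (for example with strcpy()), which can write past the buffer and into neighboring stack data such as the return address. As a contrast, here is a minimal sketch of a copy that respects the buffer's capacity. The name copy_bounded is made up, and this is one common defensive approach rather than anything specific to the lecture code.

```c
#include <assert.h>
#include <string.h>

// Copies the string `src` into `dst`, which has room for `cap` bytes.
// Never writes more than `cap` bytes and always NUL-terminates `dst`
// (if cap > 0). Returns 1 if all of `src` fit, 0 if it was truncated.
int copy_bounded(char* dst, size_t cap, const char* src) {
    assert(dst != NULL && src != NULL);
    if (cap == 0) {
        return 0;
    }
    size_t len = strlen(src);
    size_t n = len < cap - 1 ? len : cap - 1;  // leave room for the NUL byte
    memcpy(dst, src, n);
    dst[n] = '\0';
    return len < cap;
}
```

With an 8-byte buffer, a 4-character input is copied whole, while a longer input is cut to 7 characters plus the terminator instead of spilling into adjacent memory.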

In Lab 3, you will craft and execute buffer overflow attacks on a program yourself!