Lecture 12: Stack, Buffer Overflow, and Caching
Base Pointers and Buffer Overflow
Base Pointers and the %rbp
Keeping track of the entry %rsp can be tricky with more complex functions that allocate lots of local variables and modify the stack in complex ways. For these cases, the x86-64 Linux calling convention allows for the use of another register, %rbp, as a special-purpose register.
%rbp holds the address of the base of the current stack frame: that is, the rightmost (highest) address that points to a value still part of the current stack frame. This corresponds to the rightmost address of an object in the callee's stack, and to the first address that isn't part of an argument to the callee or one of its local variables. It is called the base pointer, since the address points at the "base" of the callee's stack frame (if %rsp points to the "top", %rbp points to the "base", i.e., the bottom). The %rbp register maintains this value for the whole execution of the function (i.e., the function may not overwrite the value in that register), even as %rsp changes.
This scheme has the advantage that when the function exits, it can restore its original entry %rsp by loading it from %rbp. In addition, it also facilitates debugging because each function stores the old value of %rbp to the stack at its point of entry. The 8 bytes holding the caller's %rbp are the very first thing stored inside the callee's stack frame, and they are right below the return address (the return address is in the caller's stack frame, while the saved %rbp is in the callee's stack frame). This means that the saved %rbp values form a chain that allows each function to locate the base of its caller's stack frame, where it will find the %rbp of the "grand-caller's" stack frame, etc. The backtraces you see in GDB and in Address Sanitizer error messages are generated precisely using this chain!
Therefore, with a base pointer, the function entry and exit sequence becomes:
The first instruction executed by the callee on function entry is pushq %rbp. This saves the caller's value for %rbp into the callee's stack. (Since %rbp is callee-saved, the callee is responsible for saving it.)
The second instruction is movq %rsp, %rbp. This saves the current stack pointer in %rbp (so %rbp = entry %rsp - 8). This adjusted value of %rbp is the callee's "frame pointer" or base pointer. The callee will not change this value until it returns. The frame pointer provides a stable reference point for local variables and caller arguments. (Complex functions may need a stable reference point because they reserve varying amounts of space.) Note, also, that the value stored at (%rbp) is the caller's %rbp, and the value stored at 8(%rbp) is the return address. Debuggers can use this information to trace backwards through the stack (a process called "stack unwinding").
The function ends with movq %rbp, %rsp; popq %rbp; retq, or, equivalently, leave; retq. This sequence is the last thing the callee does, and it restores the caller's %rbp and the entry %rsp before returning.
You can find an example of this in call07.s. Lab 3 also uses the %rbp-based calling convention, so make sure you keep the extra 8 bytes for storing the caller's %rbp on the stack in mind!
Buffer overflow attacks
Now that we understand the calling convention and the stack, let's take a step back and think of some of the consequences of this well-defined memory layout. While a callee is not supposed to access its caller's stack frame (unless it's explicitly passed a pointer to an object within it), there is no principled mechanism in the x86-64 architecture that prevents such access.
In particular, if you can guess the address of a variable on the stack (either a local within the current function or a local/argument in a caller of the current function), your program can just write data to that address and overwrite whatever is there.
This can happen accidentally (due to bugs), but it becomes a much bigger problem if done deliberately by malicious actors: a user might provide input that causes a program to overwrite important data on the stack. This kind of attack is called a buffer overflow attack.
Consider the code in checksummer.cc. This program computes checksums of strings provided to it as command line arguments. You don't need to understand in deep detail what it does, but observe that the checksum() function uses a 400-byte stack-allocated buffer (as part of the buf union) to hold the input string, which it copies into that buffer.
A sane execution of checksummer might look like this:
$ ./checksummer
hey yo CS300
<stdin>: checksum 00796568
But what if the user provides an input string longer than 399 characters (remember that we also need the zero terminator in the buffer)? The function just keeps writing, and it will write over whatever is adjacent to buf on the stack.
From our prior pictures, we know that buf will be in checksum's stack frame, below the entry %rsp. Moreover, the entry %rsp points directly at the return address! In this case, that is an address in main(). So, if checksum writes beyond the end of buf, it will overwrite the saved %rbp and then the return address on the stack; if it keeps going further, it will overwrite data in main's stack frame.
Why is overwriting the return address dangerous? It means that a clever attacker can direct the program to execute any function within the program. In the case of checksummer.cc, note the exec_shell() function, which runs a string as a shell command. This has a lot of nefarious potential: what if we could cause that function to execute with a user-provided string? We could print a lot of sad face emojis to the shell, or, more dangerously, run a command like rm -rf /, which deletes all data on the user's computer!
If we run ./checksummer.unsafe (a variant of checksummer built with the safety features that modern compilers add to combat these attacks disabled), it behaves as normal with sane strings:
$ ./checksummer.unsafe
hey yo CS300
<stdin>: checksum 00796568
But if we pass a very long string with more than 400 characters, things get a bit more unusual:
$ ./checksummer.unsafe < austen.txt
Segmentation fault (core dumped)
The crash happens because the return address for checksum() was overwritten by garbage from our string, which isn't a valid address. But what if we figure out a valid address and put it in exactly the right place in our string?
This is what the input in attack.bytes does. Specifically, using GDB, I figured out that the address of exec_shell in my compiled version of the code is 0x401156 (an address in the code/text segment of the executable). attack.bytes contains a carefully crafted "payload" that puts this address into the right bytes on the stack. The attack payload is 424 characters long because we need 400 characters to overrun buf, 8 bytes for the base pointer, 4 bytes for the malicious return address, and 12 bytes of extra payload because stack frames on x86-64 Linux are aligned to 16-byte boundaries.
Executing this attack works as follows:
$ ./checksummer.unsafe < attack.bytes
The < attack.bytes syntax simply redirects the contents of the attack.bytes file into the program's standard input.
Caching and the Storage Hierarchy
We are now switching gears to talk about one of the most important performance-improving concepts in computer systems. This concept is the idea of cache memory.
Why are we covering this?
Caching is an immensely important concept to optimize performance of a computer system. As a software engineer in industry, or as a researcher, you will probably find yourself in countless situations where "add a cache" is the answer to a performance problem. Understanding the idea behind caches, as well as when a cache works well, is important to being able to build high-performance applications.
We will look at specific examples of caches, but a generic definition is the following: a cache is a small amount of fast storage used to speed up access to larger, slower storage.
One reasonable question is what we actually mean by "fast storage" and "slow storage", and why we need both. Couldn't we just put all of the data on our computer into fast storage?
To answer this question, it helps to look at what different kinds of storage cost and how this cost has changed over time.
The Storage Hierarchy
When we learn about computer science concepts, we often talk about "cost": the time cost and space cost of algorithms, memory efficiency, and storage space. These costs fundamentally shape the kinds of solutions we build. But financial costs also shape the systems we build, and the costs of the storage technologies we rely on have changed dramatically, as have their capacities and speeds.
The table below gives the price per megabyte of different storage technologies (in 2010 dollars), up to 2021. (Note that flash/SSD storage was not available until the early 2000s.)
Year | Memory (DRAM) | Flash/SSD | Hard disk |
~1955 | $411,000,000 | | $9,200 |
1970 | $734,000.00 | | $260.00 |
1990 | $148.20 | | $5.45 |
2003 | $0.09 | $0.305 | $0.00132 |
2010 | $0.019 | $0.00244 | $0.000073 |
2021 | $0.003 | $0.00008 | $0.0000194 |
(Prices due to John C. McCallum, and inflation data from here. $1.00 in 1955 had "the same purchasing power" as $9.62 in 2019 dollars.)
Computer technology is amazing, not just for what it can do, but also for just how tremendously its cost has dropped over the course of just a few decades. The space required to store a modern smartphone photo (3 MB) on a harddisk would have cost tens of thousands of dollars in the 1950s, but now costs a fraction of a cent.
But one fundamental truth has remained the case across all these numbers: primary memory (DRAM) has always been substantially more expensive than long-term disk storage. This becomes even more evident if we normalize all numbers in the table to the cost of 1 MB of harddisk space in 2021, as the second table below does.
Year | Memory (DRAM) | Flash/SSD | Hard disk |
~1955 | 219,800,000,000,000 | | 333,155,000 |
1970 | 39,250,000,000 | | 13,900,000 |
1990 | 7,925,000 | | 291,000 |
2003 | 4,800 | 16,300 | 70 |
2010 | 1,000 | 130 | 3.9 |
2021 | 155 | 4.12 | 1 |
As a consequence of this price differential, computers have always had more persistent disk space than primary memory. Harddisks and flash/SSD storage are persistent (i.e., they survive power failures), while DRAM memory is volatile (i.e., its contents are lost when the computer loses power), but harddisk and flash/SSD are also much slower to access than memory.
In particular, when thinking about storage performance, we care about the latency to access data in storage. Latency is the time it takes until the requested data is available (for a read), or until the data is on the storage medium (for a write). A longer latency is worse, and a shorter latency is better, as a shorter latency means that the computer can complete operations sooner.
Another important storage performance metric is throughput (or "bandwidth"), which is the number of operations completed per time unit. Throughput is often, though not always, the inverse of latency. An ideal storage medium would have low latency and high throughput, as it takes very little time to complete a request, and many units of data can be transferred per second.
In reality, though, latency generally grows, and throughput drops, as storage media are further and further away from the processor. This is partly due to the storage technologies employed (some, like spinning harddisks, are cheap to manufacture, but slow), and partly due to the inevitable physics of sending information across longer and longer wires.
The table below shows the typical capacity, latency, and throughput achievable with the different storage technologies available in our computers.
Storage type | Capacity | Latency | Throughput (random access) | Throughput (sequential) |
Registers | ~30 (100s of bytes) | 0.5 ns | 16 GB/sec (2×10^9 accesses/sec) | |
DRAM (main memory) | 8 GB | 60 ns | 100 GB/sec | |
SSD (stable storage) | 512 GB | 60 µs | 550 MB/sec | |
Hard disk | 2–5 TB | 4–13 ms | 1 MB/sec | 200 MB/sec |
This notion of larger, cheaper, and slower storage further away from the processor, and smaller, faster, and more expensive storage closer to it, is referred to as the storage hierarchy, as it's possible to neatly rank storage according to these criteria. The storage hierarchy is often depicted as a pyramid, where wider (and lower) entries correspond to larger and slower forms of storage.
Today, we learned about base pointers and saw an example of a buffer overflow. We also reviewed the layout of the stack.
We then talked about the storage hierarchy with smaller, but faster, storage at the top, and slower, but larger storage at the bottom. Caches are a way of making the bottom layers appear faster than they actually are!