Lecture 12: Stack, Buffer Overflow #
The Stack #
You will recall the stack segment of memory from earlier lectures: it is where all variables with automatic lifetime are stored. These include local variables declared inside functions, but importantly also function arguments.
Recall that call01.s to call03.s contained a number of instructions referring to %rsp, such as this implementation of the function f() (from call01.s):
movl %edi, -4(%rsp)
movl -4(%rsp), %eax
ret
The first movl stores the first argument (a 4-byte integer, passed in %edi) at an address four bytes below the address stored in register %rsp; the second movl instruction takes that value in memory and loads it into register %eax.
The %rsp register is called the stack pointer. It always points to the "top" of the stack, which is at the lowest (leftmost) address currently used in the stack segment. At the start of the function, any memory to the left of where %rsp points is therefore unused; any memory to the right of where it points is used. This explains why the code stores the argument at address %rsp - 4: it's the first 4-byte slot available on the stack, to the left of the currently used memory.
In other words, these instructions added the blue parts of the picture below to the stack memory.
We can give names to the memory on the left and right of the address where %rsp points in the stack. These regions are called stack frames, where each stack frame corresponds to the data associated with one function call. The memory on the right of the address pointed to by %rsp at the point f() gets called is the stack frame of whatever function calls f(). This function is named the caller (the function that calls), while f() is the callee (the function being called).
The memory on the right of the %rsp address at the point of f() being called (we refer to this as "entry %rsp") is the caller's stack frame (red below), and the memory to its left is the callee's stack frame.
The arguments and local variables of f() live inside f()'s stack frame. Subsequent arguments (second, third, fourth, etc.) are stored at successively lower addresses below %rsp (see call02.s and call03.s for examples with more arguments), followed eventually by any local variables in the callee.
How does %rsp change?
The convention is that %rsp always points to the lowest (leftmost) stack address that is currently used. This means that when a function declares a new local variable, %rsp has to move down (left), and when a function returns, %rsp has to move up (right), back to where it was when the function was originally called.
Moving %rsp happens in two ways: explicit modification via arithmetic instructions, and implicit modification as a side effect of special instructions. The former happens when the compiler knows exactly how many bytes a function requires %rsp to move by, and involves instructions like subq $0x10, %rsp, which moves the stack pointer down by 16 bytes. The latter, side-effect modification happens when the push and pop instructions run. A push writes the contents of a register onto the stack memory immediately to the left of the current %rsp and also modifies %rsp to point to the beginning of this new data. For example, pushq %rax writes the 8 bytes from register %rax at address %rsp - 8 and sets %rsp to that address; it is equivalent to movq %rax, -8(%rsp); subq $8, %rsp or to subq $8, %rsp; movq %rax, (%rsp). A pop does the reverse: popq %rax loads the 8 bytes at (%rsp) into %rax and then moves %rsp up by 8.
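The push and pop behavior described above can be sketched as a toy model in C. This is only an illustration, not real machine state: the struct toy_stack, its 64-byte size, and the toy_* function names are all hypothetical, and bounds checks are omitted for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model: a small "stack segment" plus a stack pointer
   stored as an offset into it. Bounds checks are omitted for brevity. */
enum { STACK_SIZE = 64 };

struct toy_stack {
    unsigned char mem[STACK_SIZE];
    size_t rsp;                 /* offset of the lowest byte in use */
};

void toy_init(struct toy_stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = STACK_SIZE;        /* empty stack: nothing in use yet */
}

/* Models pushq: subq $8, %rsp; movq value, (%rsp) */
void toy_pushq(struct toy_stack *s, uint64_t value) {
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, 8);
}

/* Models popq: movq (%rsp), value; addq $8, %rsp */
uint64_t toy_popq(struct toy_stack *s) {
    uint64_t value;
    memcpy(&value, &s->mem[s->rsp], 8);
    s->rsp += 8;
    return value;
}
```

Pushing two values and popping them again returns them in reverse order and leaves rsp where it started, which is the property the calling convention relies on.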
As an optimization, the compiler may choose to avoid writing arguments onto the stack. It does this for up to six arguments, which per calling convention are held in specific registers. call04.s shows this: the C code we compile it from (call04.c) is identical to the code in call03.c.
Functions with more than six arguments #
There is a limited number of registers in the x86-64 architecture, but you can write functions in C that take any number of arguments! The calling convention says that the first six arguments may be passed in registers, but that the 7th and subsequent arguments are always passed in memory on the stack. Specifically, these arguments go into the caller's stack frame, so they are stored above the entry %rsp at the point where the function is called (see call05.{c,s} and call06.{c,s}).
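As a small illustration, here is a hypothetical eight-argument function (sum8 is not one of the course's call05/call06 examples). The comments state where each integer argument would travel under the x86-64 Linux calling convention.

```c
/* Hypothetical example; the function name and arguments are illustrative. */
long sum8(long a, long b, long c, long d, long e, long f, long g, long h) {
    /* a..f arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9 (in that order).
       g and h do not fit in the six argument registers, so the caller
       places them in memory, in its own stack frame. */
    return a + b + c + d + e + f + g + h;
}
```

Compiling a caller of such a function to assembly (for example with the -S compiler flag) should show the first six values being moved into registers and the remaining two being written to the stack before the call.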
Return Address #
As a function executes, it eventually reaches a ret instruction in its assembly. The effect of ret is to return to the caller (a form of control flow, as the next instruction needs to change). But how does the processor know what instruction to execute next, and what to set %rip to?
It turns out that the stack plays a role here, too. In a nutshell, each function call stores the return address as the very first (i.e., rightmost) data in the callee's stack frame. (If the function called takes more than six arguments, the return address is to the left of the 7th argument in the caller's stack frame.)
The stored return address makes it possible for each function to know exactly where to continue execution once it returns to its caller. (However, storing the return address on the stack also has some dangerous consequences, as we will see shortly.)
We can now define the full function entry and exit sequence. Both the caller and the callee have responsibilities in this sequence.
To prepare for a function call, the caller performs the following tasks:
- The caller stores the first six arguments in the corresponding registers.
- If the callee takes more than six arguments, or if some of its arguments are large, the caller must store the surplus arguments on its stack frame (in increasing order). The 7th argument must be stored at (%rsp) (that is, the top of the stack) when the caller executes its callq instruction.
- The caller saves any caller-saved registers (see last lecture's list). These are registers whose values the callee might overwrite, but which the caller needs to retain for later use.
- The caller executes callq FUNCTION. This has an effect like pushq $NEXT_INSTRUCTION; jmp FUNCTION (or, equivalently, subq $8, %rsp; movq $NEXT_INSTRUCTION, (%rsp); jmp FUNCTION), where NEXT_INSTRUCTION is the address of the instruction immediately following callq.
To return from a function, the callee does the following:
- The callee places its return value in %rax.
- The callee restores the stack pointer to its value at entry ("entry %rsp"), if necessary.
- The callee executes the retq instruction. This has an effect like popq %rip, which removes the return address from the stack and jumps to that address (because the instruction writes it into the special %rip register).
- Finally, the caller cleans up any space it prepared for arguments and restores caller-saved registers if necessary.
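The effect of callq and retq on %rsp and %rip can be sketched with a toy model. Everything here is hypothetical (struct toy_cpu, the 16-slot stack, the toy_* names, and the instruction addresses used in any example); it only mirrors the "push the next instruction's address, then jump" and "pop into %rip" descriptions above.

```c
#include <stdint.h>

enum { TOY_WORDS = 16 };

/* Hypothetical toy CPU: a stack of 8-byte slots, a stack pointer, and an
   instruction pointer. Bounds checks are omitted for brevity. */
struct toy_cpu {
    uint64_t stack[TOY_WORDS];
    int rsp;                    /* index of the lowest slot in use */
    uint64_t rip;               /* address of the next instruction */
};

void toy_cpu_init(struct toy_cpu *cpu, uint64_t start) {
    cpu->rsp = TOY_WORDS;       /* empty stack */
    cpu->rip = start;
}

/* Models callq: pushq $next_instruction; jmp target */
void toy_callq(struct toy_cpu *cpu, uint64_t target, uint64_t next_instruction) {
    cpu->rsp -= 1;
    cpu->stack[cpu->rsp] = next_instruction;
    cpu->rip = target;
}

/* Models retq: popq %rip */
void toy_retq(struct toy_cpu *cpu) {
    cpu->rip = cpu->stack[cpu->rsp];
    cpu->rsp += 1;
}
```

Nested calls stack up their return addresses, and each toy_retq resumes at the address saved by the matching toy_callq. The model also makes the danger visible: whatever value sits in that stack slot when retq runs becomes the next %rip.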
Base Pointers and Buffer Overflow #
Base Pointers and the %rbp Register #
Keeping track of the entry %rsp can be tricky with more complex functions that allocate lots of local variables and modify the stack in complex ways. For these cases, the x86-64 Linux calling convention allows for the use of another register, %rbp, as a special-purpose register.
%rbp holds the address of the base of the current stack frame: that is, the rightmost (highest) address that points to a value still part of the current stack frame. This corresponds to the rightmost address of an object in the callee's stack, and to the first address that isn't part of an argument to the callee or one of its local variables. It is called the base pointer, since the address points at the "base" of the callee's stack frame (if %rsp points to the "top", %rbp points to the "base", i.e., the bottom). The %rbp register maintains this value for the whole execution of the function (i.e., the function may not overwrite the value in that register), even as %rsp changes.
This scheme has the advantage that when the function exits, it can restore its original entry %rsp by loading it from %rbp. In addition, it also facilitates debugging because each function stores the old value of %rbp to the stack at its point of entry. The 8 bytes holding the caller's %rbp are the very first thing stored inside the callee's stack frame, and they are right below the return address: the return address is in the caller's stack frame, while the saved %rbp is in the callee's stack frame. This means that the saved %rbp values form a chain that allows each function to locate the base of its caller's stack frame, where it will find the %rbp of the "grand-caller's" stack frame, etc. The backtraces you see in GDB and in Address Sanitizer error messages are generated precisely using this chain!
Therefore, with a base pointer, the function entry sequence becomes:
- The first instruction executed by the callee on function entry is pushq %rbp. This saves the caller's value for %rbp into the callee's stack. (Since %rbp is callee-saved, the callee is responsible for saving it.)
- The second instruction is movq %rsp, %rbp. This saves the current stack pointer in %rbp (so %rbp = entry %rsp - 8).
  This adjusted value of %rbp is the callee's "frame pointer" or base pointer. The callee will not change this value until it returns. The frame pointer provides a stable reference point for local variables and caller arguments. (Complex functions may need a stable reference point because they reserve varying amounts of space.)
  Note, also, that the value stored at (%rbp) is the caller's %rbp, and the value stored at 8(%rbp) is the return address. This information can be used to trace backwards by debuggers (a process called "stack unwinding").
- The function ends with movq %rbp, %rsp; popq %rbp; retq, or, equivalently, leave; retq. This sequence is the last thing the callee does, and it restores the caller's %rbp and entry %rsp before returning.
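To make the chain of saved %rbp values concrete, here is a toy simulation of the entry and exit sequence together with a backtrace walk. It is a sketch under simplifying assumptions: the stack is an array of 8-byte words indexed by position rather than by address, and struct toy_machine and the toy_* functions are hypothetical names, not part of the course code.

```c
#include <stddef.h>
#include <stdint.h>

enum { FRAME_WORDS = 32 };

/* Hypothetical word-addressed toy stack. The index FRAME_WORDS stands in
   for "no frame" (the end of the %rbp chain). */
struct toy_machine {
    uint64_t mem[FRAME_WORDS];
    size_t rsp;                 /* index of the lowest word in use */
    size_t rbp;                 /* index of the current frame's base */
};

void toy_machine_init(struct toy_machine *m) {
    m->rsp = FRAME_WORDS;
    m->rbp = FRAME_WORDS;
}

/* Models callq (push return address), then pushq %rbp; movq %rsp, %rbp */
void toy_enter(struct toy_machine *m, uint64_t return_address) {
    m->mem[--m->rsp] = return_address;
    m->mem[--m->rsp] = m->rbp;  /* saved caller %rbp */
    m->rbp = m->rsp;
}

/* Models movq %rbp, %rsp; popq %rbp; retq. Returns the popped address. */
uint64_t toy_leave_ret(struct toy_machine *m) {
    m->rsp = m->rbp;
    m->rbp = (size_t)m->mem[m->rsp++];
    return m->mem[m->rsp++];
}

/* Walks the saved-%rbp chain, collecting one return address per frame. */
size_t toy_backtrace(const struct toy_machine *m, uint64_t *out, size_t max) {
    size_t n = 0;
    size_t frame = m->rbp;
    while (frame < FRAME_WORDS && n < max) {
        out[n++] = m->mem[frame + 1];   /* like 8(%rbp): return address */
        frame = (size_t)m->mem[frame];  /* like (%rbp): caller's %rbp */
    }
    return n;
}
```

Note that toy_leave_ret works no matter how far rsp moved for local variables in between, which is exactly the stable reference point the base pointer is meant to provide.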
You can find an example of this in call07.s. Lab 3 also uses the %rbp-based calling convention, so make sure you keep the extra 8 bytes for storing the caller's %rbp on the stack in mind!
Buffer overflow attacks #
Now that we understand the calling convention and the stack, let's take a step back and think of some of the consequences of this well-defined memory layout. While a callee is not supposed to access its caller's stack frame (unless it's explicitly passed a pointer to an object within it), there is no principled mechanism in the x86-64 architecture that prevents such access.
In particular, if you can guess the address of a variable on the stack (either a local within the current function or a local/argument in a caller of the current function), your program can just write data to that address and overwrite whatever is there.
This can happen accidentally (due to bugs), but it becomes a much bigger problem if done deliberately by malicious actors: a user might provide input that causes a program to overwrite important data on the stack. This kind of attack is called a buffer overflow attack.
Consider the code in checksummer.cc. This program computes checksums of strings provided to it as command line arguments. You don't need to understand in deep detail what it does, but observe that the checksum() function uses a 400-byte stack-allocated buffer (as part of the buf union) to hold the input string, which it copies into that buffer.
A sane execution of checksummer might look like this:
$ ./checksummer
hey yo CS300
<stdin>: checksum 00796568
But what if the user provides an input string longer than 399 characters (remember that we also need the zero terminator in the buffer)? The function just keeps writing, and it will write over whatever is adjacent to buf on the stack.
From our prior pictures, we know that buf will be in checksum's stack frame, below the entry %rsp. Moreover, the return address is stored right at the entry %rsp! In this case, that is an address in main(). So, if checksum writes beyond the end of buf, it will overwrite the return address on the stack; if it keeps going further, it will overwrite data in main's stack frame.
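The overwrite can be demonstrated safely with a toy frame that lives entirely inside one byte array, so nothing on the real stack is touched. This is a sketch, not the code from checksummer.cc: the 16-byte buffer size, struct toy_frame, and the toy_* functions are hypothetical, and the layout simply follows the description above (buffer, then saved %rbp, then return address).

```c
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 16, FRAME_SIZE = BUF_SIZE + 8 + 8 };

/* Toy frame laid out from low to high addresses:
   [ buf (16 bytes) | saved %rbp (8 bytes) | return address (8 bytes) ] */
struct toy_frame {
    unsigned char bytes[FRAME_SIZE];
};

void toy_frame_init(struct toy_frame *f, uint64_t saved_rbp, uint64_t ret) {
    memset(f->bytes, 0, FRAME_SIZE);
    memcpy(f->bytes + BUF_SIZE, &saved_rbp, 8);
    memcpy(f->bytes + BUF_SIZE + 8, &ret, 8);
}

uint64_t toy_frame_return_address(const struct toy_frame *f) {
    uint64_t ret;
    memcpy(&ret, f->bytes + BUF_SIZE + 8, 8);
    return ret;
}

/* Copies without looking at BUF_SIZE, like an unchecked copy loop.
   The FRAME_SIZE clamp only keeps this simulation inside its own array. */
void toy_unchecked_copy(struct toy_frame *f, const char *input, size_t len) {
    if (len > FRAME_SIZE) len = FRAME_SIZE;
    memcpy(f->bytes, input, len);
}

/* Bounded variant: truncates the input so it fits in buf with a '\0'. */
void toy_bounded_copy(struct toy_frame *f, const char *input, size_t len) {
    if (len > BUF_SIZE - 1) len = BUF_SIZE - 1;
    memcpy(f->bytes, input, len);
    f->bytes[len] = '\0';
}
```

Feeding 32 'A' bytes to toy_bounded_copy leaves the stored return address intact, while toy_unchecked_copy replaces it with 0x4141414141414141, the kind of garbage value that leads to a crash on return.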
Why is overwriting the return address dangerous? It means that a clever attacker can direct the program to execute any function within the program. In the case of checksummer.cc, note the exec_shell() function, which runs a string as a shell command. This has a lot of nefarious potential – what if we could cause that function to execute with a user-provided string? We could print a lot of sad face emojis to the shell, or, more dangerously, run a command like rm -rf /, which deletes all data on the user's computer!
If we run ./checksummer.unsafe (a variant of checksummer with the safety features that modern compilers add to combat these attacks disabled), it behaves as normal with sane strings:
$ ./checksummer.unsafe
hey yo CS300
<stdin>: checksum 00796568
But if we pass a very long string with more than 400 characters, things get a bit more unusual:
$ ./checksummer.unsafe < austen.txt
Segmentation fault (core dumped)
The crash happens because the return address for checksum() was overwritten by garbage from our string, which isn't a valid address. But what if we figure out a valid address and put it in exactly the right place in our string?
This is what the input in attack.bytes does. Specifically, using GDB, I figured out that the address of exec_shell in my compiled version of the code is 0x401156 (an address in the code/text segment of the executable). attack.bytes contains a carefully crafted "payload" that puts this address into the right bytes on the stack. The attack payload is 424 characters long because we need 400 characters to overrun buf, 8 bytes for the base pointer, 4 bytes for the malicious return address, and 12 bytes of extra payload because stack frames on x86-64 Linux are aligned to 16-byte boundaries.
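The length arithmetic can be written out as a short sketch. The sizes are the ones quoted in the text; the constant and function names are hypothetical and exist only to label each part of the sum.

```c
#include <stddef.h>

/* Sizes quoted in the text above; the names are hypothetical labels. */
enum {
    BUF_BYTES = 400,        /* bytes needed to fill buf */
    SAVED_RBP_BYTES = 8,    /* saved base pointer */
    RET_ADDR_BYTES = 4,     /* bytes of the return address being replaced */
    PADDING_BYTES = 12      /* extra payload bytes */
};

/* Offset within the payload at which the return address bytes begin. */
size_t payload_ret_offset(void) {
    return BUF_BYTES + SAVED_RBP_BYTES;
}

/* Total payload length: 400 + 8 + 4 + 12. */
size_t payload_length(void) {
    return payload_ret_offset() + RET_ADDR_BYTES + PADDING_BYTES;
}
```

Under these numbers the return address bytes start at offset 408 in the input, and the total comes to the 424 characters mentioned above.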
Executing this attack works as follows:
$ ./checksummer.unsafe < attack.bytes
OWNED OWNED OWNED
The < attack.bytes syntax simply redirects the contents of the attack.bytes file into the input to the program.
Summary #
Today, we learned about base pointers and saw an example of a buffer overflow. We also reviewed the layout of the stack.
We then talked about the storage hierarchy with smaller, but faster, storage at the top, and slower, but larger storage at the bottom. Caches are a way of making the bottom layers appear faster than they actually are!