Lecture 8: Assembly Language, Calling Convention, and the Stack
» Lecture video (Brown ID required)
» Lecture code
» Post-Lecture Quiz (due 6pm Monday, February 24).
Assembly, continued
Last time, we looked at assembly code and developed an intuition for how to read assembly language instructions. But all programs we looked at contained only straight-line control flow, meaning that the assembly instructions simply execute one after another until the processor hits the ret instruction. Real programs contain conditional (if) statements, loops (for, while), and function calls. Today, we will understand how those concepts in the C language translate into assembly, and then build up an understanding of the resulting memory layout that reveals how a dangerous class of computer security attacks is enabled by seemingly innocuous C programs.
Control Flow
Your computer's processor is incredibly dumb: given the memory address of an instruction, it goes and executes that instruction, then executes the next instruction in memory, then the next, etc., until there are no more instructions to run. Control flow instructions change that default behavior by changing where in memory the processor gets its next instruction from.
The role of the %rip register
The %rip register on x86-64 is a special-purpose register that always holds the memory address of the next instruction to execute in the program's code segment. The processor increments %rip automatically after each instruction, and control flow instructions like branches set the value of %rip to change the next instruction.
Perhaps surprisingly, %rip also shows up when an assembly program refers to a global variable. See the sidebar under "Addressing Modes" below to understand how %rip-relative addressing works.
Deviations from sequential instruction execution, such as function calls, loops, and conditionals, are called control flow transfers.
A branch instruction jumps to the instruction following a label in the assembly program. Recall that labels are lines that end with a colon (e.g., .L3:) in the assembly generated by the compiler. In an executable or object file, the labels are replaced by actual memory addresses, so if you disassemble such a file (objdump -d FILE), you will see memory addresses as the branch targets instead.
Here is an example of the assembly generated for a program that contains an if statement (controlflow01.c):
.LFB0:
movl a(%rip), %eax
cmpl b(%rip), %eax
jl .L4
.L1:
rep ret
.L4:
movl $0, %eax
jmp .L1
Two of these lines contain branch instructions: jl .L4 and the final jmp .L1.
There are two kinds of branches: unconditional and conditional. The jmp (or j) instruction executes an unconditional branch, and control flow always jumps to the branch target (here, .L1). All other branch instructions are conditional: they only branch if some condition holds. That condition is represented by condition flags that are set as a side effect of every arithmetic operation the processor runs. In the example program above, the instruction that sets the flags is cmpl, which is a "compare" instruction that the processor internally executes as a subtraction of its first argument from its second argument, setting the flags and throwing away the result.
Arithmetic instructions change part of the %rflags register. The most commonly used flags are:
- ZF (zero flag): set iff the result was zero.
- SF (sign flag): set iff the result, when considered as a signed integer, was negative, i.e., iff the most significant bit (the sign bit) of the result was one.
- CF (carry flag): set iff the result overflowed when considered as an unsigned value (i.e., an addition produced a result greater than 2^W - 1, or a subtraction needed a borrow, for a value of width W bits).
- OF (overflow flag): set iff the result overflowed when considered as a signed value (i.e., the result was greater than 2^(W-1) - 1 or less than -2^(W-1) for a value of width W bits).
You will often see the test and cmp instructions before a conditional branch. As mentioned above, these operations perform arithmetic and set the flags, but throw away the result rather than storing it in the destination register. test performs a bitwise AND, while cmp performs subtraction, and both set the flags according to the result.
Below is a table of all branch instructions on the x86-64 architecture and the flags they look at to decide whether to branch and execute the next instruction at the branch target, or whether to continue execution with the next sequential instruction after the branch.
Instruction | Mnemonic | C example | Flags |
---|---|---|---|
j (jmp) | Jump | break; | (Unconditional) |
je (jz) | Jump if equal (zero) | if (x == y) | ZF |
jne (jnz) | Jump if not equal (nonzero) | if (x != y) | !ZF |
jg (jnle) | Jump if greater | if (x > y), signed | !ZF && !(SF ^ OF) |
jge (jnl) | Jump if greater or equal | if (x >= y), signed | !(SF ^ OF) |
jl (jnge) | Jump if less | if (x < y), signed | SF ^ OF |
jle (jng) | Jump if less or equal | if (x <= y), signed | (SF ^ OF) || ZF |
ja (jnbe) | Jump if above | if (x > y), unsigned | !CF && !ZF |
jae (jnb) | Jump if above or equal | if (x >= y), unsigned | !CF |
jb (jnae) | Jump if below | if (x < y), unsigned | CF |
jbe (jna) | Jump if below or equal | if (x <= y), unsigned | CF || ZF |
js | Jump if sign bit | if (x < 0), signed | SF |
jns | Jump if not sign bit | if (x >= 0), signed | !SF |
jc | Jump if carry bit | N/A | CF |
jnc | Jump if not carry bit | N/A | !CF |
jo | Jump if overflow bit | N/A | OF |
jno | Jump if not overflow bit | N/A | !OF |
Loops
Conditional branch instructions and flags are sufficient to support both conditional statements (if (...) { ... } else { ... } blocks in C) and loops (for (...) { ... }, while (...) { ... }, and do { ... } while (...)). For a conditional, the branch jumps if the condition is true (or false, depending on how the compiler lays out the assembly) and continues execution with the next instruction otherwise. For a loop, the assembly will contain a conditional branch at the end of the loop body that checks the loop condition; if it is still satisfied, the branch jumps back to a label (or address) at the top of the loop.
When you see a conditional branch in assembly code whose target is a label or address above the branching instruction, it is nearly always a loop.
Consider the example in controlflow02.s, and the corresponding program in controlflow02.c. Let's focus on the assembly code following the label:
.L3:
movslq (%rdx), %rcx
addq %rcx, %rax
addq $4, %rdx
cmpq %rsi, %rdx
jne .L3
rep ret
[...]
Here, the loop variable is held in register %rdx, and the value that the loop variable is compared to on each iteration is in %rsi. (You can infer this from the fact that these registers are the only ones that appear in a comparison.) The instruction above cmpq increments the loop variable by 4 every time the loop executes. Finally, the loop's body consists of the two instructions above the addq $4, %rdx instruction: the first dereferences the pointer in %rdx and puts the value at the memory address it points to into register %rcx, and the second adds that value to the contents of %rax.
Since nothing else modifies %rax before the conditional branch, it accumulates the value pointed to by %rdx on every iteration: this loop iterates over integers in memory via pointer arithmetic and sums them up.
Addressing Modes
We have already seen a few ways in which an assembly instruction's operands can be written. In particular, the loop example contains (%rdx), which dereferences the address stored in register %rdx.
The full, general form of a memory operand is offset(base, index, scale), which refers to the address offset + base + index*scale. In 0x18(%rax, %rbx, 4), %rax is the base, 0x18 the offset, %rbx the index, and 4 the scale. The offset (if used) must be a constant and the base and index (if used) must be registers; the scale must be either 1, 2, 4, or 8. In other words, if we write this as N(%reg1, %reg2, M), the address computed is %reg1 + N + %reg2 * M.
The default offset, base, and index are 0, and the default scale is 1, and instructions omit these parts if they take their default values. You will most often see operands of the form offset(%register), which add a constant to the address in the register and then dereference the result. But occasionally, you may come across instructions that use both base and index registers, or which use the full general form.
Below is a handy overview table containing all the possible ways of writing operands to assembly instructions.
Type | Example syntax | Value used |
---|---|---|
Register | %rbp | Contents of %rbp |
Immediate | $0x4 | 0x4 |
Memory | 0x4 | Value stored at address 0x4 |
| symbol_name | Value stored in global symbol_name (the compiler resolves the symbol name to an address when creating the executable) |
| symbol_name(%rip) | %rip-relative addressing for global (see below) |
| symbol_name+4(%rip) | Simple computations on symbols are allowed (the compiler resolves the computation when creating the executable) |
| (%rax) | Value stored at address in %rax |
| 0x4(%rax) | Value stored at address %rax + 4 |
| (%rax,%rbx) | Value stored at address %rax + %rbx |
| (%rax,%rbx,4) | Value stored at address %rax + %rbx*4 |
| 0x18(%rax,%rbx,4) | Value stored at address %rax + 0x18 + %rbx*4 |
%rip-relative addressing for global variables
x86-64 code often refers to globals using %rip-relative addressing: a global variable named a is referenced as a(%rip). This style of reference supports position-independent code (PIC), a security feature. It specifically supports position-independent executables (PIEs), which are programs that work independently of where their code is loaded into memory.
When the operating system loads a PIE, it picks a random starting point and loads all instructions and globals relative to that starting point. The PIE's instructions never refer to global variables using direct addressing: there is no movl global_int, %eax. Globals are referenced relatively instead, using deltas relative to the next %rip: to load a global variable into a register, the compiler emits movl global_int(%rip), %eax. These relative addresses work independently of the starting point!
For instance, consider an instruction located at (starting-point + 0x80) that loads a variable g located at (starting-point + 0x1000) into %rax. In a non-PIE, the instruction might be written as movq g, %rax; but this relies on g having a fixed address. In a PIE, the instruction might be written movq g(%rip), %rax, which works without having to know the starting address of the program's code in memory at compile time (instead, %rip contains an address a known number of bytes away from the starting point, so any address relative to %rip is also relative to the starting point).
At starting point… | The mov instruction is at… | The next instruction is at… | And g is at… | So the delta (g - next %rip) is… |
---|---|---|---|---|
0x400000 | 0x400080 | 0x400087 | 0x401000 | 0xF79 |
0x404000 | 0x404080 | 0x404087 | 0x405000 | 0xF79 |
0x4003F0 | 0x400470 | 0x400477 | 0x4013F0 | 0xF79 |
Calling Convention
We discussed conditionals and loops, but there is a third type of control flow: function calls. Assembly language has no functions, just sequences of instructions. Function calls therefore translate into control flow involving branches, but we need a bit more than that: functions can take arguments, and the compiler had better make sure that the arguments are available after it jumps to a function's instructions!
Defining how function calls and returns work, where a function can expect to find its arguments, and where it must place its return value is the business of a calling convention. A calling convention governs how functions on a particular architecture and operating system interact in assembly code. This includes rules on how function arguments are placed, where return values go, what registers functions may use, how they may allocate local variables, and others.
Why do we need calling conventions?
Calling conventions ensure that functions compiled by different compilers can interoperate, and they ensure that operating systems can run code from different programming languages and compilers. For example, you can call into C code from Python, or link C code compiled with gcc and code compiled with clang. This is possible only because the Python libraries that call into C code understand its calling convention, and because the gcc and clang compilers' authors agree on the calling convention to use.
Some aspects of a calling convention are derived from the instruction set itself and embedded into the architecture (e.g., via special-purpose registers modified as a side effect of certain instructions), but some are conventional, meaning they were decided upon by people (for instance, at a convention), and may differ across operating systems and compilers.
Programs call01.c to call06.c and their corresponding assembly in call01.s to call06.s help us figure out the calling convention for x86-64 on the Linux operating system!
Some basic rules are:
- The first six function arguments are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9 (in this order; see the register list from last lecture).
- The seventh and subsequent arguments are passed on the stack (see more below).
- The return value is passed in register %rax.
The full rules are more complex and also cover, for example, large objects (such as a struct), floating point numbers, etc. If you're interested, you can find all the details in the AMD64 ABI, section 3.2.3.
call04.s illustrates the rule about the first six arguments best: they are passed straight in registers. Other examples (e.g., call01 to call03) are compiled without optimizations and have somewhat more complex assembly code, which takes the values from registers, writes them onto the stack (more on that below), and then moves them into registers again.
The reason why the unoptimized programs seemingly pointlessly write all their arguments to memory in the stack segment is that arguments are local variables of a function, and since local variables have automatic lifetime, they're technically stored in the stack segment. With optimizations, the compiler is smart enough to realize that it can skip storing them, so it just uses the registers containing the arguments directly.
The Stack
You will recall the stack segment of memory from earlier lectures: it is where all variables with automatic lifetime are stored. These include local variables declared inside functions, but importantly also function arguments.
Recall that call01.s to call03.s contained a bunch of instructions referring to %rsp, such as this implementation of the function f() (from call01.s):
movl %edi, -4(%rsp)
movl -4(%rsp), %eax
ret
The first movl stores the first argument (a 4-byte integer, passed in %edi) at an address four bytes below the address stored in register %rsp; the second movl instruction takes that value in memory and loads it into register %eax.
The %rsp register is called the stack pointer. It always points to the "top" of the stack, which is at the lowest (leftmost) address currently used in the stack segment. At the start of the function, any memory to the left of where %rsp points is therefore unused; any memory to the right of where it points is used. This explains why the code stores the argument at address %rsp - 4: it's the first 4-byte slot available on the stack, to the left of the currently used memory.
In other words, these instructions added the blue parts of the picture below to the stack memory.
We can give names to the memory on the left and right of the address where %rsp points in the stack. These regions are called stack frames, where each stack frame corresponds to the data associated with one function call. The memory on the right of the address pointed to by %rsp at the point f() gets called is the stack frame of whatever function calls f(). This function is named the caller (the function that calls), while f() is the callee (the function being called).
The memory on the right of the %rsp address at the point of f() being called (we refer to this as "entry %rsp") is the caller's stack frame (red below), and the memory to its left is the callee's stack frame.
The arguments and local variables of f() live inside f()'s stack frame. Subsequent arguments (second, third, fourth, etc.) are stored at successively lower addresses below %rsp (see call02.s and call03.s for examples with more arguments), followed eventually by any local variables in the callee.
How does %rsp change?
The convention is that %rsp always points to the lowest (leftmost) stack address that is currently used. This means that when a function declares a new local variable, %rsp has to move down (left) and if a function returns, %rsp has to move up (right) and back to where it was when the function was originally called.
Moving %rsp happens in two ways: explicit modification via arithmetic instructions, and implicit modification as a side effect of special instructions. The former happens when the compiler knows exactly how many bytes a function requires %rsp to move by, and involves instructions like subq $0x10, %rsp, which moves the stack pointer down by 16 bytes. The latter, side-effect modification happens when the push and pop instructions run. A push writes the contents of a register onto the stack memory immediately to the left of the current %rsp and also modifies %rsp to point to the beginning of this new data; a pop does the reverse, reading the data at %rsp into a register and moving %rsp up past it. For example, pushq %rax would write the 8 bytes from register %rax at address %rsp - 8 and set %rsp to that address; it is equivalent to movq %rax, -8(%rsp); subq $8, %rsp or subq $8, %rsp; movq %rax, (%rsp).
As an optimization, the compiler may choose to avoid writing arguments onto the stack. It does this for up to six arguments, which per calling convention are held in specific registers. call04.s shows this: the C code we compile it from (call04.c) is identical to the code in call03.c.
But there is a limited number of registers in the x86-64 architecture, and you can write functions in C that take any number of arguments! The calling convention says that the first six arguments may be passed in registers, but that the 7th and above arguments are always passed in memory on the stack. Specifically, these arguments go into the caller's stack frame, so they are stored above the entry %rsp at the point where the function is called (see call05.{c,s} and call06.{c,s}).
Return Address
As a function executes, it eventually reaches a ret instruction in its assembly. The effect of ret is to return to the caller (a form of control flow, as the next instruction needs to change). But how does the processor know what instruction to execute next, and what to set %rip to?
It turns out that the stack plays a role here, too. In a nutshell, each function call stores the return address as the very first (i.e., rightmost) data in the callee's stack frame. (If the function called takes more than six arguments, the return address is to the left of the 7th argument in the caller's stack frame.)
The stored return address makes it possible for each function to know exactly where to continue execution once it returns to its caller. (However, storing the return address on the stack also has some dangerous consequences, as we will see shortly.)
We can now define the full function entry and exit sequence. Both the caller and the callee have responsibilities in this sequence.
To prepare for a function call, the caller performs the following tasks:
- The caller stores the first six arguments in the corresponding registers.
- If the callee takes more than six arguments, or if some of its arguments are large, the caller must store the surplus arguments on its stack frame (in increasing order). The 7th argument must be stored at (%rsp) (that is, the top of the stack) when the caller executes its callq instruction.
- The caller saves any caller-saved registers (see last lecture's list). These are registers whose values the callee might overwrite, but which the caller needs to retain for later use.
- The caller executes callq FUNCTION. This has an effect like pushq $NEXT_INSTRUCTION; jmp FUNCTION (or, equivalently, subq $8, %rsp; movq $NEXT_INSTRUCTION, (%rsp); jmp FUNCTION), where NEXT_INSTRUCTION is the address of the instruction immediately following callq.
To return from a function, the callee does the following:
- The callee places its return value in %rax.
- The callee restores the stack pointer to its value at entry ("entry %rsp"), if necessary.
- The callee executes the retq instruction. This has an effect like popq %rip, which removes the return address from the stack and jumps to that address (because the instruction writes it into the special %rip register).
Finally, the caller then cleans up any space it prepared for arguments and restores caller-saved registers if necessary.
Base Pointers and the %rbp Register
Keeping track of the entry %rsp can be tricky with more complex functions that allocate lots of local variables and modify the stack in complex ways. For these cases, the x86-64 Linux calling convention allows for the use of another register, %rbp, as a special-purpose register.
%rbp holds the address of the base of the current stack frame: that is, the rightmost (highest) address that points to a value still part of the current stack frame. This corresponds to the rightmost address of an object in the callee's stack, and to the first address that isn't part of an argument to the callee or one of its local variables. It is called the base pointer, since the address points at the "base" of the callee's stack frame (if %rsp points to the "top", %rbp points to the "base", i.e., the bottom). The %rbp register maintains this value for the whole execution of the function (i.e., the function may not overwrite the value in that register), even as %rsp changes.
This scheme has the advantage that when the function exits, it can restore its original entry %rsp by loading it from %rbp. In addition, it also facilitates debugging because each function stores the old value of %rbp to the stack at its point of entry. The 8 bytes holding the caller's %rbp are the very first thing the callee stores inside its stack frame, and they are right below the return address. This means that the saved %rbp values form a chain that allows each function to locate the base of its caller's stack frame, where it will find the %rbp of the "grand-caller's" stack frame, etc. The backtraces you see in GDB and in Address Sanitizer error messages are generated precisely using this chain!
Therefore, with a base pointer, the function entry sequence becomes:
- The first instruction executed by the callee on function entry is pushq %rbp. This saves the caller's value for %rbp into the callee's stack. (Since %rbp is callee-saved, the callee is responsible for saving it.)
- The second instruction is movq %rsp, %rbp. This saves the current stack pointer in %rbp (so %rbp = entry %rsp - 8).
  This adjusted value of %rbp is the callee's "frame pointer" or base pointer. The callee will not change this value until it returns. The frame pointer provides a stable reference point for local variables and caller arguments. (Complex functions may need a stable reference point because they reserve varying amounts of space.)
  Note, also, that the value stored at (%rbp) is the caller's %rbp, and the value stored at 8(%rbp) is the return address. This information can be used to trace backwards by debuggers (a process called "stack unwinding").
- The function ends with movq %rbp, %rsp; popq %rbp; retq, or, equivalently, leave; retq. This sequence is the last thing the callee does, and it restores the caller's %rbp and entry %rsp before returning.
You can find an example of this in call07.s. Lab 3 also uses the %rbp-based calling convention, so make sure you keep the extra 8 bytes for storing the caller's %rbp on the stack in mind!
Buffer overflow attacks
Now that we understand the calling convention and the stack, let's take a step back and think of some of the consequences of this well-defined memory layout. While a callee is not supposed to access its caller's stack frame (unless it's explicitly passed a pointer to an object within it), there is no principled mechanism in the x86-64 architecture that prevents such access.
In particular, if you can guess the address of a variable on the stack (either a local within the current function or a local/argument in a caller of the current function), your program can just write data to that address and overwrite whatever is there.
This can happen accidentally (due to bugs), but it becomes a much bigger problem if done deliberately by malicious actors: a user might provide input that causes a program to overwrite important data on the stack. This kind of attack is called a buffer overflow attack.
Consider the code in attackme.cc. This program computes checksums of strings provided to it as command line arguments. You don't need to understand in deep detail what it does, but observe that the checksum() function uses a 100-byte stack-allocated buffer (as part of the buf union) to hold the input string, which it copies into that buffer.
A sane execution of attackme might look like this:
$ ./attackme hey yo CS131
hey: checksum 00796568, sha1 7aea02175315cd3541b03ffe78aa1ccc40d2e98a -
yo: checksum 00006f79, sha1 dcdc24e139db869eb059c9355c89c382de15b987 -
CS131: checksum 33315374, sha1 05ab4d9aea4f9f0605dc4703ae8cfc44aab7a5ef -
But what if the user provides an input string longer than 99 characters (remember that we also need the zero terminator in the buffer)? The function just keeps writing, and it will write over whatever is adjacent to buf on the stack.
From our prior pictures, we know that buf will be in checksum's stack frame, below the entry %rsp. Moreover, the 8 bytes starting at the entry %rsp hold the return address! In this case, that is an address in main(). So, if checksum writes beyond the end of buf, it will overwrite the return address on the stack; if it keeps going further, it will overwrite data in main's stack frame.
Why is overwriting the return address dangerous? It means that a clever attacker can direct the program to execute any function within the program. In the case of attackme.cc, note the run_shell() function, which runs a string as a shell command. This has a lot of nefarious potential: what if we could cause that function to execute with a user-provided string? We could print a lot of sad face emojis to the shell, or, more dangerously, run a command like rm -rf /, which deletes all data on the user's computer!
If we run ./attackme.unsafe (a variant of attackme with the safety features that modern compilers add to combat these attacks disabled), it behaves as normal with sane strings:
$ ./attackme.unsafe hey yo CS131
hey: checksum 00796568, sha1 7aea02175315cd3541b03ffe78aa1ccc40d2e98a -
yo: checksum 00006f79, sha1 dcdc24e139db869eb059c9355c89c382de15b987 -
CS131: checksum 33315374, sha1 05ab4d9aea4f9f0605dc4703ae8cfc44aab7a5ef -
But if we pass a very long string with more than 100 characters, things get a bit more unusual:
$ ./attackme.unsafe sghfkhgkfshgksdhrehugresizqaugerhgjkfdhgkjdhgukhsukgrzufaofuoewugurezgureszgukskgreukfzreskugzurksgzukrestgkurzesi
Segmentation fault (core dumped)
The crash happens because the return address for checksum() was overwritten by garbage from our string, which isn't a valid address. But what if we figure out a valid address and put it in exactly the right place in our string?
This is what the input in attack.txt does. Specifically, using GDB, I figured out that the address of run_shell in my compiled version of the code is 0x400734 (an address in the code/text segment of the executable). attack.txt contains a carefully crafted "payload" that puts the value 0x400734 into the right bytes on the stack. The attack payload is 115 characters long because we need 100 characters to overrun buf, 3 bytes for the malicious return address, and 12 bytes of extra payload because stack frames on x86-64 Linux are aligned to 16-byte boundaries.
Executing this attack works as follows:
$ ./attackme.unsafe "$(cat attack.txt)"
OWNED
OWNED
OWNED
OWNED
OWNED
OWNED
sh: 7: ��5��: not found
Segmentation fault (core dumped)
The cat attack.txt shell command simply pastes the contents of the attack.txt file into the string we're passing to the program. (The quotes are required to make sure our attack payload is processed as a single string even if it contains spaces.)
Summary
Today, we concluded our brief tour of assembly language and the low-level concepts of program execution.
We first looked at control flow in assembly, where instructions change what other instructions the processor executes next. In many cases, control flow first involves a flag-setting instruction and then a conditional branch based on the values of the flags register. This allows for conditional statements and loops.
Function calls in assembly are governed by the calling convention of the architecture and operating system used: it determines which registers hold specific values such as arguments and return values, which registers a function may modify, and where on the stack certain information (such as the return address) is stored.
We also understood in more detail how the stack segment of memory is structured and managed, and discussed how it grows and shrinks. Finally, we looked into how the very well-defined memory layout of the stack can become a danger if a program is compromised through a malicious input: by carefully crafting inputs that overwrite part of the stack memory via a buffer overflow, we can change important data and cause a program to execute arbitrary code.
In Lab 3, you will craft and execute buffer overflow attacks on a program yourself!