Lecture 10: Assembly Control Flow #
More Assembly Language #
Recap: Registers #
Whenever you see %ZZZ in assembly code, this refers to a register named ZZZ. The x86-64 registers have confusing names because they evolved over time; each register also has multiple names that refer to different subsets of its bits. For example, %rax, one of the general-purpose registers that is, by convention, used to pass return values from functions, is split into the following five names:
63                              31              15      7       0
+-------------------------------+-------------------------------+
|                               |               |       |       |
+---------------------------------------------------------------+
|---------------------%rax (64 bits/8 bytes)--------------------|
                                |-----%eax (32 bits/4 bytes)----|
                                                |-%ax (16b/2B)--|
                                                |--%ah--|--%al--| <-- 8 bits/1 byte each
Assembly instructions often have a suffix that indicates what input data size and register width they're operating on. For instance, a set of "move" instructions help load signed and unsigned 8-, 16-, and 32-bit quantities from memory into registers. movzbl, for example, moves an 8-bit quantity (a byte) into a 32-bit register (a longword; e.g., %eax) with zero extension; movslq moves a 32-bit quantity (longword) into a 64-bit register (quadword; e.g., %rax) with sign extension.
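The two kinds of extension can be sketched in C, where integer conversions ask for the same widening. This is a minimal sketch: the function names are made up for illustration, and the instructions named in the comments are only what a compiler would typically pick for these conversions.

```c
#include <stdint.h>

/* Widen an unsigned byte to 32 bits. The upper 24 bits become zeros
   (zero extension); a compiler would typically use movzbl here. */
uint32_t widen_unsigned_byte(uint8_t b) {
    return (uint32_t) b;
}

/* Widen a signed 32-bit value to 64 bits. The upper 32 bits become copies
   of the sign bit (sign extension); a compiler would typically use movslq. */
int64_t widen_signed_long(int32_t x) {
    return (int64_t) x;
}
```

With these definitions, the byte 0xFF widens to 255, while the 32-bit value -1 widens to the 64-bit value -1 (all 64 bits set).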
Note that what looks like types (such as long, short, etc.) here merely refers to the register width used in the instruction. All actual types are removed from the program during compilation; there are no types in assembly (for examples, see asm06.s and asm07.s and their corresponding C source files in the lecture code).
Instructions #
There are three basic kinds of assembly instructions:
- Computation: These instructions compute on values, typically values stored in registers. Most have zero or one source operands and one source/destination operand, with the source operand coming first. For example, the instruction addq %rax, %rbx performs the computation %rbx := %rbx + %rax.
- Data movement: These instructions move data between registers and memory, so they can move values from one register to another, from memory into a register, and from a register back to memory. Almost all move instructions have one source operand and one destination operand; the source operand comes first. For example, movq %rax, %rbx copies the contents of %rax into %rbx, so it performs the assignment %rbx = %rax.
- Control flow: Normally the CPU executes instructions in sequence, in the order they appear in the assembly code (and, once translated into bytes, the order in memory). Control flow instructions change the next instruction the processor executes (its address is called the "instruction pointer" and is stored in the special register %rip). There are unconditional branches (the instruction pointer is set to a new value), conditional branches (the instruction pointer is set to a new value if a condition is true), and function call and return instructions.
Some instructions appear to combine computation and data movement. For example, given the C code int* pi; ... ++(*pi); the compiler might generate incl (%rax) rather than movl (%rax), %ebx; incl %ebx; movl %ebx, (%rax). However, the processor actually divides these complex instructions into smaller, simpler, invisible instructions called microcode, because the simpler instructions can be made to execute faster. The complex incl instruction actually runs in three phases: data movement, then computation, then data movement. This matters when we introduce parallelism.
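The three phases can be spelled out in C. The sketch below is illustrative only: both function names are hypothetical, and a compiler is free to generate the same instructions for either version.

```c
/* One-step form: a compiler might emit a single incl (%rax) for this. */
void increment_in_place(int* pi) {
    ++(*pi);
}

/* The same work written as the three phases the processor goes through. */
void increment_in_phases(int* pi) {
    int tmp = *pi;   /* data movement: load the value from memory */
    tmp = tmp + 1;   /* computation: add one in a register */
    *pi = tmp;       /* data movement: store the result back to memory */
}
```

Both functions leave the same value in memory; the second one just makes the load and the store visible as separate steps.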
Different assembly syntaxes
There are actually multiple ways of writing x86-64 assembly. We use the "AT&T syntax", which is distinguished from the "Intel syntax" by several features, but especially by the use of percent signs for registers. Sadly, and just to make things more confusing, the Intel syntax puts destination registers before source registers.
Control Flow #
So far, we've looked at assembly code and developed an intuition for how
to read assembly language instructions. But all programs we looked at
contained only straight control flow, meaning that the assembly
instructions simply execute one after another until the processor hits
the ret
instruction. Real programs contain conditional (if
)
statements, loops (for
, while
), and function calls. Today, we will
understand how those concepts in the C language translate into assembly,
and then build up an understanding of the resulting memory layout that
reveals how a dangerous class of computer security attacks is enabled by
seemingly innocuous C programs.
Your computer's processor is incredibly dumb: given the memory address of an instruction, it goes and executes that instruction, then executes the next instruction in memory, then the next, etc., until there are no more instructions to run. Control flow instructions change that default behavior by changing where in memory the processor gets its next instruction from.
The role of the %rip register
The %rip register on x86-64 is a special-purpose register that always holds the memory address of the next instruction to execute in the program's code segment. The processor increments %rip automatically after each instruction, and control flow instructions like branches set the value of %rip to change the next instruction.
Perhaps surprisingly, %rip also shows up when an assembly program refers to a global variable. See the sidebar under "Addressing Modes" below to understand how %rip-relative addressing works.
Deviations from sequential instruction execution, such as function calls, loops, and conditionals, are called control flow transfers.
A branch instruction jumps to the instruction following a label in the assembly program. Recall that labels are lines that end with a colon (e.g., .L3:) in the assembly generated from the compiler. In an executable or object file, the labels are replaced by actual memory addresses, so if you disassemble such a file (objdump -d FILE), you will see memory addresses as the branch target instead.
Here is an example of the assembly generated by a program that contains an if statement (controlflow01.c):
.LFB0:
movl a(%rip), %eax
cmpl b(%rip), %eax
jl .L4
.L1:
rep ret
.L4:
movl $0, %eax
jmp .L1
Both the jl .L4 line and the final jmp .L1 line contain branch instructions. There are two kinds of branches: unconditional and conditional. The jmp instruction executes an unconditional branch, so control flow always jumps to the branch target (here, .L1). All other branch instructions are conditional: they only branch if some condition holds. That condition is represented by condition flags that are set as a side effect of most arithmetic instructions the processor runs. In the example program above, the instruction that sets the flags is cmpl, a "compare" instruction that the processor internally executes as a subtraction of its first argument from its second argument, setting the flags and throwing away the result.
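To connect the listing back to C, here is a hypothetical reconstruction of a function that could compile to it. This is a guess based only on the assembly shown above, not the actual contents of controlflow01.c; the function name is invented, and only the globals a and b come from the listing.

```c
/* Globals referenced as a(%rip) and b(%rip) in the listing. */
int a;
int b;

/* Hypothetical reconstruction; the real source file may look different. */
int compare_globals(void) {
    if (a < b) {     /* cmpl b(%rip), %eax computes a - b; jl .L4 branches if a < b */
        return 0;    /* .L4: movl $0, %eax, then jmp .L1 */
    }
    return a;        /* fall through to .L1: %eax still holds a */
}
```

The comparison is signed because a and b are declared as int, which matches the use of jl rather than jb in the assembly.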
Arithmetic instructions change part of the %rflags register. The most commonly used flags are:
- ZF (zero flag): set iff the result was zero.
- SF (sign flag): set iff the result, when considered as a signed integer, was negative, i.e., iff the most significant bit (the sign bit) of the result was one.
- CF (carry flag): set iff the result overflowed when considered an unsigned value (i.e., the result was greater than $2^{W}-1$ for a value of width W bits; for a subtraction, CF is set when a borrow was needed).
- OF (overflow flag): set iff the result overflowed when considered a signed value (i.e., the result was greater than $2^{W-1}-1$ or less than $-2^{W-1}$ for a value of width W bits).
Although a few instructions let you load specific flags into the flag register, code usually accesses flags via a conditional jump or a conditional move instruction.
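As a rough model of how these four flags come out of a 32-bit compare, the following C sketch computes them by hand. The struct and function names are assumptions made for this example; the processor does all of this in hardware as part of the subtraction.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags {
    bool zf, sf, cf, of;
};

/* Model of the flags after cmpl b, a, which computes a - b on 32-bit values. */
struct flags flags_after_cmpl(uint32_t a, uint32_t b) {
    uint32_t result = a - b;          /* wraps around modulo 2^32 */
    struct flags f;
    f.zf = (result == 0);             /* result was zero */
    f.sf = (result >> 31) != 0;       /* sign bit of the result */
    f.cf = (a < b);                   /* unsigned subtraction needed a borrow */
    /* Signed overflow: a and b have different sign bits, and the sign bit
       of the result differs from the sign bit of a. */
    f.of = (((a ^ b) & (a ^ result)) >> 31) != 0;
    return f;
}
```

For example, comparing 1 with 2 (computing 1 - 2) sets SF and CF but not OF, while computing 0x80000000 - 1 sets OF because the most negative signed value has no room to go lower.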
You will often see the test and cmp instructions before a conditional branch. As mentioned above, these operations perform arithmetic and set the flags, but throw away the result rather than storing it in the destination register. test performs a bitwise AND, while cmp performs a subtraction, and both set the flags according to the result.
Below is a table of the most common branch instructions on the x86-64 architecture and the flags they look at to decide whether to branch and execute the next instruction at the branch target, or whether to continue execution with the next sequential instruction after the branch.
Instruction | Mnemonic | C example | Flags |
---|---|---|---|
jmp | Jump | break; | (Unconditional) |
je (jz) | Jump if equal (zero) | if (x == y) | ZF |
jne (jnz) | Jump if not equal (nonzero) | if (x != y) | !ZF |
jg (jnle) | Jump if greater | if (x > y), signed | !ZF && !(SF ^ OF) |
jge (jnl) | Jump if greater or equal | if (x >= y), signed | !(SF ^ OF) |
jl (jnge) | Jump if less | if (x < y), signed | SF ^ OF |
jle (jng) | Jump if less or equal | if (x <= y), signed | (SF ^ OF) \|\| ZF |
ja (jnbe) | Jump if above | if (x > y), unsigned | !CF && !ZF |
jae (jnb) | Jump if above or equal | if (x >= y), unsigned | !CF |
jb (jnae) | Jump if below | if (x < y), unsigned | CF |
jbe (jna) | Jump if below or equal | if (x <= y), unsigned | CF \|\| ZF |
js | Jump if sign bit | if (x < 0), signed | SF |
jns | Jump if not sign bit | if (x >= 0), signed | !SF |
jc | Jump if carry bit | N/A | CF |
jnc | Jump if not carry bit | N/A | !CF |
jo | Jump if overflow bit | N/A | OF |
jno | Jump if not overflow bit | N/A | !OF |
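The split between the signed rows (jl, jg, ...) and the unsigned rows (jb, ja, ...) matters because the same bits can compare differently depending on how they are interpreted. The sketch below uses two hypothetical helper functions; a branch on each comparison would typically use the instruction family named in its comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed comparison: a branch on this would typically use the
   jl/jg family, which reads SF, OF, and ZF. */
bool less_signed(int32_t x, int32_t y) {
    return x < y;
}

/* Unsigned comparison: a branch on this would typically use the
   jb/ja family, which reads CF and ZF. */
bool less_unsigned(uint32_t x, uint32_t y) {
    return x < y;
}
```

The bit pattern 0xFFFFFFFF is -1 when read as signed, so less_signed(-1, 1) is true; read as unsigned it is the largest 32-bit value, so less_unsigned(0xFFFFFFFF, 1) is false.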
Loops #
Conditional branch instructions and flags are sufficient to support both conditional statements (if (...) { ... } else { ... } blocks in C) and loops (for (...) { ... }, while (...) { ... }, and do { ... } while (...)). For a conditional, the branch jumps if the condition is true (or false, depending on how the compiler lays out the assembly), and execution continues with the next instruction otherwise. For a loop, the assembly will contain a conditional branch at the end of the loop body that checks the loop condition; if it is still satisfied, the branch jumps back to a label (or address) at the top of the loop.
When you see a conditional branch in assembly code whose target is a label or address above the branching instruction, it is nearly always a loop.
Consider the example in controlflow02.s, and the corresponding program in controlflow02.c. Let's focus on the assembly code following the .L3 label:
.L3:
movslq (%rdx), %rcx
addq %rcx, %rax
addq $4, %rdx
cmpq %rsi, %rdx
jne .L3
rep ret
[...]
Here, the loop variable is held in register %rdx, and the value that the loop variable is compared to on each iteration is in %rsi. (You can infer this from the fact that these registers are the only ones that appear in a comparison.) The addq $4, %rdx instruction just above cmpq increments the loop variable by 4 every time the loop executes. Finally, the loop's body consists of the two instructions above the addq $4, %rdx instruction: the first dereferences the pointer in %rdx and puts the value at the memory address it points to into register %rcx (sign-extending it from 32 to 64 bits), and the second adds that value to the contents of %rax. Since nothing else modifies %rax before the conditional branch, it is incremented by the value pointed to by %rdx on every iteration: this loop iterates over integers in memory via pointer arithmetic.
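A C loop with this shape might look like the sketch below. It is a hypothetical reconstruction from the assembly alone, not the actual controlflow02.c, and the function and parameter names are invented.

```c
/* Sum the ints in the range [cur, end). Hypothetical reconstruction. */
long sum_ints(const int* cur, const int* end) {
    long total = 0;          /* accumulator, held in %rax */
    while (cur != end) {     /* cmpq %rsi, %rdx; jne .L3 */
        total += *cur;       /* movslq (%rdx), %rcx; addq %rcx, %rax */
        ++cur;               /* addq $4, %rdx: advance by sizeof(int) bytes */
    }
    return total;
}
```

Note how ++cur on an int pointer becomes an addition of 4 in the assembly, because pointer arithmetic in C is scaled by the size of the pointed-to type.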
Addressing Modes #
We have already seen a few ways in which an assembly instruction's operands can be written. In particular, the loop example contains (%rdx), which dereferences the address stored in register %rdx.
The full, general form of a memory operand is offset(base, index, scale), which refers to the address offset + base + index*scale. In 0x18(%rax, %rbx, 4), %rax is the base, 0x18 the offset, %rbx the index, and 4 the scale. The offset (if used) must be a constant and the base and index (if used) must be registers; the scale must be either 1, 2, 4, or 8. In other words, if we write this as N(%reg1, %reg2, M), the address computed is %reg1 + N + %reg2 * M.
The default offset, base, and index are 0, and the default scale is 1, and instructions omit these parts if they take their default values. You will most often see operands of the form offset(%register), which add a constant to the address in the register and then dereference the result. But occasionally, you may come across instructions that use both base and index registers, or which use the full general form.
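The address arithmetic behind offset(base, index, scale) can be written out in C. This is only a sketch of the formula: the function names are hypothetical, and the second function assumes a 4-byte int, as on x86-64 Linux.

```c
#include <stdint.h>

/* Address named by the operand offset(base, index, scale). */
uint64_t effective_address(int64_t offset, uint64_t base,
                           uint64_t index, uint64_t scale) {
    return base + (uint64_t) offset + index * scale;
}

/* Address of arr[i], computed the way an operand such as
   (%rax,%rbx,4) would: base = arr, index = i, scale = sizeof(int). */
const int* element_address(const int* arr, uint64_t i) {
    uint64_t addr = effective_address(0, (uint64_t) (uintptr_t) arr,
                                      i, sizeof(int));
    return (const int*) (uintptr_t) addr;
}
```

For instance, with %rax = 0x1000 and %rbx = 3, the operand 0x18(%rax,%rbx,4) names address 0x1000 + 0x18 + 3*4 = 0x1024. The scale factors 1, 2, 4, and 8 match the sizes of common C types, which is why array indexing often maps onto a single memory operand.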
Below is a handy overview table containing all the possible ways of writing operands to assembly instructions.
Type | Example syntax | Value used |
---|---|---|
Register | %rbp | Contents of %rbp |
Immediate | $0x4 | 0x4 |
Memory | 0x4 | Value stored at address 0x4 |
Memory | symbol_name | Value stored in global symbol_name (the compiler resolves the symbol name to an address when creating the executable) |
Memory | symbol_name(%rip) | %rip-relative addressing for global (see below) |
Memory | symbol_name+4(%rip) | Simple computations on symbols are allowed (the compiler resolves the computation when creating the executable) |
Memory | (%rax) | Value stored at address in %rax |
Memory | 0x4(%rax) | Value stored at address %rax + 4 |
Memory | (%rax,%rbx) | Value stored at address %rax + %rbx |
Memory | (%rax,%rbx,4) | Value stored at address %rax + %rbx*4 |
Memory | 0x18(%rax,%rbx,4) | Value stored at address %rax + 0x18 + %rbx*4 |
%rip-relative addressing for global variables
x86-64 code often refers to globals using %rip-relative addressing: a global variable named a is referenced as a(%rip). This style of reference supports position-independent code (PIC), a security feature. It specifically supports position-independent executables (PIEs), which are programs that work independently of where their code is loaded into memory.
When the operating system loads a PIE, it picks a random starting point and loads all instructions and globals relative to that starting point. The PIE's instructions never refer to global variables using direct addressing: there is no movl global_int, %eax. Globals are referenced relatively instead, using deltas relative to the next %rip: to load a global variable into a register, the compiler emits movl global_int(%rip), %eax. These relative addresses work independently of the starting point! For instance, consider an instruction located at (starting-point + 0x80) that loads a variable g located at (starting-point + 0x1000) into %rax. In a non-PIE, the instruction might be written as movq g, %rax; but this relies on g having a fixed address. In a PIE, the instruction might be written movq g(%rip), %rax, which works without having to know the starting address of the program's code in memory at compile time (instead, %rip contains an address a known number of bytes away from the starting point, so any address relative to %rip is also relative to the starting point).
At starting point… | The mov instruction is at… | The next instruction is at… | And g is at… | So the delta (g - next %rip) is… |
---|---|---|---|---|
0x400000 | 0x400080 | 0x400087 | 0x401000 | 0xF79 |
0x404000 | 0x404080 | 0x404087 | 0x405000 | 0xF79 |
0x4003F0 | 0x400470 | 0x400477 | 0x4013F0 | 0xF79 |
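The arithmetic in the table can be checked with a few lines of C. The constants below are taken from the example: the mov instruction sits 0x80 bytes past the starting point, it is 7 bytes long (0x400087 - 0x400080), and g sits 0x1000 bytes past the starting point. The names themselves are made up for this sketch.

```c
#include <stdint.h>

enum {
    MOV_OFFSET = 0x80,   /* mov instruction's offset from the starting point */
    MOV_LENGTH = 7,      /* size of the mov instruction in bytes */
    G_OFFSET = 0x1000    /* g's offset from the starting point */
};

/* Delta stored in the instruction: address of g minus the next %rip. */
uint64_t rip_relative_delta(uint64_t start) {
    uint64_t next_rip = start + MOV_OFFSET + MOV_LENGTH;
    uint64_t g_addr = start + G_OFFSET;
    return g_addr - next_rip;   /* the starting point cancels out */
}
```

Because the starting point appears in both terms of the subtraction, the result is 0xF79 no matter where the program is loaded, which is exactly why the same instruction bytes work at any load address.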
Calling Convention #
We discussed conditionals and loops, but there is a third type of control flow: function calls. Assembly language has no functions, just sequences of instructions. Function calls therefore translate into control flow involving branches, but we need a bit more than that: functions can take arguments, and the compiler had better make sure that the arguments are available after it jumps to a function's instructions!
Defining how function calls and returns work, where a function can expect to find its arguments, and where it must place its return value is the business of a calling convention. A calling convention governs how functions on a particular architecture and operating system interact in assembly code. This includes rules on how function arguments are placed, where return values go, what registers functions may use, how they may allocate local variables, and others.
Why do we need calling conventions?
Calling conventions ensure that functions compiled by different compilers can interoperate, and they ensure that operating systems can run code from different programming languages and compilers. For example, you can call into C code from Python, or link C code compiled with gcc and code compiled with clang. This is possible only because the Python libraries that call into C code understand its calling convention, and because the gcc and clang compilers' authors agree on the calling convention to use.
Some aspects of a calling convention are derived from the instruction set itself and embedded into the architecture (e.g., via special-purpose registers modified as a side-effect of certain instructions), but some are conventional, meaning they were decided upon by people (for instance, at a convention), and may differ across operating systems and compilers.
Programs call01.c to call06.c and their corresponding assembly in call01.s to call06.s help us figure out the calling convention for x86-64 on the Linux operating system!
The reason why the unoptimized programs seemingly pointlessly write all their arguments to memory in the stack segment is that arguments are local variables of a function, and since local variables have automatic lifetime, they're technically stored in the stack segment. With optimizations, the compiler is smart enough to realize that it can just skip actually storing them, so it just uses the registers containing the arguments directly.
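To experiment with this yourself, a small pair of functions like the one below is enough. These functions are hypothetical and are not taken from call01.c to call06.c. The register names in the comments follow the System V AMD64 ABI used on Linux, where the first three integer arguments are passed in %rdi, %rsi, and %rdx, and the return value comes back in %rax.

```c
/* Callee: x, y, and z arrive in %rdi, %rsi, and %rdx;
   the result is returned in %rax. */
long add3(long x, long y, long z) {
    return x + y + z;
}

/* Caller: places 1, 2, and 3 in the argument registers before the call,
   then finds the result in %rax afterwards. */
long call_add3(void) {
    return add3(1, 2, 3);
}
```

Compiling such a file with gcc -S, once with -O0 and once with -O2, and comparing the two outputs should show the difference described above: the unoptimized version copies its arguments to the stack, while the optimized version works on the registers directly (and may even replace the call with a constant).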
Summary #
We saw that in assembly, there are computation, data movement, and control flow instructions, and that the compiler often produces somewhat unexpected instruction sequences to make things faster. This is part of why we use compilers: they are very good at distilling our programs down into fast sequences of instructions.
We then looked at control flow in assembly, where instructions change what other instructions the processor executes next. In many cases, control flow first involves a flag-setting instruction and then a conditional branch based on the values of the flags register. This allows for conditional statements and loops.
Function calls in assembly are governed by the calling convention of the architecture and operating system used: it determines which registers hold specific values such as arguments and return values, which registers a function may modify, and where on the stack certain information (such as the return address) is stored.
Next time, we will talk more about the stack and how it is managed.